Develop our very own version of Apache Spark for fun: Part 1

Sumeet More
Oct 22, 2023

Disclaimer: This article is based on my current knowledge and understanding. If you have any questions or see room for improvement, please mention it in the comments section.

Imagine you have 100 million records to process, validate, and transform. Processing that many records on a single server would require powerful hardware. However, it would be great if we could divide the records into smaller chunks, process them on several ordinary servers, and aggregate the results while tolerating failures. Sounds like an Apache Spark use case, right? We can build such a system ourselves using an actor-model framework like Microsoft Orleans. The image shown is a glimpse of such a system: a cluster of two servers distributing work among themselves. Stay tuned for more on this.
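To make the idea concrete, here is a minimal single-machine sketch of the same pattern in Python: split the records into chunks, process each chunk on a worker, retry a chunk if its worker fails, and aggregate the partial results. It is not Orleans or Spark code. A thread pool stands in for the cluster of servers, and every name in it (`split_into_chunks`, `process_chunk`, `process_with_retry`, `run_job`) as well as the validation rule (keep non-negative integers, then double them) is a hypothetical choice made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Divide the full record list into fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Validate and transform one chunk.

    Hypothetical rule: keep non-negative integers and double them.
    Returns a partial result: (sum of transformed values, valid record count).
    """
    valid = [r for r in chunk if isinstance(r, int) and r >= 0]
    return sum(r * 2 for r in valid), len(valid)


def process_with_retry(chunk, worker=process_chunk, max_attempts=3):
    """Re-run a chunk if its worker raises, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return worker(chunk)
        except Exception as exc:  # a real system would be more selective here
            last_error = exc
    raise RuntimeError(f"chunk failed after {max_attempts} attempts") from last_error


def run_job(records, chunk_size=1000, workers=2):
    """Distribute chunks over a pool of workers and aggregate the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    # The thread pool is a stand-in for the two servers in the cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_with_retry, chunks))
    total = sum(p[0] for p in partials)
    valid_count = sum(p[1] for p in partials)
    return total, valid_count


if __name__ == "__main__":
    print(run_job(list(range(10)), chunk_size=3, workers=2))
```

In an actor-model version, each chunk would presumably be handed to an actor running on some server in the cluster rather than to a local thread, and the retry logic would have to cope with a whole server going down, not just an exception.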

~ Happy coding


Written by Sumeet More

Software Engineer 2 at Microsoft | Backend Engineer and Architect | Blockchain & ML enthusiast | C#, .NET Core, Rust, JavaScript and Go
