Develop our very own version of Apache Spark for fun: Part 1

Oct 22, 2023

Disclaimer: This article is based on my current knowledge and understanding. If you have any questions or see room for improvement, please mention it in the comments section.

Imagine you have 100 million records to process, validate and transform. Processing that many records on a single server requires powerful hardware. If, instead, we divide the records into smaller chunks, process those chunks on several ordinary servers and aggregate their results while tolerating failures, we no longer depend on one large machine. Sounds like an Apache Spark use case, right? However, we can also build such a system using an actor model framework like Microsoft Orleans. The image shown gives a glimpse of such a system: a cluster of two servers distributing work among themselves. Stay tuned for more on this.
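To make the idea of "split, process in parallel, tolerate failures, aggregate" more concrete, here is a minimal single-machine sketch in Python. It is not Orleans (which is a .NET framework) and not Spark; a thread pool simply stands in for the servers. All names here (`split_into_chunks`, `process_chunk`, `run_job`, `failing_workers`) are hypothetical, and the "validate and transform" step is an assumed placeholder: keep non-negative values and square them.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkerFailure(Exception):
    """Raised when a simulated worker 'crashes' while processing a chunk."""


def split_into_chunks(records, chunk_size):
    # Divide the full dataset into smaller, independent pieces of work.
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(worker_id, chunk, failing_workers):
    # Simulate a crash on selected workers so the retry path can be exercised.
    if worker_id in failing_workers:
        raise WorkerFailure(f"worker {worker_id} failed")
    # Hypothetical validation (drop negatives) and transformation (square).
    valid = [r for r in chunk if r >= 0]
    return sum(r * r for r in valid), len(valid)


def run_job(records, chunk_size=4, num_workers=3, failing_workers=frozenset()):
    chunks = split_into_chunks(records, chunk_size)
    total, count = 0, 0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Assign chunks to workers round-robin and start them all in parallel.
        pending = [
            (idx % num_workers,
             pool.submit(process_chunk, idx % num_workers, c, failing_workers))
            for idx, c in enumerate(chunks)
        ]
        for idx, (first_worker, fut) in enumerate(pending):
            worker = first_worker
            while True:
                try:
                    partial_sum, partial_count = fut.result()
                    break
                except WorkerFailure:
                    # Fault tolerance: reassign the chunk to the next worker.
                    worker = (worker + 1) % num_workers
                    if worker == first_worker:
                        raise RuntimeError(f"chunk {idx} failed on every worker")
                    fut = pool.submit(process_chunk, worker, chunks[idx], failing_workers)
            # Aggregate the partial results into the final answer.
            total += partial_sum
            count += partial_count
    return total, count


if __name__ == "__main__":
    print(run_job(list(range(-2, 10)), failing_workers={1}))
```

Running it with worker 1 "down" prints `(285, 10)`, the same result as a run with no failures, because the chunk originally sent to worker 1 is retried on worker 2. A real cluster would also need to detect failures over the network and persist partial results, which is exactly the kind of work a framework like Orleans or Spark takes on.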

~ Happy coding


Written by Sumeet More
