Scaling a system by setting limits

Sumeet More
2 min read · Apr 5, 2024


Disclaimer: This article is based on my current knowledge and understanding. If you have any questions or suggestions for improvement, please mention them in the comments section.

Photo by Bernd 📷 Dittrich on Unsplash

Seeing the title, you must be thinking, “Sumit! How can you scale a system by putting limits on it?” I know, I know. But trust me, this approach is interesting.

During my college days, I used to think, “Let’s design a system that can scale infinitely.” Looking back now, I feel I was so wrong. The problem with infinite scale is that you cannot predict the behavior of your system at that level.

My background is not in computer engineering; I am an electronics engineer. In the hardware engineering world, every system is designed with its threshold in mind. When an electronic device or system is launched, it comes with a statement that it will behave correctly only if it is operated under the specified conditions.

I think this insight from the hardware engineering world can help us design software systems more efficiently. How? Let’s take an example.

Imagine that you are a famous YouTuber launching your coffee brand online. On launch day, your online system gets huge traffic, and now you have to scale it without breaking its behavior. There are multiple ways to achieve this. But let’s see how setting limits on the system can help us scale it while maintaining its correct behavior.

Requests per second (RPS) is a very common metric used in software engineering projects to understand how APIs scale. We can also use this metric to understand the limit of our system. Using common benchmarking techniques, we can find how many requests per second a single instance can serve without breaking its behavior. Once we have this information, scaling becomes much easier to reason about. Suppose a single instance of system A has a limit of 1000 RPS. Once that instance reaches its threshold, another instance of system A can be created immediately to serve more requests. This way we can scale out the system without breaking its behavior.
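To make the idea concrete, here is a minimal sketch of the scaling decision in Python. It is not how any particular autoscaler is implemented; the function name `instances_needed` and its parameters are my own, chosen for illustration. It assumes we already know the per-instance limit from benchmarking and simply works out how many instances keep each one at or below that limit.

```python
import math


def instances_needed(observed_rps: float,
                     limit_per_instance: float,
                     min_instances: int = 1) -> int:
    """Return how many instances keep each one at or below its limit.

    observed_rps       -- current total traffic, in requests per second
    limit_per_instance -- benchmarked RPS one instance serves correctly
    min_instances      -- floor, so we never scale below this count
    (All three names are hypothetical, for illustration only.)
    """
    if limit_per_instance <= 0:
        raise ValueError("limit_per_instance must be positive")
    if observed_rps < 0:
        raise ValueError("observed_rps cannot be negative")
    # Round up: even one request over the limit needs another instance.
    return max(min_instances, math.ceil(observed_rps / limit_per_instance))


if __name__ == "__main__":
    # System A from the example: each instance is limited to 1000 RPS.
    for traffic in (400, 1000, 1001, 2500):
        print(traffic, "RPS ->", instances_needed(traffic, 1000), "instance(s)")
```

The important part is the rounding up. At exactly 1000 RPS one instance is still within its limit, but at 1001 RPS a second instance is needed, so no single instance is ever pushed past the point where we know it behaves correctly.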

For the demo, I used a hello world API and put a limit of 10 RPS on it. The moment it gets more requests than its defined limit, Knative immediately spins up another pod of the hello world API and scales up.


~ Happy Coding
