What is the number of machines required to cache?


Design Cache Q: What is the number of machines required to cache?


Assuming a machine with 200GB of RAM, it would be around 350 machines.
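For what it's worth, the arithmetic here only works out if the total cache size is about 70TB (350 × 200GB); other comments below use 30TB and 72GB instead, so treat the 70TB figure as an assumption inferred from this comment:

```python
import math

TOTAL_CACHE_BYTES = 70 * 10**12   # ~70 TB, implied by 350 machines x 200GB (assumption)
RAM_PER_MACHINE   = 200 * 10**9   # 200 GB of RAM per machine (this comment's assumption)

# Minimum fleet size driven purely by storage: every byte must fit in some machine's RAM.
machines = math.ceil(TOTAL_CACHE_BYTES / RAM_PER_MACHINE)
print(machines)  # 350
```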


Note that this is the absolute minimum. It's possible we might need more machines, because the QPS per machine would be higher than we want it to be.

Can someone explain the above statement? Based on the example discussed, how is the QPS per machine higher than we want it to be?


How does the QPS fit into the last calculation? I don’t understand the relation between the QPS and the amount of RAM. We set off estimating the QPS and end up with a minimum number of machines … something is missing there.


How do we calculate the number of machines?
Could anyone help me with this?
I have no idea how it’s done.


Here we are assuming that one machine can handle only 1M QPS, and that there are at most 10M QPS in total. If the QPS increases, then despite having enough machines to store the cache (30TB), we will need even more machines just to handle the QPS.
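The comment above can be turned into a quick back-of-envelope check. The 30TB, 72GB-per-machine, 10M-QPS, and 1M-QPS-per-machine figures are all assumptions taken from this thread:

```python
import math

TOTAL_CACHE_BYTES = 30 * 10**12   # 30 TB of cache data (thread assumption)
RAM_PER_MACHINE   = 72 * 10**9    # 72 GB of RAM per machine (thread assumption)
TOTAL_QPS         = 10 * 10**6    # 10M queries per second (thread assumption)
QPS_PER_MACHINE   = 1 * 10**6     # each machine handles at most 1M QPS (thread assumption)

# Two independent constraints; the fleet must satisfy BOTH, so take the maximum.
machines_for_storage = math.ceil(TOTAL_CACHE_BYTES / RAM_PER_MACHINE)  # 417 (rounded to 420 in the thread)
machines_for_qps     = math.ceil(TOTAL_QPS / QPS_PER_MACHINE)          # 10
machines_needed      = max(machines_for_storage, machines_for_qps)
```

With these numbers storage dominates, which is why 420 is a *minimum*: if the QPS assumption grows past 420M, the throughput constraint takes over instead.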


Not clear: we calculated that for 30 TB at 72GB of RAM per machine, we need 420 machines. We know that each machine can handle 1M QPS, so 420 machines can handle 420M QPS. This is way above the requirement of 10M QPS. How is this number a minimum? I think it’s way beyond what we want.
QPS has no connection to the number of machines here. We need 420 machines for the cache storage of 30 TB, not for the QPS.
Have I misunderstood something?


I think it’s about being careful and aware of details:

  • How well can we distribute the load and size equally? What if our hottest key is requested at 3M QPS? Even though 30TB is the total data, we may need to store multiple copies of hot data and load-balance across them. How much more space will that require?
  • If cache misses are costly, it may make sense to keep part of the data on multiple machines, to avoid a miss on node failure. Which part of the data, how large is it, how many copies? How do we find the data when multiple, possibly failed, nodes may have it? Probably request it from all of them and wait only for the first answer. This is also how tail latencies get reduced, which may be very important to do too - we don’t know until we ask.
  • What does 30TB of total data really mean? Are the needs geographically distributed? If so, maybe you only need to store the data needed by currently active users. Say your app peaks at 70% of users active daily. If they only need 30 × 70% = 21TB of cache, you’re suddenly saving your company a lot of money just by realizing this.
  • Systems change. Even though QPS seems well satisfied at our current estimates, we should know the intended future of the product. If the cache system could be used in places where QPS is a bottleneck, the design must at least note that possibility.
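The hot-key point in the first bullet can be sketched in one line; the 3M-QPS hot key and the 1M-QPS-per-machine limit are the assumptions used earlier in this thread:

```python
import math

HOT_KEY_QPS     = 3 * 10**6   # hottest key requested at 3M QPS (first bullet's assumption)
QPS_PER_MACHINE = 1 * 10**6   # assumed per-machine throughput limit from this thread

# One machine cannot serve the hot key alone, so replicate it and
# load-balance requests across the replicas.
replicas = math.ceil(HOT_KEY_QPS / QPS_PER_MACHINE)
print(replicas)  # 3
```

Each replica of hot data is extra RAM on top of the 30TB, which is exactly the "how much more space" question above.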

I’d go for showering the interviewer with these possibilities and questions. Even if it is dangerous to corner yourself in low-importance details, not catching some of these system oddities early can cost the business a lot of money. Duplicating data could require 2x more machines and more complicated software. In contrast, the choice of RAM model won’t affect the numbers much, so discussing it is a waste of time.

Disclaimer: I’m not an authority, just throwing around ideas.


“A cache has to be inherently of low latency. Which means all cache data has to reside in main memory”
The calculation is not based on the QPS but on the size of the cache: the whole 30TB of cache must reside in main memory, and there is an upper limit on how much main memory a single machine can have. So we have to use multiple machines, in this case 420.


It’s not just about the RAM size of the machines. In fact, assume we’re using machines with 4 cores; then with 420 machines, each core will handle a QPS of 10M/420/4 ≈ 6000. So each query gets roughly 1/6000 s ≈ 167µs. Sequentially reading 1MB of data from RAM normally takes about 250µs, which means each query can only read around 668KB of data. If the system requires a query to read more data than that, there’s a problem.
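The arithmetic in this comment can be checked directly; the 420 machines, 4 cores per machine, and 250µs-per-MB figures are the commenter's assumptions:

```python
# Rough per-core time budget under this comment's assumptions:
# 10M total QPS, 420 machines, 4 cores each, 250us to read 1MB from RAM.
TOTAL_QPS      = 10 * 10**6
MACHINES       = 420
CORES          = 4
READ_US_PER_MB = 250                                  # sequential RAM read (rough figure)

qps_per_core = TOTAL_QPS / MACHINES / CORES           # ~5952 QPS per core
budget_us    = 1_000_000 / qps_per_core               # ~168 us per query
max_read_kb  = budget_us / READ_US_PER_MB * 1024      # ~688 KB readable per query
```

So even with storage satisfied, the per-query time budget caps how much data one query can touch.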


I think many people have pointed out the same thing: the calculation of the number of machines required is not quite right.

  1. Besides holding the data, each machine needs to process requests and send responses as well.
  2. Since the cache is distributed, each machine holds only a partition of the data. Whatever is missing has to be fetched from the database, which means additional processing.
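Point 2 can be illustrated with a minimal read-through sketch; the dict-backed `db`, the `get` helper, and the key names are purely hypothetical:

```python
# Minimal read-through cache: on a miss, the cache node does extra work
# fetching from the backing database (simulated here by a dict).
db = {"user:1": "alice"}   # hypothetical backing store
cache = {}                 # this node's partition of the cache

def get(key):
    if key in cache:             # hit: served straight from RAM
        return cache[key]
    value = db.get(key)          # miss: extra DB round trip and processing
    if value is not None:
        cache[key] = value       # populate so the next lookup is a hit
    return value
```

Every miss costs a database round trip on top of the cache lookup, which is CPU and time the pure storage-based machine count doesn't account for.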