Will our machines be able to handle a QPS of 23,000?


Design Cache Q: Will our machines be able to handle a QPS of 23,000?


Where is the 4 coming from? Cores?


Is 23,000 QPS per machine based on the estimate of 10M QPS / 420 machines?


Yes, 10M / 420 ≈ 23,809.


Where is this 174 us coming from? "CPU time available for 23k queries: 1 second * 4 = 4 seconds. CPU time available per query = 4 * 1000 * 1000 / 23000 microseconds = 174 us"???


Each of the 4 cores handles 23k/4 queries in 1 sec => time for 1 query = 4/23k sec = 4 * 1,000,000 / 23k microseconds ≈ 174 us.
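The arithmetic in this thread can be checked in a few lines of Python. The numbers (10M total QPS, 420 machines, 4 cores per machine, the rounded 23k figure) are all assumptions carried over from the discussion, purely illustrative:

```python
# Back-of-envelope check of the thread's numbers.
TOTAL_QPS = 10_000_000   # assumed total load from the estimation section
MACHINES = 420           # assumed machine count from the estimation section
CORES = 4                # cores per machine

qps_per_machine = TOTAL_QPS / MACHINES    # ~23,809, rounded to 23k in the thread
cpu_us_per_second = CORES * 1_000_000     # 4 cores give 4s of CPU time per wall-clock second
budget_us = cpu_us_per_second / 23_000    # CPU microseconds available per query

print(round(qps_per_machine))  # -> 23810
print(round(budget_us))        # -> 174
```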


Maybe this is a dumb question, but where does the "1000000" come from? Is it the number of cycles per second, or operations per second, of one of the cores?


I'd like to know a typical rule-of-thumb value for CPU time per query for a mixed/balanced read-and-write workload.


How did we know that 174 us is not enough for one query?


I have the same question as above: "How did we know that 174 us is not enough for one query?" I think it's making the assumption that the data nodes are fairly big. If a data node is larger than 1 MB, the sequential read from memory for that data alone would exceed 174 us. There will also be some random memory reads, which push that number up further. Moreover, we are taking locks on the map, which adds quite a bit of latency.
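A rough sketch of the point above: if cached values are large, just streaming them out of memory can blow the 174 us budget. The bandwidth figure here is an assumption (~4 GB/s, consistent with the classic "read 1 MB sequentially from memory ≈ 250 us" number); real hardware varies widely:

```python
MEM_BANDWIDTH = 4 * 10**9  # bytes/sec, assumed sequential memory bandwidth

def sequential_read_us(num_bytes: int) -> float:
    """Time (in microseconds) to stream num_bytes from main memory."""
    return num_bytes / MEM_BANDWIDTH * 1_000_000

print(round(sequential_read_us(1_000_000)))  # 1 MB value: ~250 us, over the 174 us budget
print(sequential_read_us(10_000))            # 10 KB value: 2.5 us, comfortably within it
```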


(4 / 23K) sec

=> (4 * 10^6 * 10^-6 sec) / 23K   [multiplying by 10^6 * 10^-6 = 1 to convert the numerator to microseconds]

=> (4 * 10^6 * 1 microsec) / 23K  [since 10^-6 sec is 1 microsec]

=> 4,000,000 microsecs / 23K ≈ 174 microsecs


What is the 10M in your statement?


4 cores need to handle 23k queries in 1 second. If queries are executed in parallel over the 4 cores, where do the 4 seconds come into the picture?


Why is 174 us not enough?
I guess we are comparing 174 us against the "latency numbers every programmer should know" (https://gist.github.com/jboner/2841832). For example, reading 1 MB sequentially from main memory is about 250 us there, which already exceeds the budget.


10M is the total QPS that we assumed previously in the estimation section.


Where does this 4 sec come from?
You were just confused by the way they wrote it. What we are saying here is that in 1 sec we handle 23k queries. With 4 cores, each core handles 23k/4 queries in 1 sec; put another way, a single core would need 4 sec of CPU time to get through all 23k queries.


Even a potato can run 23k QPS. Typical single-threaded key-value stores, such as Redis, handle around 100k QPS. I'd say a typical cache would do something in the range of 100k-500k QPS, and way more if the app does multiple operations in a single request. Your bottlenecks will be the OS handling a large volume of TCP or UDP packets and possibly the network bandwidth. It might be good to clarify with the interviewer whether IRQ handling or network bandwidth bottlenecks are plausible: if each app request bundles multiple queries, we can handle more queries; if the requests or responses are large, they could be saturating network bandwidth.


@edgars-irmejs The 23k QPS here is for a single node, not for the entire cache layer. The 100k QPS you mentioned for Redis would likewise apply per node, with the total load divided across multiple nodes.


That's what is mentioned at the end: it depends on many factors, such as the read/write ratio and the size of the data to be read. Write requests will involve more latency than reads if cache eviction is needed. If the "values" are in KBs, though, the time to read from main memory should not be that high.


Hi all, I was able to understand it with the following math:
Let's assume there was only one CPU core which has to process 23,000 requests per second.
We know 1 second = 1000 * 1000 us.
So a single-core CPU has (1,000,000 / 23,000) us ≈ 43.5 us to process one request.
If we have 4 cores, then we have around (43.5 * 4) ≈ 174 us of CPU time for one request.

This does not mean that each request needs 174 us of CPU time. It simply means that each request gets at most 174 us of attention from the CPU. Of course, we also need to account for context switching, etc. Please let me know what you all think.
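The two-step calculation above (per-core budget first, then scaled by the core count) can be checked in Python; the same ~174 us falls out either way. Numbers are the thread's assumptions:

```python
qps = 23_000  # rounded per-machine query rate from the thread
cores = 4

per_core_us = 1_000_000 / qps    # one core alone: ~43.5 us per request
budget_us = per_core_us * cores  # four cores: ~174 us per request

print(round(per_core_us, 1), round(budget_us))  # -> 43.5 174
```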