How is the QPS calculated here? I am not getting the exact calculation. Can somebody explain it?
The QPS is calculated as follows. Earlier we mentioned that each machine would have 72 GB of RAM. To serve 30 TB of cache, the number of machines required would be 30 TB / 72 GB, which is close to 420. Assume that we have 420 machines to serve 30 TB of distributed cache. The QPS requirement was 10M.
Per machine, the QPS would be 10M / 420, which is roughly 23,800 and is rounded here to approximately 23,000 QPS. So each machine should be able to handle about 23,000 QPS. The approach is the same as the one used to decide the number of machines: there we divided the total cache size by the per-machine RAM, and here we divide the total QPS by the number of machines.
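To make the arithmetic above easy to check, here is a minimal sketch in Python. The function names are made up for illustration, and treating 1 TB as 1000 GB is an assumption (the post does not say whether decimal or binary units are meant).

```python
import math

# Figures quoted in the post. Decimal units (1 TB = 1000 GB) are an assumption.
TOTAL_CACHE_GB = 30 * 1000
RAM_PER_MACHINE_GB = 72
TOTAL_QPS = 10_000_000


def machines_needed(total_gb, ram_per_machine_gb):
    """Smallest whole number of machines whose combined RAM holds the cache."""
    return math.ceil(total_gb / ram_per_machine_gb)


def qps_per_machine(total_qps, machines):
    """Average load per machine if traffic is spread evenly."""
    return total_qps / machines


print(machines_needed(TOTAL_CACHE_GB, RAM_PER_MACHINE_GB))  # 417, rounded up to 420 in the post
print(round(qps_per_machine(TOTAL_QPS, 420)))               # 23810, quoted as ~23,000 in the post
```

The even-spread assumption is the weakest part of this estimate: a hot key can push one machine well above the average.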
Next, given that a machine has to serve 23,000 QPS and has 4 cores, we calculate the time available per request: CPU time available per query = 4 * 1000 * 1000 / 23000 microseconds = 174 us (note that everything is converted to microseconds). So each machine has to answer a query within 174 us of CPU time. This is how the per-machine QPS budget is derived. Then, based on the read/write traffic and the latency numbers at https://gist.github.com/jboner/2841832, the estimate is further refined by increasing the number of machines.
I am coming up with 23,000 queries / 1 sec * 1 sec / 1,000,000 us = 1 query per roughly 43 us, which I rounded to 40 us. So about 40 us per query service time is required. I don't see why we would calculate this based on CPU time, or why my number is so far off. The only thing I can think of is that if I multiply by 4 CPUs, I have each CPU servicing 1 query every 160 us (close to the 174 above, and exactly 174 if I don't round 43.5 down to 40). But why would we calculate this per CPU rather than per server?
How do we get this: 4 * 1000 * 1000 as CPU time?
I think they have assumed a quad-core processor and given the cores one second to process and respond: 4 cores * 1000 * 1000 us per second is 4,000,000 us of CPU time available every second.
The reason for considering the time per CPU is to make sure each CPU has ample time to do the read or write operation. If we have too little time for this and we get that many queries per second, the server either has to reject the queries (causing higher latency) or queue them, which can cause other issues (the queue growing large and causing the server to crash).
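The queueing concern can be illustrated with a very rough model. This is only a sketch under stated assumptions: requests arrive at a constant rate, every request costs the same CPU time, nothing is rejected, and the function name `backlog_after` is made up for the example.

```python
def backlog_after(seconds, arrival_qps, cores, service_time_us):
    """Requests left waiting after `seconds` in a simple fluid model (no rejections)."""
    # Requests the machine can complete per second across all cores.
    capacity_qps = cores * 1_000_000 / service_time_us
    backlog = 0.0
    for _ in range(seconds):
        # Each second the queue grows by the shortfall, and never drops below zero.
        backlog = max(0.0, backlog + arrival_qps - capacity_qps)
    return backlog


# 150 us per query is under the 174 us budget, so the queue stays empty.
print(backlog_after(10, 23_000, 4, 150))  # 0.0
# 200 us per query is over budget, so 3,000 requests pile up every second.
print(backlog_after(10, 23_000, 4, 200))  # 30000.0
```

Real traffic is bursty, so a queue can build up even when the average service time is under the budget. The model only shows why the budget is a hard upper bound.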
Yes, they have considered a quad-core processor. So 23,000 QPS will be served by 4 cores (4 different threads). That means each core has to process 23000 / 4 = 5750 QPS, so each query needs to be served in 1 sec / 5750 = 174 microseconds.
There are 23,000 queries per second,
which is exactly equal to 23000 / (1000 * 1000) queries per microsecond.
CPU time per query = 4 / (23000 / (1000 * 1000)) microseconds
= 4 * 1000 * 1000 / 23000 microseconds
= 174 us
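The two numbers discussed in this thread (about 43 us and about 174 us) measure different things, and a short Python sketch shows how they relate. The function names are illustrative, and the 4-core figure is the assumption stated in the posts above.

```python
US_PER_SECOND = 1_000_000


def arrival_interval_us(qps):
    """Average gap between consecutive queries arriving at one server."""
    return US_PER_SECOND / qps


def cpu_budget_us(qps, cores):
    """CPU time each query may use when the cores work on queries in parallel."""
    return cores * US_PER_SECOND / qps


qps, cores = 23_000, 4
print(round(arrival_interval_us(qps), 1))   # 43.5
print(round(cpu_budget_us(qps, cores), 1))  # 173.9
```

A new query reaches the server every 43.5 us on average. Because four cores can each be busy with a different query, any single query may occupy its core for up to 4 * 43.5 = 174 us before the server falls behind.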