I see a problem in this design. It assumes that the 10M requests per second will be evenly distributed across all the keys (shards). In practice, the incoming read requests might not be evenly distributed. Suppose you share a bitly URL on social media and 1 million users click it at the same time: a single server will take all of that load. Hence, 5500 QPS per shard is not the right calculation. The real per-shard load depends entirely on the incoming traffic pattern.
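To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the shard count implied by the 10M and 5500 figures, and it assumes (purely for illustration) that all 1 million clicks on the viral link arrive within one second and hit the single shard that owns that key.

```python
TOTAL_QPS = 10_000_000   # total read rate from the discussion
PER_SHARD_QPS = 5_500    # per-shard figure from the even-distribution estimate
HOT_KEY_QPS = 1_000_000  # assumption: every click on the viral link lands in the same second

# Shard count implied by the even-distribution estimate.
num_shards = round(TOTAL_QPS / PER_SHARD_QPS)

# Spread the remaining traffic evenly, then add the hot key to the one shard that owns it.
background_per_shard = (TOTAL_QPS - HOT_KEY_QPS) / num_shards
hot_shard_qps = background_per_shard + HOT_KEY_QPS

print(f"shards: {num_shards}")
print(f"typical shard: {background_per_shard:,.0f} QPS")
print(f"hot shard: {hot_shard_qps:,.0f} QPS "
      f"({hot_shard_qps / PER_SHARD_QPS:.0f}x the estimate)")
```

Under these assumptions the average per shard barely moves, but the one shard holding the hot key sees a load that is orders of magnitude above the 5500 QPS it was sized for.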
This is a very good question. Maybe we need to have multiple machines in a single shard to handle that many requests per second.
This is a good question. Systems like Dynamo DB and Cassandra use the concept of virtual nodes in the consistent hash ring. With virtual nodes, one node holds some data from a range of shards instead of all the data from one shard. That way, if one shard gets very hot, the heat is spread across the many nodes in the ring that host this hot shard, instead of a few nodes taking all of it.
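A minimal sketch of a consistent hash ring with virtual nodes, to make the idea concrete. This is not how any particular database implements it; the class name `HashRing`, the `vnodes_per_node` parameter, the MD5-based hash, and the node and key names are all assumptions chosen for illustration.

```python
import bisect
import hashlib
from collections import Counter


def _hash(value: str) -> int:
    # Stable 64-bit position on the ring (MD5 is used only for spreading, not security).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes_per_node=100):
        # Each physical node is placed on the ring at many points (its virtual nodes).
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The first virtual node clockwise from the key's position owns the key.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


def load_spread(vnodes_per_node, num_keys=20_000, num_nodes=10):
    # Ratio of the busiest node's key count to the ideal even share.
    ring = HashRing([f"node-{n}" for n in range(num_nodes)], vnodes_per_node)
    counts = Counter(ring.node_for(f"url-{k}") for k in range(num_keys))
    return max(counts.values()) / (num_keys / num_nodes)


print("1 vnode per node:   ", round(load_spread(1), 2))
print("200 vnodes per node:", round(load_spread(200), 2))
```

With more virtual nodes per machine, the busiest machine's share moves closer to the ideal 1.0. Note that in this sketch each individual key still resolves to exactly one node, so virtual nodes even out how key ranges are divided rather than splitting the traffic for a single key.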
Agree with virtual nodes. You also need to do something like linear probing with consistent hashing to ensure that server hotspots do not form.
See this: https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed
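As far as I understand it, the "linear probing" idea above is what is usually called consistent hashing with bounded loads: each server gets a capacity slightly above the average load, and a request whose home server is full walks clockwise around the ring to the next server with room. Below is a rough sketch of that idea, not the implementation from the linked article. The class name `BoundedLoadRing`, the `load_factor` of 1.25, and the key and server names are assumptions for illustration.

```python
import bisect
import hashlib
import math
from collections import Counter


def _hash(value: str) -> int:
    # Stable 64-bit position on the ring.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class BoundedLoadRing:
    def __init__(self, nodes, load_factor=1.25, vnodes_per_node=50):
        self._nodes = list(nodes)
        self._load_factor = load_factor  # allowed load relative to the average (must be >= 1)
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in self._nodes
            for i in range(vnodes_per_node)
        )
        self._points = [point for point, _ in self._ring]
        self.load = Counter()  # in-flight requests per node
        self.total = 0

    def assign(self, key: str) -> str:
        # Capacity is recomputed per request, counting the request being placed.
        cap = math.ceil(self._load_factor * (self.total + 1) / len(self._nodes))
        start = bisect.bisect_right(self._points, _hash(key))
        # Probe clockwise from the key's home position until a node has room.
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if self.load[node] < cap:
                self.load[node] += 1
                self.total += 1
                return node
        raise RuntimeError("no node below capacity; is load_factor < 1?")

    def release(self, node: str) -> None:
        # Call when a request finishes so the node's slot frees up.
        self.load[node] -= 1
        self.total -= 1


ring = BoundedLoadRing([f"server-{n}" for n in range(10)])
for _ in range(1000):
    ring.assign("viral-link")  # 1000 concurrent requests for the same hot key

print(sorted(ring.load.values(), reverse=True))
```

In this demo every request is for the same key, yet no server ends up with more than 125 in-flight requests (1.25 times the average of 100), because the overflow spills to the next servers on the ring. The trade-off is that a spilled request may land on a server that does not have the key cached or stored locally, so this fits best where any server can serve any key, such as a cache tier or replicated reads.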