Can we think of a better sharding strategy?


#1

Sharding a Database Q: Can we think of a better sharding strategy?


#2

Hmm, nice explanation, but I have some doubts. How does the client know about new shard additions/deletions to the ring? Does the cache actually update the client on any change to the ring? Also, once the client knows about these changes, won't there be a period when client requests for some keys fail? That is, while keys are being rebalanced, the hash function will return a shard according to the new ring structure, but the keys are still in flight and not yet actually present on that shard.
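To make the rebalancing window concrete, here is a minimal consistent-hashing sketch (shard names `S1`–`S4`, the 32-bit ring size, and the `Ring` class are all hypothetical, not from the lesson). It shows that adding a shard remaps only the keys falling in the new shard's arc; those are exactly the keys a new-ring lookup could miss while migration is in flight.

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Hash a string to a point on a hypothetical 32-bit ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

class Ring:
    def __init__(self, shards):
        # One point per shard, sorted by position on the ring.
        self.points = sorted((h(s), s) for s in shards)

    def lookup(self, key: str) -> str:
        # Route to the first shard clockwise from the key's hash.
        positions = [p for p, _ in self.points]
        i = bisect.bisect(positions, h(key)) % len(self.points)
        return self.points[i][1]

before = Ring(["S1", "S2", "S3"])
after = Ring(["S1", "S2", "S3", "S4"])  # S4 joins the ring

# Only keys whose hash lands between S4's predecessor and S4 remap.
moved = [k for k in (f"key{i}" for i in range(1000))
         if before.lookup(k) != after.lookup(k)]
print(f"{len(moved)} of 1000 keys remapped, all now owned by S4")
```

Every remapped key routes to the new shard, so the question stands: until those keys finish migrating, a client using the new ring must either retry against the old owner or tolerate misses.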


#3

When shard S1 goes down, shouldn't it be just one of the shards? Yet in the figure, all the red points for shard S1 are shown going down.


#4

I don’t understand the multiple “copies” of the shard.

  1. Are there 2 machines per shard now, so that all of a shard’s data is copied onto 2 machines? Then the cost doubles, and we need to explain how the copying works: master-slave, or some other mechanism.

  2. If there is only one machine per shard but it appears twice on the ring, then if one S1 goes down, all the S1 points go down. So there is no advantage to placing S1 on the ring twice.


#5

There is more explanation in the Highly Available Database problem.
I understand it to be a peer-to-peer model, not a master-master model.


#6

The explanation of multiple copies of the same server is incorrect. They are virtual servers mapping to the same physical server. For fault tolerance, one strategy is to copy the data to the next 2 servers clockwise, which results in 3 copies of the data.
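The two ideas in this comment can be sketched together: virtual points on the ring map back to physical servers, and replicas are chosen by walking clockwise until 2 additional *distinct* physical servers are found. This is a minimal illustration under assumed names (`VNodeRing`, `S1`–`S4`, 4 vnodes per server), not the course's implementation.

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Hash a string to a point on a hypothetical 32-bit ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

class VNodeRing:
    def __init__(self, servers, vnodes=4):
        # Each physical server owns several virtual points on the ring.
        self.points = sorted(
            (h(f"{s}#{v}"), s) for s in servers for v in range(vnodes)
        )

    def replicas(self, key: str, n=3):
        # Walk clockwise from the key, collecting n distinct
        # physical servers (skipping repeat vnodes of the same server).
        positions = [p for p, _ in self.points]
        i = bisect.bisect(positions, h(key))
        owners = []
        for j in range(len(self.points)):
            server = self.points[(i + j) % len(self.points)][1]
            if server not in owners:
                owners.append(server)
            if len(owners) == n:
                break
        return owners

ring = VNodeRing(["S1", "S2", "S3", "S4"])
owners = ring.replicas("user:42")
print(owners)  # 3 distinct physical servers, primary first
```

Because replicas are distinct physical machines, losing one server (all of its vnodes at once) still leaves 2 copies of each key it held, which is the fault-tolerance point the comment makes.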