How many shards do we have and how do we distribute the data within the shard?


#1

Sharding a Database Q: How many shards do we have and how do we distribute the data within the shard?


#2

There is also a key-based lookup sharding technique, where the shard that holds a given piece of data is found with a lookup query. A separate database stores the mapping between each key and its shard. The disadvantage of this approach is that it can introduce a single point of failure: if the db containing the shard mapping goes down, we can't fetch the data.
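
To make the lookup approach concrete, here is a minimal sketch. The `ShardDirectory` class, the in-memory `shards` dict, and the `put`/`get` helpers are all hypothetical stand-ins for the mapping database and the real shards, not any particular product's API.

```python
class ShardDirectory:
    """Hypothetical stand-in for the separate lookup database: key -> shard id."""

    def __init__(self):
        self._mapping = {}
        self.available = True  # flip to False to simulate an outage

    def assign(self, key, shard_id):
        self._mapping[key] = shard_id

    def locate(self, key):
        if not self.available:
            raise ConnectionError("shard directory is down")
        return self._mapping[key]


# Two in-memory dicts play the role of the actual shards (assumption).
shards = {0: {}, 1: {}}
directory = ShardDirectory()


def put(key, value, shard_id):
    # Record where the key lives, then store the value on that shard.
    directory.assign(key, shard_id)
    shards[shard_id][key] = value


def get(key):
    # Every read goes through the directory first.
    return shards[directory.locate(key)][key]


put("alice", {"plan": "pro"}, shard_id=1)
print(get("alice"))

directory.available = False  # simulate the mapping database going down
try:
    get("alice")
except ConnectionError as err:
    # The shard still holds the data, but nothing can locate it.
    print("lookup failed:", err)
```

Every read depends on `directory.locate`, which is why the mapping store usually needs its own replication in a real setup.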


#3

Also, you might need a very large machine to hold all the keys in one place, which is not a scalable solution.


#4

Am I understanding this wrong or something? In the 3rd para: "such H%(S+1) changes for every single key causing us to relocate each and every key in our data store". I don't think "each and every key" needs to be relocated.


#5

@ [WehrCashews]
Initially, the location of each key in the map is H%S. When the number of shards is increased, the location is calculated with the new shard count (S+1), so lookups for keys that were placed using H%S will point to the wrong shard. So every key has to be re-checked against H%(S+1), and most of them end up on a different shard.
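
A quick way to check this is to count how many keys change shard when S goes from 4 to 5. This is only a sketch: the MD5-based `stable_hash` stands in for H, and the `user:<i>` key names are made up for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Deterministic stand-in for H (Python's built-in hash() of str is salted per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def shard_for(key: str, num_shards: int) -> int:
    # Plain modulo placement: H % S
    return stable_hash(key) % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    # Fraction of keys whose shard differs between the two shard counts.
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]  # hypothetical keys
modulo_fraction = moved_fraction(keys, 4, 5)
print(f"{modulo_fraction:.1%} of keys change shard going from 4 to 5 shards")
```

For a uniformly distributed H, a key keeps its shard only when H%S equals H%(S+1), which happens for about 1 key in S+1. So not literally every key moves, but roughly S/(S+1) of them do, which is about 80% in this run.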


#6

@abhishek_jain_301 It looks like all keys need to be remapped, but in reality that is not needed. Have a look at consistent hashing: it tackles the same problem, and you will get to know how many shards are affected when a new one is added.
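
For comparison, here is a minimal consistent hashing sketch. The `HashRing` class, the 100 virtual nodes per shard, the MD5-based hash, and the `shard-<i>` / `user:<i>` names are all assumptions made for illustration, not a reference implementation.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # Deterministic hash used for both ring points and keys (assumption: MD5-based).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, shards, vnodes=100):
        # Place each shard at `vnodes` points on the ring to even out the load.
        self._points = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def shard_for(self, key: str) -> str:
        # Pick the first ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._points)
        return self._points[idx][1]


keys = [f"user:{i}" for i in range(10_000)]  # hypothetical keys
old_ring = HashRing([f"shard-{i}" for i in range(4)])
new_ring = HashRing([f"shard-{i}" for i in range(5)])

moved = [k for k in keys if old_ring.shard_for(k) != new_ring.shard_for(k)]
ring_fraction = len(moved) / len(keys)
print(f"{ring_fraction:.1%} of keys change shard going from 4 to 5 shards")
```

The only keys that move are the ones the new shard takes over, so the expected share is about 1/(S+1) of the data instead of most of it, and the existing shards never exchange keys among themselves.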


#7

“Expensive” is a word that explains nothing. What matters is that relocating the data takes time, and database performance can degrade during that time. Also, storage requirements while the number of servers is being increased can be double the initial requirements.