Search Typeahead Q: What if I use a separate trie for updates and copy it over to the active one periodically?
I agree with rashmi's sentiment that the "major problems" can probably be mitigated IMHO. For example, we could isolate the nodes that we want to update/copy (and have a distributed queue for example process such updates), instead of copying the entire trie. I agree with the consistency concerns noted in the answer, but I think consistency is less of a concern than availability per the design goals noted earlier. With redundancy, it doesn't seem to be a huge deal to put a lock on a subset of nodes for a given machine, so long as we keep some N redundant machines free (which would allow us to keep availability high).
Another approach we can apply by rotating the reading node. One node will be reading another node will be updating. When the update has been done, then the updating node will be turned into the reading node and the reading node will be updating node. The entries in the load balancer will be updated accordingly. Therefore, all the read traffic will be going to the updated node. In a way, we do not need to copy the entire tree. I think much Key Value stores use this approach this day.
Each stores list of top 5 queries. Each node can have two lists:
- Updated List
- Updating List
Then 100s of write request can do writing on updating list in parallel. And readers are free to use updated list and this will be smallest lock to take.