Design Cache Q: Consistency vs Availability?
This really depends on the requirements. I've worked with (and designed) cache systems that required complete consistency, so their caching layer had to keep that promise. A good example is Oracle TimesTen. Another good example is the file system cache.
It all depends on the purpose of the system. But in most cases systems need to be highly available, so availability takes priority over consistency. Example: on an online flight/bus booking site, the booking itself always happens at the real-time price, but it is not a big problem if the customer sees stale data while searching routes. Eventually the customer will be served the real-time price, so consistency is not very important in this scenario. In systems like a stock market or an online flash sale, however, consistency is extremely important.
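A minimal sketch of the booking-site split described above, assuming a simple TTL cache in front of the price database. `PriceCache`, `price_for_search`, `price_for_booking` and the injected `load_from_db` function are hypothetical names for illustration: the search path tolerates stale prices within the TTL, while the booking path always reads the source of truth.

```python
import time
from typing import Callable, Dict, Tuple


class PriceCache:
    """Hypothetical TTL cache: search reads may be stale, booking reads are not."""

    def __init__(self, load_from_db: Callable[[str], float], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load_from_db      # hypothetical lookup against the source of truth
        self._ttl = ttl_seconds
        self._clock = clock            # injectable clock so staleness can be simulated
        self._entries: Dict[str, Tuple[float, float]] = {}  # route -> (price, cached_at)

    def price_for_search(self, route: str) -> float:
        # Availability-first path: serve the cached price while it is younger than
        # the TTL, even if the database has changed since it was cached.
        now = self._clock()
        entry = self._entries.get(route)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]
        price = self._load(route)
        self._entries[route] = (price, now)
        return price

    def price_for_booking(self, route: str) -> float:
        # Consistency-first path: always read the source of truth, then refresh the cache.
        price = self._load(route)
        self._entries[route] = (price, self._clock())
        return price
```

If the stored price changes from 100.0 to 120.0 within the TTL, `price_for_search` keeps returning 100.0 while `price_for_booking` returns 120.0, which matches the "stale while searching, real-time when booking" behaviour.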
I have a question. In the example, 30 TB of data is split across 420 machines, so each machine holds a separate piece of the data. So I assumed consistency cannot be achieved here anyway.
I think consistency here refers to the cache versus the DB, not one piece of the cache versus another.
Right. I was surprised at that call. I think the cache needs to be consistent more than it needs to be available.
@vigjadel, since every machine is responsible for a separate piece of data, no machine needs to communicate with the others to keep that data consistent.
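To illustrate why the shards don't need to coordinate, here is a minimal sketch assuming keys are assigned by a stable hash of the key modulo the number of machines (420, from the example). `shard_for` is a hypothetical helper, and the modulo scheme is an assumption; many real systems use consistent hashing instead so that fewer keys move when machines are added or removed.

```python
import hashlib

NUM_MACHINES = 420  # machine count from the example in the question


def shard_for(key: str, num_machines: int = NUM_MACHINES) -> int:
    """Map a key to exactly one machine (hypothetical hash-mod-N scheme).

    hashlib is used instead of Python's built-in hash(), which is randomized
    per process for strings and would send the same key to different machines.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines
```

Because every key has exactly one owner, a read or write for that key only ever touches one machine, so there is no second copy inside the cache tier to keep in sync. The consistency question then becomes the one raised above: the cache versus the DB (or versus replicas, if shards are replicated).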
Getting the correct data with high latency should be preferable to getting corrupted/stale data. A host may also crash while holding inconsistent data.
I didn't understand the need to compare consistency and availability here. How would making the system more available compromise consistency? Could someone help?
Choosing between consistency and availability depends on what this cache will be used for.
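The CAP theorem states that when a failure (a network partition) occurs in a distributed system, at most two of the three properties (consistency, availability, partition tolerance) can be satisfied, so the system has to choose between consistency and availability. The compromise has to be made only in failure scenarios, not under normal operating conditions.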
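A minimal sketch of that trade-off for a single cache node, assuming the node normally reads from an authoritative copy and a boolean flag simulates a network partition. `Mode`, `CacheNode`, `Unavailable` and `read_primary` are hypothetical names: in normal operation both modes behave the same, and only during the partition does a CP node refuse to answer while an AP node serves its possibly stale local copy.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class Mode(Enum):
    CP = "consistency"   # refuse to answer rather than risk returning stale data
    AP = "availability"  # answer from the local copy, which may be stale


class Unavailable(Exception):
    """Raised by a CP node when it cannot confirm the latest value."""


class CacheNode:
    def __init__(self, mode: Mode, read_primary: Callable[[str], Optional[str]]):
        self.mode = mode
        self._read_primary = read_primary  # hypothetical call to the authoritative copy
        self._local: Dict[str, str] = {}
        self.partitioned = False           # simulated network partition

    def get(self, key: str) -> Optional[str]:
        if not self.partitioned:
            # Normal operation: no trade-off, read the primary and refresh the local copy.
            value = self._read_primary(key)
            if value is None:
                self._local.pop(key, None)
            else:
                self._local[key] = value
            return value
        # During a partition the node has to give up one of the two properties.
        if self.mode is Mode.CP:
            raise Unavailable(f"cannot confirm latest value of {key!r}")
        return self._local.get(key)  # AP: always answers, possibly with stale data
```

This also answers the question above about how availability compromises consistency: if a seat is sold on the primary while the node is partitioned, the AP node keeps reporting it as available, whereas the CP node returns an error until the partition heals.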