I did more learning on distributed systems this year, so would like to share some of my learning notes and experiments with you :) .
a. Horizontal Partitioning: Different rows are split into different partitions.
b. Vertical Partitioning: Different columns are split into different partitions (maybe some denormalisation are needed e.g. with userId — but it worths it, as it’s better to query in 1 partitions than doing a join across multiple partitions)
When to shard:
When a single node datastore reaches its bottlenecks:
- Storage: a single node just has that much disk space, e.g. 50GB.
- Computing power: A server needs to handle concurrent queries and those need computing resources to run. This is the I/O limit.
- Network bandwidth: It’s possible that the concurrent read/write exceeds the throughput of the database.
- Geography. It might be efficient to store data of the users in the same region to reduce latency of data access.
- High degree of data isolation and privacy requirement for some data arises.
Before sharding, you could try other options:
- Caching to offload the load on database to memory
- Creating more read replicas. Split read/write load and since most applications are read heavy, it might alleviate the problem.
1. Hash based:
Simple, use a hash function to compute the shard key into a hash value, and then mod it by number of shards, e.g.
firstname.lastname@example.org --hash--> 123249 --locate-shard--> % 20 = 9.
- Even data distribution by randomising the shard key by the hash function.
- Easy to maintain with the hash function vs. a lookup table.
- Rebalancing can be difficult, as ****the total number of nodes are fixed at the beginning, and adding/removing the node means redistributing all the data among the new number of nodes again.
Use case: Memcached, Redis, MongoDB
2. Range based
Here, each shard is assigned a range of values, and when new record comes in, it is put on the shard whose range is the upper/lower bound of the shard key of the record.
- Performant with range queries, e.g. select all users with a email starts with
- Hot key problem, especially when majority queries target adjacent shard keys. A solution is to have hash sharding on top of range sharding.
- Rebalancing can be difficult as the data is dynamically redistributed when the total number of data grows, and it’s difficult to pre-split the table into multiple shards.
- Warming problem as at the start only a single node is taking all the queries because there’s not yet enough splits to happen for the data to be distributed to other idle nodes.
Use case: Google Spanner and HBase
3. Consistent Hashing
This is actually also a hash-based strategy, but it’s very scalable vs. the basic one.
With consistent hashing, data is evenly and randomly distributed across shards on a “hash ring”.
- Easy to rebalance: adding/removing nodes impacts only a small fraction of data. This is especially true with virtual nodes, where shard keys map to the same number of virtual nodes, which in turn map to fewer physical nodes. So the proportional data to be moved with change of nodes is even less, and new physical nodes can be added to even the workload and remapped with the virtual nodes, with the mapping between virtual nodes — hash key stays the same..
- A lookup table needs to be maintained
Use case: DynamoDB and Cassandra
4. Directory based:
Similar to the range based hashing, a lookup table is maintained to keep track of which shard holds which data, but the shard key here has to be have low cardinality. (Low-cardinality refers to columns with few unique values. e.g. boolean value)
- Keep shards balanced is important, but difficult to achieve, especially with hot key problem.
- Rebalancing can be expensive, and doing this without incurring downtime is difficult. Thus, growth should be taken into consideration so that each shard has enough space to hold more records.
- Less joins and more denormalisation. Joins across shards are expensive to operate and n alternative is to denormalise data so queries can be performed on one shard.
- Challenges with referential integrity and consistency: As synchronisation across shards take time (normally async operations), referential integrity like foreign key or uniqueness is challenging without proper consideration. For example. auto- incrementing ID is a bad idea due to potential conflicts between shards, the same is for timestamped ID as the clock time of shards can be different.
That’s it! Happy Reading.