Design Consistent Hashing
Imagine you are the infrastructure engineer at a hot social media platform. Your system is humming along with 4 cache servers, each holding about 25% of your data. Life is good.
Then your platform goes viral overnight. You urgently add a 5th server. You restart everything. And suddenly, your database is on fire — every single cache server is getting a tsunami of cache misses. Users experience 10× slower page loads. The whole site is crawling.
What happened? You added one server and nearly everything broke.
This is the rehashing problem — one of the nastiest scaling traps in distributed systems. Consistent hashing is the elegant algorithm invented at MIT that makes this problem disappear. It powers Amazon DynamoDB, Apache Cassandra, Discord, Akamai CDN, and Google’s Maglev load balancer.
By the end of this article, you’ll understand exactly why the naive approach breaks, how the ring-based solution works, and how virtual nodes make it production-ready. Let’s go.
Part 1 — The Problem: Why Naive Hashing Breaks
The Naive Approach
When you have n cache servers and want to route a key to a server, the obvious approach is:
serverIndex = hash(key) % N
Where N is the total number of servers. Let’s walk through a concrete example with 4 servers and 8 keys:
| Key | hash(key) | hash(key) % 4 | Assigned Server |
|---|---|---|---|
| key0 | 18358617 | 1 | server 1 |
| key1 | 26143584 | 0 | server 0 |
| key2 | 18131146 | 2 | server 2 |
| key3 | 35863496 | 0 | server 0 |
| key4 | 34085809 | 1 | server 1 |
| key5 | 27581703 | 3 | server 3 |
| key6 | 38164978 | 2 | server 2 |
| key7 | 22530351 | 3 | server 3 |
This works beautifully when N is fixed. Each server gets about 2 keys each. Balanced, clean, simple.
graph TD
subgraph CLUSTER["4-Server Cache Cluster (N=4)"]
S0["🖥️ Server 0\nkey1, key3"]
S1["🖥️ Server 1\nkey0, key4"]
S2["🖥️ Server 2\nkey2, key6"]
S3["🖥️ Server 3\nkey5, key7"]
end
CLIENT["👤 Client\nhash(key) % 4"] --> S0 & S1 & S2 & S3
The Catastrophe: What Happens When N Changes
Now server 1 goes down. N changes from 4 to 3. We apply hash(key) % 3:
| Key | hash(key) | hash(key) % 3 | NEW Server | OLD Server | Moved? |
|---|---|---|---|---|---|
| key0 | 18358617 | 0 | server 0 | server 1 | ✅ YES |
| key1 | 26143584 | 0 | server 0 | server 0 | ❌ no |
| key2 | 18131146 | 1 | server 1 | server 2 | ✅ YES |
| key3 | 35863496 | 2 | server 2 | server 0 | ✅ YES |
| key4 | 34085809 | 1 | server 1 | server 1 | ✅ YES |
| key5 | 27581703 | 0 | server 0 | server 3 | ✅ YES |
| key6 | 38164978 | 1 | server 1 | server 2 | ✅ YES |
| key7 | 22530351 | 0 | server 0 | server 3 | ✅ YES |
7 out of 8 keys moved to a different server. Only key1 stayed put.
flowchart LR
subgraph BEFORE["Before: N=4"]
direction TB
B0["Server 0: key1, key3"]
B1["Server 1: key0, key4"]
B2["Server 2: key2, key6"]
B3["Server 3: key5, key7"]
end
subgraph AFTER["After: N=3 — server 1 removed"]
direction TB
A0["Server 0: key0, key1, key5, key7"]
A1["Server 1: key2, key4, key6"]
A2["Server 2: key3"]
A3["❌ Server 1: gone"]
end
subgraph IMPACT["Impact"]
NOTE["😱 7 of 8 keys remapped to new servers\nEvery remapped key = cache miss\nEvery cache miss = database query\nDatabase overloaded instantly!"]
end
BEFORE -->|"N changes from 4 to 3"| AFTER
AFTER --> IMPACT
This is devastating in production. Every remapped key is a cache miss. Every cache miss is a database query. With millions of keys remapping simultaneously, your database gets a thundering herd of requests it was never designed to handle. The very moment you scale — the moment your system is under stress — traditional hashing makes everything worse.
This is the problem consistent hashing solves.
Part 2 — Consistent Hashing: The Core Idea
From Wikipedia:
“Consistent hashing is a special kind of hashing such that when a hash table is re-sized and consistent hashing is used, only k/n keys need to be remapped on average, where k is the number of keys, and n is the number of slots.”
In plain English: instead of remapping nearly all keys when servers change, consistent hashing remaps only the minimum necessary fraction.
With 4 servers and 100 keys:
- Traditional hashing: ~75 keys remapped when 1 server removed
- Consistent hashing: ~25 keys remapped (only the ones that were on that server)
The ratio k/n means: with k keys and n servers, only k/n keys move — exactly the keys that were on the removed server. Everything else stays put.
Part 3 — The Hash Ring: How It Works
Step 1: Create the Hash Space
Using SHA-1 as our hash function, the output range goes from 0 to 2^160 - 1. That’s a number with 48 digits. We represent this as a linear space:
x0 ←──────────────────────────────────────────────────────────→ xn
0 2^160-1
Step 2: Bend It Into a Ring
Now the magic: we bend this linear space into a circle by connecting the two ends. x0 and xn meet. We call this the hash ring (also called the hash circle or token ring).
graph LR
A["Position 0\n(x0)"] -->|"Clockwise"| B["Position 2^40"]
B --> C["Position 2^80"]
C --> D["Position 2^120"]
D --> E["Position 2^160-1\n(xn)"]
E -->|"Wraps back\n(ring closes here)"| A
style A fill:#6366f1,color:#fff
style E fill:#6366f1,color:#fff
Think of it like a clock face — but instead of 0–12, the numbers go from 0 to 2^160. After 2^160-1, you wrap back to 0.
Step 3: Place Servers on the Ring
We apply the same hash function to each server’s IP address or name. Each server lands somewhere on the ring.
graph LR
subgraph RING["Hash Ring — 4 Servers Placed"]
S0["🟣 Server 0\n(hashed IP)"]
S1["🔵 Server 1\n(hashed IP)"]
S2["🟠 Server 2\n(hashed IP)"]
S3["🟢 Server 3\n(hashed IP)"]
S0 -->|"clockwise →"| S1
S1 -->|"clockwise →"| S2
S2 -->|"clockwise →"| S3
S3 -->|"clockwise →\n(wraps around)"| S0
end
Step 4: Place Keys on the Ring Too
We hash each data key with the exact same hash function. The key lands at some position on the ring.
graph LR
subgraph RING["Hash Ring — Servers + Keys"]
direction LR
K0["⭕ key0\n(top)"]
S0["🟣 Server 0"]
K1["⭕ key1"]
S1["🔵 Server 1"]
K2["⭕ key2\n(bottom)"]
S2["🟠 Server 2"]
K3["⭕ key3\n(left)"]
S3["🟢 Server 3"]
K0 --> S0
S0 --> K1
K1 --> S1
S1 --> K2
K2 --> S2
S2 --> K3
K3 --> S3
S3 -->|wraps| K0
end
Step 5: The Lookup Rule — Go Clockwise
To find which server a key belongs to: start at the key’s position on the ring, walk clockwise, and stop at the first server you encounter.
sequenceDiagram
participant Client
participant Ring as "Hash Ring"
participant Result
Note over Ring: Servers at positions: S0(top-right), S1(right), S2(bottom), S3(left)
Note over Ring: Keys at positions: k0(top), k1(right-ish), k2(bottom-ish), k3(left-ish)
Client->>Ring: Where does key0 go?
Ring->>Ring: Hash key0 → lands at top
Ring->>Ring: Walk clockwise from top...
Ring->>Ring: First server hit: Server 0
Ring-->>Result: key0 → Server 0 ✅
Client->>Ring: Where does key1 go?
Ring->>Ring: Hash key1 → lands between S0 and S1
Ring->>Ring: Walk clockwise...
Ring->>Ring: First server hit: Server 1
Ring-->>Result: key1 → Server 1 ✅
Client->>Ring: Where does key2 go?
Ring->>Ring: Hash key2 → lands between S1 and S2
Ring->>Ring: Walk clockwise...
Ring->>Ring: First server hit: Server 2
Ring-->>Result: key2 → Server 2 ✅
Client->>Ring: Where does key3 go?
Ring->>Ring: Hash key3 → lands between S2 and S3
Ring->>Ring: Walk clockwise...
Ring->>Ring: First server hit: Server 3
Ring-->>Result: key3 → Server 3 ✅
Simple rule: clockwise to the nearest server.
Part 4 — Adding and Removing Servers
This is where consistent hashing proves its genius. Watch how few keys move when the cluster changes.
Adding a Server
Let’s say we add Server 4 to the ring. It lands between Server 3 and Server 0.
flowchart TD
subgraph BEFORE["Before: Adding Server 4"]
direction LR
bS3["🟢 Server 3"]
bS0["🟣 Server 0"]
bK0["⭕ key0\n(was going to Server 0)"]
bS3 -->|clockwise| bK0 -->|clockwise| bS0
end
subgraph AFTER["After: Server 4 Added Between S3 and S0"]
direction LR
aS3["🟢 Server 3"]
aS4["🟡 Server 4 ← NEW"]
aS0["🟣 Server 0"]
aK0["⭕ key0\n(now goes to Server 4!)"]
aS3 -->|clockwise| aK0 -->|clockwise| aS4 -->|clockwise| aS0
end
BEFORE -->|"Server 4 joins"| AFTER
RESULT["✅ Only key0 moved!\nkey1, key2, key3 stay on their original servers.\nAll other keys: zero disruption."]
Only key0 is affected — because Server 4 inserted itself between Server 3 and Server 0, intercepting only the keys that were previously in that arc.
Before Server 4: key0 walked clockwise and hit Server 0 first. After Server 4: key0 walks clockwise and hits Server 4 first. Server 4 is now responsible for key0.
The other keys (key1, key2, key3) are not in this arc — they continue routing to their original servers unchanged.
Removing a Server
Now let’s remove Server 1 from the ring.
flowchart LR
subgraph BEFORE["Before: Server 1 exists"]
direction TB
bS0["🟣 Server 0"]
bS1["🔵 Server 1\n(owns key1)"]
bS2["🟠 Server 2"]
bK1["⭕ key1"]
bS0 --> bK1 --> bS1 --> bS2
end
subgraph AFTER["After: Server 1 removed"]
direction TB
aS0["🟣 Server 0"]
aS1["❌ Server 1\n(gone)"]
aS2["🟠 Server 2\n(now owns key1)"]
aK1["⭕ key1\n(moved here)"]
aS0 --> aK1 --> aS2
end
BEFORE -->|"Server 1 crashes"| AFTER
RESULT["✅ Only key1 moved to Server 2!\nkey0, key2, key3 completely unaffected."]
Only key1 is affected. Since Server 1 is gone, key1 now walks clockwise and lands on Server 2 instead. Every other key was never in Server 1’s arc — they’re untouched.
The maths: With n servers and k keys, consistent hashing moves only k/n keys on average when a server joins or leaves. With 4 servers and 100 keys, that’s only 25 keys out of 100. Compare that to traditional hashing which would move ~75 keys.
Part 5 — Two Problems with the Basic Approach
The basic consistent hashing algorithm was introduced by Karger et al. at MIT. It’s brilliant, but it has two real-world problems that need solving before production use.
Problem 1: Unequal Partition Sizes 📐
A partition is the arc of hash space between two adjacent servers. In an ideal world, all partitions are the same size — each server owns the same fraction of the ring.
But in practice, because server positions are determined by hashing their IP addresses, they can land anywhere. If servers cluster together on one side of the ring, the resulting partitions are wildly unequal.
graph LR
subgraph UNEQUAL["After Server 1 is removed — Unequal Partitions"]
direction LR
S0["🟣 Server 0\nlarge partition"]
S2["🟠 Server 2\nsmall partition"]
S3["🟢 Server 3\nmedium partition"]
WARN["⚠️ Server 0 handles far more traffic\nthan Server 2 or Server 3.\nLoad imbalance!"]
S0 -->|"Large arc — was S0 to S1 to S2"| S2
S2 -->|"Small arc"| S3
S3 -->|"Medium arc — wraps to S0"| S0
S0 --- WARN
end
When Server 1 is removed, Server 2’s partition grows to include Server 1’s old arc. Server 0’s partition is now twice as large as Server 3’s. Server 2’s partition is tiny. Unbalanced load, hot servers, slow responses.
Problem 2: Non-Uniform Key Distribution 🎯
Even without removing servers, if servers happen to hash to positions that are clustered together, most keys will land on one server while others sit nearly empty.
graph LR
subgraph CLUSTERED["Servers Clustered Together — Key Distribution Skewed"]
S0["🟣 Server 0"]
S1["🔵 Server 1"]
S2["🟠 Server 2\nhugest partition — most keys land here"]
S3["🟢 Server 3"]
WARN["⚠️ Server 2 handles 70% of keys\nServer 1 handles only 5%\nMassively unbalanced system!"]
S3 -->|"tiny arc"| S0
S0 -->|"tiny arc"| S1
S1 -->|"tiny arc"| S2
S2 -->|"HUGE arc"| S3
S2 --- WARN
end
The book gives a real, relatable example:
“Imagine data for Katy Perry, Justin Bieber, and Lady Gaga all end up on the same shard. Excessive access to a specific shard could cause server overload.”
If three of the world’s most-followed celebrities happen to hash to the same partition, that one server handles an insane share of traffic while others sit idle. This is called the hotspot problem.
Both of these problems have the same solution: Virtual Nodes.
Part 6 — Virtual Nodes: The Real-World Solution
A virtual node (also called a replica or vnode) is a phantom copy of a server on the ring. Instead of each physical server appearing once, it appears many times at different positions.
How Virtual Nodes Work
Instead of mapping server0 to one position, we create multiple virtual nodes:
server0_0,server0_1,server0_2→ each hashes to a different positionserver1_0,server1_1,server1_2→ each hashes to a different position
graph LR
subgraph VNODE_RING["Hash Ring with Virtual Nodes (3 vnodes each)"]
direction LR
S1_0["🔵 s1_0"]
S0_0["🟣 s0_0"]
S1_1["🔵 s1_1"]
S0_1["🟣 s0_1"]
S1_2["🔵 s1_2"]
S0_2["🟣 s0_2"]
S1_0 -->|cw| S0_0 -->|cw| S1_1 -->|cw| S0_1 -->|cw| S1_2 -->|cw| S0_2 -->|cw| S1_0
end
subgraph PHYSICAL["Physical Servers"]
P0["🟣 Server 0\n(owns all s0_* nodes)"]
P1["🔵 Server 1\n(owns all s1_* nodes)"]
end
S0_0 & S0_1 & S0_2 -.->|"all route to"| P0
S1_0 & S1_1 & S1_2 -.->|"all route to"| P1
The lookup still works the same way: go clockwise from the key, find the first virtual node, then map that virtual node to its physical server.
Why This Fixes Both Problems
graph TD
subgraph BALANCED["✅ Virtual Nodes = Balanced Distribution"]
direction LR
A["Ring with 3 vnodes per server\n\nServer 0 scattered at: 12°, 132°, 252°\nServer 1 scattered at: 60°, 180°, 300°\n\nResult: alternating, evenly spaced"]
B["Each server owns\n~3 non-contiguous arcs\n\nTotal arc per server ≈ equal\nLoad is balanced!"]
A --> B
end
subgraph HOTSPOT["✅ Hotspot Problem Solved"]
direction LR
C["Celebrity users hash to\ndifferent positions"]
D["With many vnodes, they land on\ndifferent physical servers\n\nKaty Perry → Server 2\nJustin Bieber → Server 0\nLady Gaga → Server 3"]
C --> D
end
With more virtual nodes, the load distribution gets mathematically better:
| Virtual Nodes per Server | Standard Deviation of Load |
|---|---|
| 1 (no virtual nodes) | Very high (unpredictable) |
| 100 virtual nodes | ~10% of mean |
| 200 virtual nodes | ~5% of mean |
| 500+ virtual nodes | ~2% of mean |
This comes from research cited in the book. With 100–200 virtual nodes per server, the load distribution is so balanced that the variation is negligible in practice.
Apache Cassandra defaults to 256 virtual nodes per physical node. That’s why Cassandra can add or remove nodes with almost zero impact on cluster performance.
Heterogeneous Servers: Proportional Allocation
Virtual nodes give you a powerful bonus: you can allocate proportionally based on server capacity.
graph TD
subgraph HETERO["Heterogeneous Cluster — Proportional Virtual Nodes"]
direction LR
STRONG["💪 Strong Server\n32 cores / 256GB RAM\n300 virtual nodes\nhandles 3x traffic"]
MEDIUM["🖥️ Medium Server\n16 cores / 128GB RAM\n150 virtual nodes\nhandles 1.5x traffic"]
WEAK["📟 Weak Server\n8 cores / 32GB RAM\n50 virtual nodes\nhandles baseline traffic"]
RULE["✅ No special routing logic.\nMore virtual nodes on ring\n= more keys assigned\n= proportional load automatically."]
STRONG --- MEDIUM --- WEAK --- RULE
end
No custom routing logic. No configuration files. The ring handles it automatically. More virtual nodes = more arcs = more keys = more load. It’s proportional by construction.
Part 7 — Finding Affected Keys When Servers Change
When a server joins or leaves, which specific keys need to be moved? The answer comes from a simple anticlockwise scan rule.
When a Server is Added
To find which keys move to the new server, start at the new server’s position and scan anticlockwise until you hit another server. All keys in that anticlockwise arc belong to the new server.
sequenceDiagram
participant Admin as "👤 Admin"
participant Ring as "Hash Ring"
participant S3 as "Server 3 (previous owner)"
participant S4 as "Server 4 (new)"
Admin->>Ring: Add Server 4 between S3 and S0
Ring->>Ring: Server 4 lands at position P4
Ring->>Ring: Scan anticlockwise from P4...
Ring->>Ring: Hit Server 3 at position P3
Ring-->>Admin: Keys in arc [P3 → P4] must move to Server 4
Note over S3,S4: S3 transfers keys in that arc to S4
S3->>S4: Transfer key0, key_a, key_b (in the arc)
Note over Ring: All other keys untouched ✅
When a Server is Removed
When a server fails, start at the removed server’s position and scan anticlockwise until you hit another server. All keys in that anticlockwise arc must be redistributed to the next server clockwise.
sequenceDiagram
participant Ring as "Hash Ring"
participant S0 as "Server 0"
participant S1 as "Server 1 (failing)"
participant S2 as "Server 2 (inheritor)"
Note over S1: Server 1 goes down!
Ring->>Ring: Scan anticlockwise from S1's position...
Ring->>Ring: Hit Server 0 at position P0
Ring-->>S2: Keys in arc [P0 → P1] must move to Server 2
Note over S1,S2: S1's data must be read from replicas
S2->>S2: Now owns key1 (and others in the arc)
Note over Ring: key0, key2, key3 completely unaffected ✅
This is beautiful in its simplicity: only the adjacent arc is ever affected. The rest of the ring is immune.
Part 8 — Data Replication with Consistent Hashing
In a production system, consistent hashing isn’t just for routing — it’s also used for replication. You don’t store data on just one server; you store it on N servers (where N is the replication factor).
The replication strategy is elegant:
After a key is hashed to a position, walk clockwise and store copies on the next N unique physical servers.
graph LR
subgraph REPLICATION["Replication Factor N=3"]
K0["⭕ key0\n(hashed position)"]
S0["🟤 Server 0"]
S1["🟢 Server 1 ← 1st replica"]
S2["🔵 Server 2 ← 2nd replica"]
S3["🟣 Server 3 ← 3rd replica"]
S4["🟠 Server 4"]
K0 -->|"walk clockwise"| S0 -->|"skip same-physical"| S1 -->|"replica 1"| S2 -->|"replica 2"| S3
STORED["✅ key0 stored on Server 1, Server 2, Server 3\nIf Server 1 goes down, Server 2 and 3 still have the data!"]
S3 --- STORED
end
Important: With virtual nodes, you must skip virtual nodes that belong to the same physical server. If s1_0 and s1_1 are both for Server 1, you skip the second one — otherwise the same physical machine stores two “replicas,” which defeats the purpose.
For maximum resilience, replicas are placed in different data centers — so a power outage, network failure, or natural disaster doesn’t take down all copies of your data simultaneously.
Part 9 — Where Consistent Hashing Lives in Production
Consistent hashing is not an academic exercise. It’s running in systems you use every single day.
mindmap
root(("Consistent\nHashing\nin Production"))
Amazon DynamoDB
Partitions data across nodes
Uses virtual nodes internally
Handles millions of requests/sec
Automatic scaling with zero downtime
Apache Cassandra
256 vnodes per node by default
Data replicated across nodes
Adding a node moves only 1/N of keys
Used by Netflix Apple Discord
Discord
Chat message storage
Scaled to 5 million concurrent users
Consistent hashing for shard routing
Akamai CDN
194+ edge servers worldwide
Routes content requests to nearest cache
Consistent hashing minimises cache misses
Handles 15-30% of all web traffic
Google Maglev
Software network load balancer
Consistent hashing across backends
Connection persistence across restarts
Runs on every Google server
Redis Cluster
16384 fixed hash slots
Similar principle to consistent hashing
Slots assigned to nodes
Resharding moves only affected slots
Deep Dive: How Cassandra Uses It
When a new node joins a Cassandra cluster, here’s what happens:
sequenceDiagram
participant Admin as "👤 Admin"
participant NewNode as "🆕 New Cassandra Node"
participant Ring as "Token Ring"
participant OldNode as "📦 Old Node (neighbour)"
participant Client as "👤 Client"
Admin->>NewNode: Join cluster
NewNode->>Ring: Generate 256 virtual node positions
Ring->>Ring: Insert all 256 tokens into ring
Ring->>OldNode: "These key ranges now belong to NewNode"
OldNode->>NewNode: Stream only the affected data
Note over OldNode,NewNode: Only ~1/N of data moves\nN = total nodes in cluster
NewNode-->>Ring: "Ready to serve traffic"
Client->>NewNode: Can now route requests here ✅
Note over Ring: Zero downtime, zero reconfiguration\nCluster just got bigger seamlessly
This is why Cassandra is famous for “zero downtime scaling.” You literally just plug in a new node and the cluster self-heals.
Part 10 — The Standard Deviation Math (Why More VNodes = Better)
This is worth understanding intuitively, even if you’re not a maths person.
Imagine 3 darts thrown randomly at a circular dartboard. You’ll likely get a lumpy distribution — some sections much larger than others. Now throw 300 darts. The sections become much more uniform. This is exactly the virtual node principle.
xychart-beta
title "Load Imbalance vs Number of Virtual Nodes"
x-axis ["1 vnode", "10 vnodes", "50 vnodes", "100 vnodes", "200 vnodes", "500 vnodes"]
y-axis "Standard Deviation of Load (%)" 0 --> 45
line [42, 28, 16, 10, 5, 2]
The tradeoff: More virtual nodes = more memory to store the ring metadata. Each virtual node entry requires about 100 bytes. With 200 vnodes and 100 servers, that’s 200 × 100 × 100 bytes = 2MB of ring metadata per server. Very manageable.
Production systems typically use 150–300 virtual nodes per server — the sweet spot between distribution quality and memory overhead.
Part 11 — Consistent Hashing vs. Other Approaches
How does consistent hashing compare to other ways of routing keys to servers?
graph TD
subgraph NAIVE["❌ Naive Modular Hashing\nhash(key) % N"]
N1["Pro: Dead simple to implement"]
N2["Con: N changes → most keys remap"]
N3["Con: Requires downtime to rescale"]
N4["Con: Cache miss storm on any resize"]
end
subgraph RANGE["🟡 Range-Based Partitioning\nA-M → Server 1, N-Z → Server 2"]
R1["Pro: Easy to reason about"]
R2["Con: Hotspots if keys aren't uniform"]
R3["Con: Manual rebalancing required"]
R4["Used by: HBase, MongoDB (early)"]
end
subgraph CONSISTENT["✅ Consistent Hashing"]
C1["Pro: Only k/n keys move on resize"]
C2["Pro: Automatic load balancing via vnodes"]
C3["Pro: Proportional capacity via vnode count"]
C4["Pro: Zero-downtime scaling"]
C5["Used by: Dynamo, Cassandra, Discord, Akamai"]
end
NAIVE --> CONSISTENT
RANGE --> CONSISTENT
Part 12 — Benefits: The Full Picture
Let’s bring it all together. Why is consistent hashing so important?
graph LR
CH["⭕ Consistent\nHashing"]
CH --> B1["🔄 Minimal Key Redistribution\n\nOnly k/n keys move when\na server joins or leaves.\nWith 100 servers, adding 1\nmoves only 1% of keys."]
CH --> B2["⚖️ Horizontal Scaling\n\nAdd servers at any time\nwithout downtime or\nmanual data migration.\nThe ring self-balances."]
CH --> B3["🌡️ Hotspot Mitigation\n\nVirtual nodes spread\npopular keys across\nmany physical servers.\nNo more celebrity data\noverloading one shard."]
CH --> B4["📊 Heterogeneous Clusters\n\nPowerful servers get more\nvirtual nodes and serve\nmore traffic naturally.\nNo custom routing code."]
CH --> B5["🔁 Automatic Replication\n\nN-replica replication\nfollows the same ring.\nFailure recovery is\nautomatically bounded."]
style CH fill:#6366f1,color:#fff
style B1 fill:#dbeafe
style B2 fill:#dcfce7
style B3 fill:#fef9c3
style B4 fill:#fce7f3
style B5 fill:#ede9fe
The Algorithm in 6 Steps
Here’s the complete consistent hashing algorithm, distilled:
flowchart TD
STEP1["① Choose a hash function\n(SHA-1, MD5, MurmurHash)\nOutput range = 0 to 2^160-1"]
STEP2["② Bend the range into a ring\nConnect x0 and x_max"]
STEP3["③ Place each server on the ring\nhash(server_ip) → position\nRepeat M times per server (vnodes)"]
STEP4["④ Place each key on the ring\nhash(key) → position\n(no modular operation!)"]
STEP5["⑤ Lookup: go clockwise from key\nFind first virtual node → physical server\nThat server owns the key"]
STEP6["⑥ On change: only adjacent arc affected\nAdding server: anticlockwise scan for keys to claim\nRemoving server: clockwise neighbour inherits keys"]
STEP1 --> STEP2 --> STEP3 --> STEP4 --> STEP5 --> STEP6
Interview Quick Reference
mindmap
root(("Consistent\nHashing\nInterview"))
The Problem
hash key % N breaks on resize
N changes → most keys remap
Cache miss storm
Thundering herd on DB
The Solution
Hash ring 0 to 2^160
Servers placed on ring by IP hash
Keys placed on ring by key hash
Go clockwise → first server wins
Key Properties
Only k/n keys move on resize
Adding server → only adjacent arc affected
Removing server → only adjacent arc affected
Virtual Nodes
Each server has M positions on ring
Solves unequal partition sizes
Solves non-uniform key distribution
More vnodes = better balance
Cassandra uses 256 by default
Replication
Walk clockwise N servers for N replicas
Skip vnodes of same physical server
Place replicas in different data centres
Real World
DynamoDB partitioning
Cassandra token ring
Discord chat sharding
Akamai CDN routing
Google Maglev load balancer
Tradeoffs
More vnodes = more memory for ring metadata
Sweet spot 150-300 vnodes per server
Virtual nodes add implementation complexity
Summary: What We Learned
Traditional hashing (
hash(key) % N) is simple but catastrophic whenNchanges — it remaps ~75% of keys, causing a cache miss storm that overwhelms your databaseConsistent hashing bends the hash space into a ring. Servers and keys are both placed on this ring using the same hash function. A key routes to the first server clockwise from its position
On server add/remove, only the adjacent arc of keys is affected. On average, only
k/nkeys move — the mathematical minimumVirtual nodes (multiple positions per server) solve the two core problems of the basic algorithm: unequal partition sizes and non-uniform key distribution. More vnodes = more even load distribution
Replication uses the same ring: store copies on the next
Nunique physical servers clockwise from the key’s positionThe whole ecosystem runs on this: Amazon DynamoDB, Apache Cassandra, Discord, Akamai, Google Maglev. It’s one of the most impactful algorithms in modern distributed systems
What’s Next?
In Chapter 6, we’ll design a Key-Value Store from scratch — diving into CAP theorem (Consistency, Availability, Partition Tolerance), data replication strategies, consistency models, conflict resolution, and how DynamoDB and Cassandra make these tradeoffs. Consistent hashing will be one of the building blocks!
If this article helped you understand distributed systems better, share it with a fellow engineer. The more people understand these foundations, the better the systems we all build.