Implement Chunk Server Load Balancing - The Filesystem

Implementation

Over time, chunk distribution becomes uneven: new servers start empty, old servers fill up, and some receive more writes. Load balancing moves chunks to equalize utilization.

Rebalancing algorithm:

Periodically compute the average disk utilization across all servers
If any server exceeds average + 20%, it is overloaded; if below average - 20%, it is underloaded
For each overloaded server, select chunks to migrate to underloaded servers
Migration: copy chunk from source to target, update master metadata, delete from source
Run as a background process with rate limiting to avoid saturating the network

Request:  {"type": "rebalance_check", "msg_id": 1}
Response: {"type": "rebalance_check_ok", "in_reply_to": 1, "average_utilization_pct": 50, "overloaded": [{"server": "cs1", "utilization_pct": 78}], "underloaded": [{"server": "cs4", "utilization_pct": 15}]}

Request:  {"type": "rebalance_execute", "msg_id": 2, "moves": [{"chunk": "ch_010", "from": "cs1", "to": "cs4"}]}
Response: {"type": "rebalance_execute_ok", "in_reply_to": 2, "moved": 1, "bytes_transferred": 67108864}

Sample Test Cases

Check identifies overloaded and underloaded serversTimeout: 5000ms

Input

{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"rebalance_check","msg_id":2}}

Expected Output

{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}

Execute rebalance moves chunksTimeout: 5000ms

Input

{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"rebalance_execute","msg_id":2,"moves":[{"chunk":"ch_010","from":"cs1","to":"cs4"}]}}

Expected Output

{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}

Hints

Hint 1▾

Monitor each server disk usage; if imbalance exceeds 20%, trigger rebalancing

Hint 2▾

Move chunks from overloaded servers to underloaded ones

Hint 3▾

Rebalancing is a background operation — do not disrupt active reads/writes

Hint 4▾

After moving a chunk, update the master metadata and notify affected servers

Hint 5▾

Prefer moving chunks that are infrequently accessed to minimize disruption

Implementation

Sample Test Cases

Hints

Resources

Theoretical Hub

Key Concepts