ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/indexes/tasks/task-13-5-distributed-index
TASK

Implementation

Distribute your index across multiple nodes:

  1. Partition index entries by key (hash or range)
  2. Point lookups go to single partition
  3. Range queries scatter to all partitions, gather results
  4. Handle partition rebalancing when nodes join/leave

Choose between local secondary index (partitioned with data) and global secondary index (partitioned by index key).

Sample Test Cases

Partitioned insert and lookupTimeout: 5000ms
Input
{
  "src": "c0",
  "dest": "n1",
  "body": {
    "type": "init",
    "msg_id": 1,
    "node_id": "n1",
    "node_ids": [
      "n1"
    ]
  }
}
Expected Output
{"src":"n1","dest":"c0","body":{"type":"init_ok","in_reply_to":1,"msg_id":0}}

Hints

Hint 1
Partition index by key hash or range
Hint 2
Route point queries to single partition
Hint 3
Range queries need scatter-gather
OVERVIEW

Theoretical Hub

Index Partitioning

When an index outgrows one machine, partition it. Hash partitioning spreads load evenly. Range partitioning preserves locality for range queries but risks hot spots.

Global vs Local Secondary Index

Local secondary index: partitioned with data, requires scatter-gather for queries. Global secondary index: partitioned by index key, single lookup but writes update multiple partitions.

Key Concepts

partitioned indexglobal indexscatter-gather
main.py
python
Distribute Index Across Nodes - Indexes | Build Distributed Systems