Distribute Index Across Nodes - Indexes | Build Distributed Systems

                      TASK
                    

Implementation

Distribute your index across multiple nodes:

Partition index entries by key (hash or range)
Point lookups go to single partition
Range queries scatter to all partitions, gather results
Handle partition rebalancing when nodes join/leave

Choose between local secondary index (partitioned with data) and global secondary index (partitioned by index key).

Sample Test Cases

Partitioned insert and lookupTimeout: 5000ms

Input

{
  "src": "c0",
  "dest": "n1",
  "body": {
    "type": "init",
    "msg_id": 1,
    "node_id": "n1",
    "node_ids": [
      "n1"
    ]
  }
}

Expected Output

{"src":"n1","dest":"c0","body":{"type":"init_ok","in_reply_to":1,"msg_id":0}}

Hints

Hint 1▾

Partition index by key hash or range

Hint 2▾

Route point queries to single partition

Hint 3▾

Range queries need scatter-gather

                      OVERVIEW
                    

Theoretical Hub

Index Partitioning

When an index outgrows one machine, partition it. Hash partitioning spreads load evenly. Range partitioning preserves locality for range queries but risks hot spots.

Global vs Local Secondary Index

Local secondary index: partitioned with data, requires scatter-gather for queries. Global secondary index: partitioned by index key, single lookup but writes update multiple partitions.

Key Concepts

partitioned indexglobal indexscatter-gather

main.py

python

#!/usr/bin/env python3
import sys
import json
import hashlib
class DistributedIndex:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.partitions = [{} for _ in range(num_partitions)]
    
    def _partition_for_key(self, key):
        h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
        return h % self.num_partitions
    
    def insert(self, key, value):
        # TODO: Route to correct partition
        pass
    
    def get(self, key):
        # TODO: Query single partition
        pass
    
    def range_query(self, start_key, end_key):
        # TODO: Scatter to all partitions, gather results
        pass
    
    def rebalance(self, new_num_partitions):
        # TODO: Redistribute keys to new partition count
        pass
if __name__ == "__main__":
    idx = DistributedIndex(4)