TASK
Implementation
Distribute your index across multiple nodes:
- Partition index entries by key (hash or range)
- Point lookups go to single partition
- Range queries scatter to all partitions, gather results
- Handle partition rebalancing when nodes join/leave
Choose between local secondary index (partitioned with data) and global secondary index (partitioned by index key).
Sample Test Cases
Partitioned insert and lookupTimeout: 5000ms
Input
{
"src": "c0",
"dest": "n1",
"body": {
"type": "init",
"msg_id": 1,
"node_id": "n1",
"node_ids": [
"n1"
]
}
}Expected Output
{"src":"n1","dest":"c0","body":{"type":"init_ok","in_reply_to":1,"msg_id":0}}Hints
Hint 1▾
Partition index by key hash or range
Hint 2▾
Route point queries to single partition
Hint 3▾
Range queries need scatter-gather
OVERVIEW
Theoretical Hub
Index Partitioning
When an index outgrows one machine, partition it. Hash partitioning spreads load evenly. Range partitioning preserves locality for range queries but risks hot spots.
Global vs Local Secondary Index
Local secondary index: partitioned with data, requires scatter-gather for queries. Global secondary index: partitioned by index key, single lookup but writes update multiple partitions.
Key Concepts
partitioned indexglobal indexscatter-gather
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#!/usr/bin/env python3
import sys
import json
import hashlib
class DistributedIndex:
def __init__(self, num_partitions):
self.num_partitions = num_partitions
self.partitions = [{} for _ in range(num_partitions)]
def _partition_for_key(self, key):
h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
return h % self.num_partitions
def insert(self, key, value):
# TODO: Route to correct partition
pass
def get(self, key):
# TODO: Query single partition
pass
def range_query(self, start_key, end_key):
# TODO: Scatter to all partitions, gather results
pass
def rebalance(self, new_num_partitions):
# TODO: Redistribute keys to new partition count
pass
if __name__ == "__main__":
idx = DistributedIndex(4)