TASK
Implementation
When data is sharded by a primary key (e.g., user_id), querying by a secondary key (e.g., email) requires a secondary index. There are two main strategies.
Local secondary index:
- Each shard maintains an index for its local data only
- Query by email must be sent to ALL shards (scatter-gather)
- Simple to implement but expensive for queries
Example:
Request: {"type": "secondary_index_query", "msg_id": 1, "index": "email", "value": "alice@example.com"}
Response: {"type": "secondary_index_query_ok", "in_reply_to": 1, "results": [{"user_id": "u42", "email": "alice@example.com"}], "shards_scanned": 3}Global secondary index:
- A separate index shard that maps email → primary_key (user_id)
- Query by email first looks up the index shard, then fetches from the data shard
- Faster queries but more complex architecture
Example with global index:
Request: {"type": "secondary_index_query", "msg_id": 2, "index": "email", "value": "bob@example.com", "use_global": true}
Response: {"type": "secondary_index_query_ok", "in_reply_to": 2, "results": [{"user_id": "u99", "email": "bob@example.com"}], "shards_scanned": 1}Write amplification:
When a user updates their email:
- Update primary record on shard hash(user_id)
- Update secondary index (either local or global)
- Two network round-trips instead of one
Implementation:
- Maintain an index map:
Map<index_name, Map<index_value, primary_key>> - For local indexes: each shard has its own index map
- For global indexes: a dedicated index shard
- On write: update both the record and all secondary indexes
Sample Test Cases
Local secondary index lookupTimeout: 5000ms
Input
{"src":"c0","dest":"coord","body":{"type":"init","msg_id":1,"shards":["s1","s2","s3"]}}
{"src":"c1","dest":"coord","body":{"type":"secondary_index_query","msg_id":2,"index":"email","value":"alice@example.com"}}
Expected Output
{"src": "coord", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Global secondary index lookupTimeout: 5000ms
Input
{"src":"c0","dest":"coord","body":{"type":"init","msg_id":1,"shards":["s1","s2","s3"]}}
{"src":"c1","dest":"coord","body":{"type":"secondary_index_query","msg_id":2,"index":"email","value":"bob@example.com","use_global":true}}
Expected Output
{"src": "coord", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
Primary index: data is partitioned by user_id
Hint 2▾
Secondary index on email: need to lookup by email without knowing user_id
Hint 3▾
Local secondary index: each shard maintains an index for its local data only
Hint 4▾
Global secondary index: a separate index shard that maps email → user_id
Hint 5▾
Write amplification: updating a document requires updating both primary and secondary indexes
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
secondary indexesglobal indexeslocal indexesindex shardingwrite amplificationconsistency
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()