TASK
Implementation
Build a hash index that maps keys to data file offsets:
- Hash each key to a bucket
- Store key and file offset in the bucket
- Lookup returns the offset for a key
- Handle collisions with chaining or probing
Hash indexes provide O(1) point lookups but cannot support range queries.
Sample Test Cases
Insert and lookupTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"index_put","msg_id":2,"key":"foo","value":100}}
{"src":"c2","dest":"n1","body":{"type":"index_get","msg_id":3,"key":"foo"}}
Expected Output
{"src":"n1","dest":"c0","body":{"type":"init_ok","in_reply_to":1,"msg_id":0}}
{"src":"n1","dest":"c1","body":{"type":"index_put_ok","in_reply_to":2,"msg_id":1}}
{"src":"n1","dest":"c2","body":{"type":"index_get_ok","in_reply_to":3,"msg_id":2,"value":100}}
Hints
Hint 1▾
Map keys to data locations
Hint 2▾
Handle hash collisions
Hint 3▾
Support insert, lookup, delete
OVERVIEW
Theoretical Hub
Why Indexing?
Without indexes, finding a record requires scanning all data - O(n) cost. Indexes provide shortcuts from keys to locations, reducing lookup to O(1) for hash or O(log n) for tree indexes.
Hash Index
Hash indexes map key -> file offset. They are extremely fast for exact matches. Bitcask, a log-structured store, uses hash indexes. The tradeoff: index must fit in memory, and no range queries.
Key Concepts
indexinghash tableO(1) lookup
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#!/usr/bin/env python3
import sys
import json
import os
class HashIndex:
def __init__(self, data_file="data.log"):
self.index = {} # key -> file offset
self.data_file = data_file
self.write_pos = 0
def put(self, key, value):
# TODO: Append to data file, update index
pass
def get(self, key):
# TODO: Lookup offset in index, read from file
pass
def delete(self, key):
# TODO: Write tombstone, remove from index
pass
def compact(self):
# TODO: Merge duplicate keys, reclaim space
pass
if __name__ == "__main__":
idx = HashIndex()
idx.put("foo", "bar")
print(idx.get("foo"))