ARCHIVED from builddistributedsystem.com on 2026-04-28 — URL: https://builddistributedsystem.com/tracks/indexes/tasks/task-13-1-hash-index
TASK

Implementation

Build a hash index that maps keys to data file offsets:

  1. Hash each key to a bucket
  2. Store key and file offset in the bucket
  3. Lookup returns the offset for a key
  4. Handle collisions with chaining or probing

Hash indexes provide O(1) point lookups but cannot support range queries.

Sample Test Cases

Insert and lookupTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"index_put","msg_id":2,"key":"foo","value":100}}
{"src":"c2","dest":"n1","body":{"type":"index_get","msg_id":3,"key":"foo"}}
Expected Output
{"src":"n1","dest":"c0","body":{"type":"init_ok","in_reply_to":1,"msg_id":0}}
{"src":"n1","dest":"c1","body":{"type":"index_put_ok","in_reply_to":2,"msg_id":1}}
{"src":"n1","dest":"c2","body":{"type":"index_get_ok","in_reply_to":3,"msg_id":2,"value":100}}

Hints

Hint 1
Map keys to data locations
Hint 2
Handle hash collisions
Hint 3
Support insert, lookup, delete
OVERVIEW

Theoretical Hub

Why Indexing?

Without indexes, finding a record requires scanning all data - O(n) cost. Indexes provide shortcuts from keys to locations, reducing lookup to O(1) for hash or O(log n) for tree indexes.

Hash Index

Hash indexes map key -> file offset. They are extremely fast for exact matches. Bitcask, a log-structured store, uses hash indexes. The tradeoff: index must fit in memory, and no range queries.

Key Concepts

indexinghash tableO(1) lookup
main.py
python
Implement Hash Index - Indexes | Build Distributed Systems