Implement WAL Compaction with Atomic Snapshot - The Logger

                      TASK
                    

Implementation

Without compaction, the WAL grows forever. Every database solves this with the same pattern: periodically take a snapshot of the state machine, then truncate all WAL entries before the snapshot.

The compaction algorithm:

Serialize the current state machine to a snapshot file
Record the WAL offset at which the snapshot was taken
Delete all WAL entries (segments) before that offset
On recovery: load the snapshot, then replay only entries after the snapshot offset

The critical requirement: atomicity. If the system crashes between taking the snapshot and truncating the WAL, there must be no data loss. The standard approach:

Write snapshot to a temp file
Atomically rename the temp file to the final snapshot path
Only then delete old WAL segments

Request:  {"type": "wal_snapshot", "msg_id": 1}
Response: {"type": "wal_snapshot_ok", "in_reply_to": 1, "snapshot_offset": 500, "snapshot_size_bytes": 2048}

Request:  {"type": "wal_compact", "msg_id": 2, "snapshot_at_offset": 500}
Response: {"type": "wal_compact_ok", "in_reply_to": 2, "snapshot_offset": 500, "entries_before": 1000, "entries_after": 500, "bytes_freed": 32000}

Sample Test Cases

Take a snapshot of current stateTimeout: 5000ms

Input

{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"wal_append","msg_id":2,"payload":"set x=1"}}
{"src":"c1","dest":"n1","body":{"type":"wal_append","msg_id":3,"payload":"set y=2"}}
{"src":"c1","dest":"n1","body":{"type":"wal_snapshot","msg_id":4}}

Expected Output

{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}

Compaction frees old entriesTimeout: 5000ms

Input

{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"wal_compact","msg_id":2,"snapshot_at_offset":500}}

Expected Output

{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}

Hints

Hint 1▾

Once a snapshot of the state machine is taken, old WAL entries become unnecessary

Hint 2▾

The process: take snapshot -> record snapshot offset -> delete WAL entries before that offset

Hint 3▾

The snapshot + truncation MUST be atomic: a crash between them means data loss

Hint 4▾

Write the snapshot to a temp file, then atomically rename it into place

Hint 5▾

On recovery: load snapshot first, then replay only WAL entries after the snapshot offset

Resources

Log Compaction in Kafka

How Kafka implements log compaction to reclaim disk space while keeping latest values

                      OVERVIEW
                    

Theoretical Hub

Concept overview coming soon

Key Concepts

log compactionsnapshottruncationatomic operationspace reclamation

main.py

python