TASK
Implementation
The fsync system call forces the OS to flush data from kernel buffers to the physical disk. Without it, data that appears "written" may only exist in volatile RAM buffers and will be lost on power failure.
The fundamental tradeoff: durability vs. throughput.
Three strategies, from safest to fastest:
- Always fsync: call fsync after every write. Every acknowledged entry is durable. Throughput limited by disk IOPS.
- Batch fsync: buffer writes and fsync every 10ms. Up to 10ms of writes can be lost on crash. 10-30x higher throughput.
- No fsync: let the OS decide when to flush. Crashes can lose seconds of data. 100x+ higher throughput.
Benchmark all three and measure ops/sec, then plot the durability vs. throughput curve.
Request: {"type": "fsync_benchmark", "msg_id": 1, "entries": 10000, "strategies": ["always", "batch_10ms", "none"]}
Response: {"type": "fsync_benchmark_ok", "in_reply_to": 1, "results": [
{"strategy": "always", "ops_per_sec": 500, "durability": "every_write", "data_loss_window": "0ms"},
{"strategy": "batch_10ms", "ops_per_sec": 15000, "durability": "every_10ms", "data_loss_window": "10ms"},
{"strategy": "none", "ops_per_sec": 100000, "durability": "os_dependent", "data_loss_window": "seconds"}
]}Sample Test Cases
Benchmark all three strategiesTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"fsync_benchmark","msg_id":2,"entries":100,"strategies":["always","batch_10ms","none"]}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Single strategy benchmarkTimeout: 5000ms
Input
{"src":"c0","dest":"n1","body":{"type":"init","msg_id":1,"node_id":"n1","node_ids":["n1"]}}
{"src":"c1","dest":"n1","body":{"type":"fsync_benchmark","msg_id":2,"entries":50,"strategies":["always"]}}
Expected Output
{"src": "n1", "dest": "c0", "body": {"type": "init_ok", "in_reply_to": 1, "msg_id": 0}}
Hints
Hint 1▾
Always fsync: every write is durable on disk. Throughput is limited by disk IOPS (~500 ops/sec on HDD)
Hint 2▾
Batch fsync every 10ms: group writes and sync once per batch. Good balance — can lose up to 10ms of data
Hint 3▾
No fsync: let the OS buffer and flush when it wants. Highest throughput, but crashes can lose seconds of data
Hint 4▾
SSDs have much higher fsync throughput than HDDs (~10,000+ ops/sec)
Hint 5▾
Production systems like PostgreSQL offer wal_sync_method config to choose the strategy
OVERVIEW
Theoretical Hub
Concept overview coming soon
Key Concepts
fsyncdurabilitythroughput tradeoffbatch syncOS buffering
main.py
python
1
2
3
4
5
6
7
8
9
10
11
12
13
#!/usr/bin/env python3
import sys
import json
def main():
# Your implementation here
for line in sys.stdin:
msg = json.loads(line)
print(json.dumps(msg), flush=True)
if __name__ == "__main__":
main()