Systems Problems and Solutions
This chapter covers advanced systems problems where the real challenge is not syntax. The challenge is coordinating memory, disk, concurrency, and recovery while keeping the design explainable.
The goal is to recognize a few recurring high-level patterns:
- publish new state atomically
- separate read paths from write paths
- stage changes before commit
- use logs for durability and recovery
- use snapshots to avoid blocking readers
- compact history so recovery does not become unbounded
1. Durable Atomic Store With WAL
Problem shape
Build a store where a transaction is only considered committed if it is safely written to a write-ahead log first. If the process crashes, the state must be recoverable from the log.
Core design
A clean order is:
- serialize the transaction
- append it to the WAL
- call `sync_all()` so the log is durable on disk
- apply the change to an in-memory `Arc` snapshot
- publish the new snapshot atomically
The important rule is:
Disk before memory publication.
If the process crashes after the WAL is durable but before the new in-memory snapshot is published, recovery can replay the log. If memory were updated first, a crash could lose acknowledged data.
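The append, fsync, apply, publish order can be sketched with std types alone. `WalletDb`'s exact layout here, the `key=value` log line format, and the `commit` helper are illustrative assumptions (the outer `Arc` used for cross-thread sharing is dropped for brevity):

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::Write;
use std::sync::{Arc, Mutex};

struct WalletDb {
    state: Mutex<Arc<HashMap<String, i32>>>, // published snapshot
    log_file: Mutex<File>,                   // append-only WAL
}

impl WalletDb {
    fn commit(&self, key: &str, value: i32) -> std::io::Result<()> {
        {
            // Steps 1 and 2: serialize, append, then fsync so the entry is durable.
            let mut log = self.log_file.lock().unwrap();
            writeln!(log, "{}={}", key, value)?;
            log.sync_all()?;
        }
        // Step 3: apply, by building the next snapshot from the current one.
        let mut guard = self.state.lock().unwrap();
        let mut next = (**guard).clone();
        next.insert(key.to_string(), value);
        // Step 4: publish the new snapshot atomically with a pointer swap.
        *guard = Arc::new(next);
        Ok(())
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("wallet_demo.wal");
    let db = WalletDb {
        state: Mutex::new(Arc::new(HashMap::new())),
        log_file: Mutex::new(OpenOptions::new().create(true).append(true).open(&path)?),
    };
    db.commit("alice", 100)?;
    assert_eq!(db.state.lock().unwrap().get("alice"), Some(&100));
    Ok(())
}
```

Note that memory is touched only after `sync_all()` returns: a crash at any earlier point loses nothing that was acknowledged.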
Type shape
```rust
use std::collections::HashMap;
use std::fs::File;
use std::sync::{Arc, Mutex};

struct WalletDb {
    // Published in-memory snapshot; the outer Mutex guards pointer publication.
    state: Arc<Mutex<Arc<HashMap<String, i32>>>>,
    // Separate lock so WAL appends are serialized independently of reads.
    log_file: Mutex<File>,
}
```
Here:
- `Arc<HashMap<...>>` is the published in-memory snapshot
- the outer `Mutex` protects pointer publication
- the file handle is protected separately for serialized appends
Why interviewers like it
This problem forces you to explain:
- durability versus visibility
- why `sync_all()` is expensive but necessary
- crash points and recovery order
- why readers should stay on old snapshots while writers prepare a new one
Interview-safe summary
A WAL-based store makes the log the durable source of truth and the `Arc` snapshot the fast in-memory view. The commit order is append, fsync, apply, publish. That preserves durability and lets recovery rebuild the latest consistent state after a crash.
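Recovery is then a straight replay of the log in order. A minimal sketch, assuming an illustrative WAL format of one `key=value` line per committed write:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Rebuild the newest consistent snapshot by replaying WAL lines in order.
fn recover(wal_lines: &[&str]) -> Arc<HashMap<String, i32>> {
    let mut state = HashMap::new();
    for line in wal_lines {
        if let Some((key, value)) = line.split_once('=') {
            if let Ok(v) = value.parse::<i32>() {
                state.insert(key.to_string(), v); // later entries win
            }
        }
    }
    Arc::new(state)
}

fn main() {
    let snapshot = recover(&["alice=100", "bob=40", "alice=75"]);
    assert_eq!(snapshot.get("alice"), Some(&75));
    assert_eq!(snapshot.get("bob"), Some(&40));
}
```

Because replay is ordered, the rebuilt snapshot is exactly the last published state, even if the crash happened between fsync and publication.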
2. Sharded 2PC With Arc
Problem shape
Transfer data atomically across two shards without corrupting state if one side fails.
Core design
A good Arc-based two-phase-commit shape is:
- lock the involved shards in a fixed order
- clone the current shard snapshots
- use `Arc::make_mut` to stage private changes
- validate everything during prepare
- if validation succeeds, swap the master pointers to commit
- if validation fails, drop the staged copies
That gives a strong property:
rollback is implicit because unpublished staged copies are dropped.
Type shape
```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Each shard publishes an immutable snapshot behind its own lock.
type ShardData = HashMap<String, i32>;
type Shard = Arc<Mutex<Arc<ShardData>>>;
```
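The prepare/commit flow can be sketched end to end. This is a minimal single-process sketch: the `transfer` helper is illustrative, the alias drops the outer `Arc` for brevity, and callers are assumed to lock shards in a fixed global order:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type ShardData = HashMap<String, i32>;
// The outer Arc is omitted; references suffice for a single-process sketch.
type Shard = Mutex<Arc<ShardData>>;

fn transfer(shard_a: &Shard, shard_b: &Shard, from: &str, to: &str, amount: i32) -> bool {
    if amount <= 0 {
        return false;
    }
    // Lock both shards; callers must agree on a fixed global order.
    let mut a = shard_a.lock().unwrap();
    let mut b = shard_b.lock().unwrap();
    // Prepare: validate against the current snapshot before staging anything.
    if a.get(from).copied().unwrap_or(0) < amount {
        return false; // nothing was staged, so there is nothing to roll back
    }
    // Stage: make_mut clones the map because the published Arc still points at it.
    let mut staged_a = Arc::clone(&*a);
    let mut staged_b = Arc::clone(&*b);
    *Arc::make_mut(&mut staged_a).get_mut(from).unwrap() -= amount;
    *Arc::make_mut(&mut staged_b).entry(to.to_string()).or_insert(0) += amount;
    // Commit: publish both staged snapshots by pointer swap under the locks.
    *a = staged_a;
    *b = staged_b;
    true
}

fn main() {
    let a: Shard = Mutex::new(Arc::new(HashMap::from([("alice".to_string(), 50)])));
    let b: Shard = Mutex::new(Arc::new(HashMap::new()));
    assert!(transfer(&a, &b, "alice", "bob", 30));
    assert!(!transfer(&a, &b, "alice", "bob", 100)); // fails prepare
    assert_eq!(a.lock().unwrap().get("alice"), Some(&20));
    assert_eq!(b.lock().unwrap().get("bob"), Some(&30));
}
```

The failed second transfer never publishes, so its staged copies are dropped on return, which is the implicit rollback described above.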
Why this is advanced
This problem combines:
- lock ordering for deadlock prevention
- staging versus commit
- atomic pointer publication
- dirty-shard-only cloning
- readers continuing on old snapshots while writers prepare new state
Interview-safe summary
In a sharded `Arc` design, the prepare phase happens in private staged copies and the commit phase is a pointer swap. That makes rollback cheap and lets readers stay on stable snapshots while writers stage multi-shard updates.
3. Async 2PC With Tokio
Problem shape
Now add async logging or network I/O inside the commit path.
The system must hold transactional control across `.await` points without blocking the executor.
Core design
The async version keeps the same snapshot and staging logic, but replaces blocking synchronization with async-aware primitives.
```rust
use std::collections::HashMap;
use std::sync::Arc;
// Async-aware lock: waiting tasks yield instead of blocking the executor thread.
use tokio::sync::Mutex;

type ShardData = HashMap<String, i32>;
type Shard = Arc<Mutex<Arc<ShardData>>>;
```
Why `tokio::sync::Mutex` instead of `std::sync::Mutex`?
- `std::sync::Mutex` blocks the executor thread if held across waiting
- `tokio::sync::Mutex` lets the task yield while waiting for the lock
- this matters when the transaction must `.await` logging or RPC work before commit
Why this is a level up
Now the design must satisfy all the earlier Arc and 2PC constraints plus:
- async lock acquisition
- holding the commit gate across `.await`
- keeping tasks movable enough for Tokio scheduling
- avoiding executor-thread blocking during slow I/O
Interview-safe summary
Async 2PC keeps the same staging-and-publish model, but the commit gate becomes an async mutex because the transaction may need to await disk or network work before publication. The goal is still the same: stable reader snapshots, explicit writer staging, and atomic publication.
4. Distributed WAL With Quorum
Problem shape
Build a replicated key-value store across three nodes where a write is committed only after a quorum of nodes durably records it. Readers must always see the latest committed local snapshot without waiting on network operations.
Core design
A clean high-level flow is:
- create a log entry
- replicate it to peers
- wait for quorum success
- once quorum is reached, apply the entry to each node's in-memory snapshot
- let readers continue using their current `Arc` snapshots during replication
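The quorum decision itself is plain arithmetic; a minimal sketch with hypothetical helper names (`quorum_size`, `count_acks`, `is_committed`):

```rust
// Majority quorum: more than half of the cluster must durably ack a write.
fn quorum_size(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

// Count successful durable acknowledgements (including the leader's own write).
fn count_acks(responses: &[bool]) -> usize {
    responses.iter().filter(|&&ok| ok).count()
}

fn is_committed(acks: usize, cluster_size: usize) -> bool {
    acks >= quorum_size(cluster_size)
}

fn main() {
    // Three nodes: the leader's own durable write plus one peer ack commits.
    assert_eq!(quorum_size(3), 2);
    assert!(is_committed(count_acks(&[true, true, false]), 3));
    assert!(!is_committed(count_acks(&[true, false, false]), 3));
}
```

The interesting part is not the arithmetic but where it sits: commit status comes from replication, while visibility comes from local snapshot publication.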
Replica type shape
```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{Mutex, RwLock};

// One durable record in the replicated log (fields illustrative).
struct LogEntry {
    index: u64,
    key: String,
    value: i32,
}

struct Replica {
    // Published committed snapshot; RwLock keeps reads cheap.
    state: Arc<RwLock<Arc<HashMap<String, i32>>>>,
    // Local WAL: the durable history used for recovery and quorum agreement.
    wal: Mutex<Vec<LogEntry>>,
}
```
This separates concerns cleanly:
- WAL: durability history
- `Arc` snapshot: fast committed read view
- `RwLock`: cheap read access to the current published pointer
Why this is advanced
This problem forces you to coordinate:
- local durability
- quorum logic
- local publication order
- non-blocking reads during replication
- recovery from each node's WAL
Interview-safe summary
In a quorum store, the WAL is the durable agreement surface and the `Arc` snapshot is the local committed read view. Replication decides whether the write is committed; snapshot publication decides when that committed value becomes visible in memory.
5. Log Compaction
Problem shape
A WAL that grows forever eventually makes restart time and storage cost unacceptable. The system needs to compact history into a snapshot and discard old log entries.
Core design
A compacted recovery model is:
- capture the current `Arc` snapshot
- serialize it as a checkpoint file
- record the last included log index or version
- truncate or rotate old WAL segments before that point
- on recovery, load the snapshot first, then replay only the tail of the WAL
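The recovery half of this model can be sketched as a checkpoint load plus a filtered tail replay; `LogEntry` and the `recover` helper are illustrative shapes, not a fixed format:

```rust
use std::collections::HashMap;

struct LogEntry {
    index: u64,
    key: String,
    value: i32,
}

// Recovery loads the checkpoint, then replays only entries after its index.
fn recover(
    checkpoint: HashMap<String, i32>,
    checkpoint_index: u64,
    wal: &[LogEntry],
) -> HashMap<String, i32> {
    let mut state = checkpoint;
    for entry in wal.iter().filter(|e| e.index > checkpoint_index) {
        state.insert(entry.key.clone(), entry.value);
    }
    state
}

fn main() {
    let checkpoint = HashMap::from([("a".to_string(), 1)]);
    let wal = vec![
        LogEntry { index: 9, key: "a".into(), value: 0 },  // already in checkpoint
        LogEntry { index: 10, key: "b".into(), value: 2 }, // tail: must replay
    ];
    let state = recover(checkpoint, 9, &wal);
    assert_eq!(state.get("a"), Some(&1));
    assert_eq!(state.get("b"), Some(&2));
}
```

Restart cost is now proportional to the tail length, not to the full history.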
Why Arc helps
Because the snapshot can be cloned cheaply, compaction can work from a stable frozen view while the live system continues moving forward.
Interview-safe summary
Log compaction turns a long event history into a state checkpoint plus a short tail. The WAL gives immediate durability, the snapshot gives fast restart, and `Arc` lets checkpointing work from a stable view without stopping readers.
6. Incremental Snapshots
Problem shape
Full snapshots may be too expensive if the database is large and only a small fraction changes between checkpoint cycles.
Core design
Instead of rewriting the whole state every time:
- track dirty keys or dirty ranges
- atomically swap out the dirty set for a fresh empty one
- capture a current snapshot of only those changed keys
- persist a delta batch
- periodically merge many deltas into a new base snapshot
A high-yield optimization here is `std::mem::take` on the dirty set, because it minimizes lock hold time before slow disk work starts.
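A minimal sketch of that swap, with a hypothetical `drain_dirty` helper:

```rust
use std::collections::HashSet;
use std::mem;
use std::sync::Mutex;

// Swap the dirty set for a fresh empty one, holding the lock only briefly.
fn drain_dirty(dirty: &Mutex<HashSet<String>>) -> HashSet<String> {
    let mut guard = dirty.lock().unwrap();
    mem::take(&mut *guard) // leaves an empty set behind; the old set moves out
}

fn main() {
    let dirty = Mutex::new(HashSet::from(["k1".to_string(), "k2".to_string()]));
    let to_checkpoint = drain_dirty(&dirty);
    // The lock is already released; slow disk work can now use `to_checkpoint`
    // while writers record new dirty keys into the fresh empty set.
    assert_eq!(to_checkpoint.len(), 2);
    assert!(dirty.lock().unwrap().is_empty());
}
```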
Why this matters
This reduces:
- disk write volume
- checkpoint latency
- interference with the hot path
But it increases recovery complexity because the system now replays a base snapshot plus a chain of deltas.
Interview-safe summary
Incremental snapshots trade simpler recovery for much lower write volume. The system checkpoints only what changed, then periodically folds those deltas back into a new base snapshot so recovery does not require replaying an unbounded chain.
7. Hot-Join Distributed Recovery
Problem shape
Add a new empty follower to a live cluster while the leader is still processing writes. The leader must stream a large snapshot without pausing foreground traffic, and the follower must become active without missing any transactions that happened during the transfer.
This is harder than ordinary recovery because the node is not recovering from its own disk alone. It is catching up from a moving live source.
Core design
A strong high-level protocol is:
- capture a point-in-time `Arc` snapshot on the leader
- record the snapshot index or last included log sequence number
- stream that frozen snapshot to the follower in chunks
- buffer or stream all post-snapshot log entries separately
- once the snapshot is fully loaded, replay every entry after the snapshot index
- perform an atomic handover from syncing mode to active mode
The key invariant is:
the follower must know exactly where the snapshot stops and where live replay begins.
Without that boundary, the system cannot prove that it closed the gap between past state and present writes.
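That boundary can be sketched directly; `Follower`, `install_snapshot`, and the index-based deduplication here are illustrative assumptions:

```rust
use std::collections::HashMap;

struct LogEntry {
    index: u64,
    key: String,
    value: i32,
}

struct Follower {
    state: HashMap<String, i32>,
    applied_index: u64, // replay boundary: everything <= this is in `state`
}

impl Follower {
    // Install the streamed snapshot together with its last included index.
    fn install_snapshot(snapshot: HashMap<String, i32>, snapshot_index: u64) -> Self {
        Follower { state: snapshot, applied_index: snapshot_index }
    }

    // Replay buffered live entries, skipping anything the snapshot already covers.
    fn replay(&mut self, buffered: &[LogEntry]) {
        for e in buffered {
            if e.index > self.applied_index {
                self.state.insert(e.key.clone(), e.value);
                self.applied_index = e.index;
            }
        }
    }
}

fn main() {
    let snapshot = HashMap::from([("a".to_string(), 1)]);
    let mut follower = Follower::install_snapshot(snapshot, 5);
    follower.replay(&[
        LogEntry { index: 5, key: "a".into(), value: 9 }, // duplicate: ignored
        LogEntry { index: 6, key: "b".into(), value: 2 }, // gap-closing tail entry
    ]);
    assert_eq!(follower.state.get("a"), Some(&1));
    assert_eq!(follower.state.get("b"), Some(&2));
    assert_eq!(follower.applied_index, 6);
}
```

Tracking `applied_index` is what makes replay both gap-free and idempotent: entries at or below the boundary are dropped rather than applied twice.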
Why Arc helps
`Arc` is what makes the leader-side snapshot cheap enough to take under load.
- the leader briefly clones the current published `Arc` snapshot
- the long-running streaming task holds that frozen view
- foreground writers continue preparing and publishing newer versions independently
That means the leader does not have to stop writes during the transfer.
Type and protocol shape
A useful mental model is:
- published state: `Arc<RwLock<Arc<HashMap<K, V>>>>`
- live write stream: `tokio::sync::mpsc` or an indexed log subscription
- replay boundary: `AtomicU64` or another explicit monotonically increasing sequence number
- chunk transfer: async reader/writer with bounded buffering
What the follower must do
The follower has two jobs at once while syncing:
- receive and assemble the old snapshot
- collect all newer writes that happen during that transfer window
That is why a `tokio::select!`-style receive loop is a natural fit:
- one branch handles snapshot chunks
- one branch handles buffered live transactions
- completion happens only after the snapshot is installed and the buffered tail is replayed
More advanced interview angles
Once the basic answer is in place, these are the stronger follow-up angles:
- Snapshot index pinning: The leader should tag the snapshot with the exact last included index or term so the follower knows where replay starts.
- Idempotent replay: Replayed transactions should be safe to apply more than once, or the follower should deduplicate by sequence number.
- Bounded buffering: If the follower takes minutes to load the snapshot, an in-memory delta buffer may grow without bound. A stronger design spills the tail to a WAL or requires a resumable indexed log stream.
- Backpressure: Snapshot transfer must slow down when the follower or network is slow instead of allowing unbounded RAM growth.
- Chunk integrity: Chunks should carry checksums, sequence numbers, and finalization markers so corruption or missing chunks are detectable.
- Resumable transfer: If the follower disconnects halfway through 20GB, the protocol should resume from a known chunk or restart from a known snapshot generation.
- Membership epoch or term: The follower should reject stale transfers from an old leader term.
- Leader log retention: The leader must not compact away the tail that the follower still needs before catch-up is finished.
- Atomic cutover: The follower should switch from syncing mode to active serving only after snapshot installation and tail replay are both complete.
- Read policy while syncing: Decide whether the follower serves no reads, stale reads only, or reads only after cutover.
- Failure during handover: If the follower crashes after loading the snapshot but before finishing replay, recovery should resume from durable local metadata rather than restart from scratch.
Why this is a serious systems problem
This problem forces you to coordinate:
- stable published memory state
- a moving write stream
- network transfer of a large snapshot
- replay boundaries and sequence tracking
- mode transition from syncing to active service
It is not just a replication problem. It is a staging-and-publication problem stretched across machines.
Interview-safe summary
Hot-join recovery works by freezing a point-in-time snapshot, tagging it with a replay boundary, streaming it to the new follower, and then replaying every later write before cutover.
`Arc` makes the frozen leader view cheap to hold while foreground writes continue, and the real correctness issue is proving that the follower closed the gap between snapshot time and live time without missing or duplicating entries.
8. SQL In Systems Interviews
Short answer
For many top backend and systems-adjacent interviews, yes, SQL matters. Not always as a separate database round, but often as part of reasoning about storage, querying, transactions, consistency, and production behavior.
A good rule is:
- backend roles: usually yes
- storage, infra, and database-adjacent roles: definitely yes
- distributed systems roles: often yes, especially when metadata stores, job state, or operational tooling are involved
- low-level kernel or compiler roles: less central, but still useful background
What interviewers usually care about
In stronger interviews, SQL is usually not about obscure syntax trivia. It is about whether you can reason about how structured data is queried and maintained under real workloads.
The highest-yield areas are:
- `SELECT`, `WHERE`, `ORDER BY`, `LIMIT`
- `JOIN` reasoning
- `GROUP BY` and aggregation
- basic window-function awareness
- indexes and query shape
- transactions and atomicity
- isolation-level intuition
- schema design and key choice
- when SQL is the right tool and when it is not
Why this belongs in systems prep
SQL shows up in systems interviews because many system components eventually touch durable relational state:
- account and billing records
- metadata services
- job queues and schedulers
- audit logs and reporting tables
- control-plane state
- internal tooling and observability stores
Even if the hot path is not SQL-backed, the surrounding control plane often is.
Interview-safe topics to be able to explain
1. Query shape
You should be comfortable writing and reading queries like:
- fetch one row by key
- join two or three tables
- aggregate counts or sums by group
- find top-N rows by some metric
- filter by time range
- compute rolling or partitioned values with window functions at a basic level
2. Index intuition
A good compact explanation is:
An index is a trade-off: faster reads for certain query shapes in exchange for extra memory, disk, and write cost.
You should be able to explain:
- why a full table scan may be slow
- why an index helps only for matching query patterns
- why too many indexes hurt write-heavy systems
- why composite index order matters
3. Transactions
A minimal strong explanation is:
A transaction groups operations so they succeed or fail as one unit and are isolated from conflicting concurrent work according to the database's isolation model.
You should be able to connect that to:
- transfers between accounts
- race conditions
- partial failure
- retry behavior
- uniqueness and consistency guarantees
4. Isolation intuition
You do not need to be a database theorist, but you should know the general shape:
- weaker isolation improves concurrency but allows more anomalies
- stronger isolation prevents more anomalies but costs more coordination
- real systems often choose isolation based on workload and correctness requirements
5. Schema design
You should be able to discuss:
- primary keys
- foreign keys
- normalization versus denormalization
- append-only versus mutable tables
- how query patterns should influence schema shape
Interview-safe summary
In top backend and systems interviews, SQL matters because it is the language of structured durable state. The goal is not to memorize every clause. The goal is to understand query shape, indexes, transactions, and schema trade-offs well enough to reason about real production systems.
The Big Pattern
All of these problems are variations of the same deeper system shape:
- log or stage first
- publish atomically second
- let readers stay on stable snapshots
- let recovery rebuild state from durable history
That is why `Arc` shows up repeatedly in advanced systems work.
It is not only a shared pointer.
It is often the mechanism that makes stable published views possible while the next version is being prepared elsewhere.
Interview-Safe Closing
In advanced systems problems, the core design question is often how to separate staging from publication. Logs, locks, snapshots, and recovery all orbit that same boundary.
`Arc` is powerful in these designs because it gives a cheap, stable published view while writers prepare the next version privately.