
System Roles and Abstract Shapes

This chapter is the deeper unification layer behind the scratch-building problems.

The previous chapter unified the vocabulary. This chapter unifies the abstract roles that those systems are actually made of.

The goal is to stop seeing ten separate interview problems and start seeing a small number of recurring system roles.

The Core Roles

1. Identity

This is the question:

Who or what is this state about?

Examples:

  • user ID
  • API key
  • IP address
  • tenant
  • connection ID
  • cache key

Why it matters:

Many backend systems are not global. They are per-identity systems.

If there is no identity, there is often no way to partition, track, or limit behavior cleanly.

2. Stored State

This is the question:

What do I need to remember between events?

Examples:

  • request count
  • cached value
  • retry count
  • last-seen timestamp
  • token count
  • active subscriptions

Why it matters:

Most backend systems are not just transformations of one input to one output. They depend on remembered state across time.

Rust type references:

  • when the state is keyed by identity, the default Rust type is often HashMap<K, V>
  • when the state is a single shared aggregate, it may just be a struct with fields
  • when ordering matters, BTreeMap<K, V> may be a better fit than HashMap<K, V>
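Here is a minimal sketch of identity and stored state working together. It assumes string identities and a simple request counter, and keeps per-identity state in a HashMap<String, u64>. The type RequestCounts and its methods are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Per-identity request counts, keyed by a hypothetical string identity
/// (for example a user ID or an API key).
#[derive(Default)]
pub struct RequestCounts {
    counts: HashMap<String, u64>,
}

impl RequestCounts {
    /// Records one request for `id` and returns that identity's new total.
    pub fn record(&mut self, id: &str) -> u64 {
        // entry() inserts 0 the first time an identity is seen, then we increment in place.
        let count = self.counts.entry(id.to_string()).or_insert(0);
        *count += 1;
        *count
    }

    /// Returns the current count for `id`, or 0 if it has never been seen.
    pub fn get(&self, id: &str) -> u64 {
        self.counts.get(id).copied().unwrap_or(0)
    }
}
```

The entry API turns "create state the first time an identity appears" into a one-liner. The same shape works for any per-key value, such as a token count or a last-seen Instant.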

3. Pending Work

This is the question:

What has been accepted but not yet processed?

Examples:

  • queued jobs
  • pending retries
  • buffered log lines
  • events waiting to be handled

Why it matters:

Any time work does not happen instantly, a system needs a way to hold pending work.

Rust type references:

  • VecDeque<T> is the standard in-memory FIFO queue type: push_back to accept work, pop_front to take it
  • std::sync::mpsc channels model queued work across threads
  • tokio::sync::mpsc models queued work across async tasks
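Here is a minimal sketch of pending work held in a VecDeque<T>. It assumes a single-threaded component. Job, PendingJobs, submit, take_next, and pending are hypothetical names.

```rust
use std::collections::VecDeque;

/// A hypothetical job: an ID and a payload, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub id: u32,
    pub payload: String,
}

/// Pending work in FIFO order: accepted at the back, processed from the front.
#[derive(Default)]
pub struct PendingJobs {
    queue: VecDeque<Job>,
}

impl PendingJobs {
    /// Accepts a job; it is now pending but not yet processed.
    pub fn submit(&mut self, job: Job) {
        self.queue.push_back(job);
    }

    /// Takes the oldest pending job, if there is one.
    pub fn take_next(&mut self) -> Option<Job> {
        self.queue.pop_front()
    }

    /// How many jobs have been accepted but not yet taken.
    pub fn pending(&self) -> usize {
        self.queue.len()
    }
}
```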

4. Execution Unit

This is the question:

What actually performs the work?

Examples:

  • thread
  • async task
  • worker
  • background consumer

Why it matters:

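Many design questions reduce to how work gets attached to execution: how many execution units exist, how work is handed to them, and what happens when all of them are busy.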
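One common way to attach work to execution units using only the standard library is a fixed set of worker threads pulling from a shared channel. This sketch makes simple assumptions: jobs are u64 values, the "work" is squaring them, and run_on_workers is a hypothetical helper. Because std::sync::mpsc::Receiver supports only a single consumer, the workers share it behind Arc<Mutex<...>>.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `jobs` on `workers` threads and returns the results, sorted.
/// The "work" (squaring a number) is a stand-in for real processing.
pub fn run_on_workers(jobs: Vec<u64>, workers: usize) -> Vec<u64> {
    let (job_tx, job_rx) = mpsc::channel::<u64>();
    let (result_tx, result_rx) = mpsc::channel::<u64>();
    // std's Receiver has a single consumer, so workers share it behind a Mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let mut handles = Vec::new();
    for _ in 0..workers {
        let job_rx = Arc::clone(&job_rx);
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || loop {
            // The guard is a temporary, so the lock is released at the end of
            // this statement: it is held only long enough to take one job.
            let job = job_rx.lock().unwrap().recv();
            match job {
                Ok(n) => result_tx.send(n * n).unwrap(),
                Err(_) => break, // every job sender dropped: no more work
            }
        }));
    }
    drop(result_tx); // only the workers' clones remain

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the job channel lets idle workers exit

    for h in handles {
        h.join().unwrap();
    }
    let mut results: Vec<u64> = result_rx.iter().collect();
    results.sort_unstable(); // completion order depends on thread scheduling
    results
}
```

The squaring runs after the lock is released, so workers contend only while dequeuing. An async design would keep the same shape but use spawned tasks instead of threads.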

5. Time Boundary

This is the question:

What changes as time passes?

Examples:

  • rate-limit window expiration
  • cache TTL
  • retry delay
  • rolling metrics window
  • token refill
  • flush interval

Why it matters:

Many backend systems are state plus time. Time is not just metadata; it changes the validity of the state.

Rust type references:

  • std::time::Instant is the right type for monotonic elapsed-time measurement
  • std::time::Duration represents intervals and limits
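Here is a sketch of a time boundary. It assumes a TTL-style validity rule, and Entry and is_expired are hypothetical names. Passing now in explicitly, rather than calling Instant::now() inside, keeps the rule deterministic and easy to test.

```rust
use std::time::{Duration, Instant};

/// A value paired with the moment it was stored (a hypothetical cache entry).
pub struct Entry<V> {
    pub value: V,
    pub stored_at: Instant,
}

impl<V> Entry<V> {
    /// True once at least `ttl` has elapsed between `stored_at` and `now`.
    /// saturating_duration_since returns zero if `now` is earlier than `stored_at`.
    pub fn is_expired(&self, now: Instant, ttl: Duration) -> bool {
        now.saturating_duration_since(self.stored_at) >= ttl
    }
}
```

In production code the caller would pass Instant::now(). Because Instant is monotonic, wall-clock adjustments cannot make an entry appear younger or older.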

6. Capacity Boundary

This is the question:

What is allowed to grow, and what must stay bounded?

Examples:

  • queue length
  • pool size
  • cache size
  • number of retries
  • batch size

Why it matters:

A backend system without capacity boundaries tends to fail under load: queues, maps, and buffers keep growing until memory runs out or latency becomes unacceptable.

Rust type references:

  • capacity often appears as a usize
  • bounded channels, bounded pools, and bounded caches all turn capacity into explicit program state
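Here is a minimal sketch of capacity as explicit program state. It assumes a reject-when-full policy. BoundedQueue is a hypothetical type.

```rust
use std::collections::VecDeque;

/// A queue whose capacity is explicit state: pushes beyond `capacity` are refused.
pub struct BoundedQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
}

impl<T> BoundedQueue<T> {
    pub fn new(capacity: usize) -> Self {
        Self { items: VecDeque::with_capacity(capacity), capacity }
    }

    /// Hands the item back as Err when the queue is full, so the caller decides
    /// what to do with it (retry later, drop it, or signal backpressure).
    pub fn push(&mut self, item: T) -> Result<(), T> {
        if self.items.len() >= self.capacity {
            return Err(item);
        }
        self.items.push_back(item);
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }
}
```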

7. Admission Boundary

This is the question:

What gets allowed in, and what gets rejected or delayed?

Examples:

  • rate limiter allow/reject
  • queue full / backpressure
  • connection acquire timeout
  • cache admission rules

Why it matters:

The system needs rules for deciding what enters and what does not.
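In the standard library, a bounded std::sync::mpsc::sync_channel already encodes an admission rule: try_send either accepts the item or reports why it could not. The Admission enum and the offer function below are hypothetical wrappers that give those outcomes names.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// The outcome of offering work to a bounded channel.
#[derive(Debug, PartialEq)]
pub enum Admission {
    Accepted,
    RejectedFull,   // backpressure: the buffer is at capacity
    RejectedClosed, // the consumer is gone
}

/// Offers `job` without blocking; the channel's bound is the admission rule.
pub fn offer(tx: &SyncSender<u32>, job: u32) -> Admission {
    match tx.try_send(job) {
        Ok(()) => Admission::Accepted,
        Err(TrySendError::Full(_)) => Admission::RejectedFull,
        Err(TrySendError::Disconnected(_)) => Admission::RejectedClosed,
    }
}
```

The channel itself comes from std::sync::mpsc::sync_channel(bound). Calling send instead of try_send would block the producer until space frees up. That is the "delayed" half of "rejected or delayed".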

8. Delivery Boundary

This is the question:

How does something move from one part of the system to another?

Examples:

  • channel send/receive
  • event publish/subscribe
  • queue dispatch
  • batch flush

Why it matters:

This is where ordering, fan-out, buffering, and backpressure show up.

Rust type references:

  • Sender<T> / Receiver<T> pairs are the standard typed delivery boundary in channel-based designs
  • VecDeque<T> serves as the local delivery boundary when work never leaves a single component (no threads or tasks in between)
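Here is a minimal sketch of a typed delivery boundary. It assumes one producer thread and one consumer, and deliver is a hypothetical helper.

```rust
use std::sync::mpsc;
use std::thread;

/// A producer thread delivers events through a typed Sender<String>; the
/// caller consumes them from the matching Receiver<String>.
pub fn deliver(events: Vec<String>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<String>();

    let producer = thread::spawn(move || {
        for event in events {
            // send fails only if the receiver has been dropped.
            tx.send(event).expect("consumer hung up");
        }
        // tx is dropped here, which closes the channel.
    });

    // Iterating the Receiver yields messages until every Sender is dropped.
    let received: Vec<String> = rx.iter().collect();
    producer.join().unwrap();
    received
}
```

With a single producer, a std channel delivers messages in send order. With several producers, order is preserved only for messages from the same sender.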

9. Resource Reuse

This is the question:

What expensive thing should be reused instead of recreated?

Examples:

  • database connections
  • workers
  • pooled clients
  • cached values

Why it matters:

Reuse is often the difference between a toy design and a real backend design.

Rust type references:

  • reused resources are often stored in Vec<T>, VecDeque<T>, or maps keyed by identity depending on how they are acquired and returned
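Here is a sketch of resource reuse backed by a Vec<T>. It assumes resources are interchangeable and returned explicitly. Pool, acquire, release, and created are hypothetical names, and a String stands in for an expensive resource such as a database connection.

```rust
/// A minimal reuse pool: idle resources live in a Vec<T>; `acquire` reuses one
/// when available and only calls `create` when the pool is empty.
pub struct Pool<T, F: FnMut() -> T> {
    idle: Vec<T>,
    create: F,
    pub created: usize, // how many resources have ever been built
}

impl<T, F: FnMut() -> T> Pool<T, F> {
    pub fn new(create: F) -> Self {
        Self { idle: Vec::new(), create, created: 0 }
    }

    pub fn acquire(&mut self) -> T {
        match self.idle.pop() {
            Some(resource) => resource, // reuse instead of recreating
            None => {
                self.created += 1;
                (self.create)()
            }
        }
    }

    /// Returns a resource so a later `acquire` can reuse it.
    pub fn release(&mut self, resource: T) {
        self.idle.push(resource);
    }
}
```

A production pool would usually also cap the total number of resources and return them automatically through a guard's Drop. Both are left out here to keep the reuse idea visible.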

10. Lifecycle State

This is the question:

What phases can this thing be in?

Examples:

  • pending / running / failed / succeeded
  • active / expired
  • available / acquired
  • subscribed / disconnected

Why it matters:

When the system has phases, modeling them explicitly reduces ambiguity.
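Here is a sketch of lifecycle state as an enum with explicit transitions, using the pending / running / failed / succeeded phases listed above. JobState and transition are hypothetical names. The transition table, including letting Failed return to Pending for a retry, is an assumption for illustration.

```rust
/// Explicit lifecycle phases for a hypothetical job.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    Pending,
    Running,
    Failed,
    Succeeded,
}

impl JobState {
    /// Returns the new state for a legal transition, or None if the move is not allowed.
    pub fn transition(self, to: JobState) -> Option<JobState> {
        let allowed = matches!(
            (self, to),
            (JobState::Pending, JobState::Running)
                | (JobState::Running, JobState::Failed)
                | (JobState::Running, JobState::Succeeded)
                // Assumption: a failed job may be re-queued for a retry.
                | (JobState::Failed, JobState::Pending)
        );
        if allowed { Some(to) } else { None }
    }
}
```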

Mapping the 10 Problems onto the Roles

1. Rate Limiter

  • identity: user, key, IP, token
  • stored state: count, current window, token count
  • time boundary: window expiration, refill timing
  • admission boundary: allow vs reject
  • capacity boundary: per-key memory growth
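Putting the rate-limiter roles together, a fixed-window limiter is one minimal sketch. RateLimiter and allow are hypothetical names, and now is passed in so the time boundary is testable.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Fixed-window limiter: each key may make `limit` requests per `window`.
pub struct RateLimiter {
    limit: u32,
    window: Duration,
    // identity -> (start of current window, requests counted in it)
    windows: HashMap<String, (Instant, u32)>,
}

impl RateLimiter {
    pub fn new(limit: u32, window: Duration) -> Self {
        Self { limit, window, windows: HashMap::new() }
    }

    /// Admission decision for `key` at time `now`: true = allow, false = reject.
    pub fn allow(&mut self, key: &str, now: Instant) -> bool {
        let entry = self.windows.entry(key.to_string()).or_insert((now, 0));
        // Time boundary: once the window has elapsed, start a fresh one.
        if now.saturating_duration_since(entry.0) >= self.window {
            *entry = (now, 0);
        }
        if entry.1 < self.limit {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}
```

The windows map gains one entry per distinct key and is never pruned here. That is the per-key memory growth listed under the capacity boundary, and a real limiter would evict idle keys.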

2. Worker Pool / Job Queue

  • pending work: queued jobs
  • execution unit: worker threads or async tasks
  • delivery boundary: queue to worker handoff
  • capacity boundary: queue size
  • lifecycle state: submitted, running, finished, shutdown

3. In-Memory Cache

  • identity: key
  • stored state: cached value
  • capacity boundary: max items
  • time boundary: TTL expiration
  • resource reuse: reused data instead of recomputation

4. Event Bus / Pub-Sub

  • delivery boundary: producer to subscribers
  • fan-out: one event to many consumers
  • stored state: subscriber list
  • lifecycle state: active or dead subscribers
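Here is a minimal in-process sketch of the event-bus roles. It assumes events implement Clone and that each subscriber owns a std::sync::mpsc::Receiver. EventBus, subscribe, and publish are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// A minimal event bus: each published event is cloned to every live subscriber.
pub struct EventBus<T: Clone> {
    subscribers: Vec<Sender<T>>,
}

impl<T: Clone> EventBus<T> {
    pub fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    /// Registers a subscriber and hands back its receiving end.
    pub fn subscribe(&mut self) -> Receiver<T> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Fans the event out; subscribers whose Receiver was dropped are removed.
    /// Returns how many live subscribers received it.
    pub fn publish(&mut self, event: T) -> usize {
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
        self.subscribers.len()
    }
}
```

Pruning on a failed send is one simple way to track active versus dead subscribers: a subscriber counts as dead once its Receiver has been dropped.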

5. Retry Queue / Task Scheduler

  • pending work: failed tasks waiting to retry
  • time boundary: retry delay
  • lifecycle state: pending, retrying, dead-lettered
  • admission boundary: max retries or drop policy
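Here is a sketch of a retry queue. It assumes a fixed retry delay and a caller-supplied clock that never goes backwards. RetryTask, RetryQueue, on_failure, take_ready, and dead_letter are hypothetical names. With a constant delay and non-decreasing timestamps, the front of the VecDeque is always the next task to become ready.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// A failed task waiting to be retried (hypothetical shape).
pub struct RetryTask {
    pub name: String,
    pub failures: u32,
    pub ready_at: Instant,
}

pub struct RetryQueue {
    pending: VecDeque<RetryTask>,
    pub dead_letter: Vec<RetryTask>,
    max_retries: u32,
    delay: Duration,
}

impl RetryQueue {
    pub fn new(max_retries: u32, delay: Duration) -> Self {
        Self { pending: VecDeque::new(), dead_letter: Vec::new(), max_retries, delay }
    }

    /// Records a failure: schedules another attempt after `delay`, or moves the
    /// task to the dead-letter list once it has used up `max_retries` retries.
    pub fn on_failure(&mut self, mut task: RetryTask, now: Instant) {
        task.failures += 1;
        if task.failures > self.max_retries {
            self.dead_letter.push(task);
        } else {
            task.ready_at = now + self.delay;
            self.pending.push_back(task);
        }
    }

    /// Pops the front task if its retry delay has elapsed.
    pub fn take_ready(&mut self, now: Instant) -> Option<RetryTask> {
        if self.pending.front()?.ready_at <= now {
            self.pending.pop_front()
        } else {
            None
        }
    }
}
```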

6. Rolling Metrics Aggregator

  • stored state: counters or buckets
  • time boundary: rolling windows
  • capacity boundary: retained history
  • aggregation: combine many events into summaries
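Here is a sketch of a rolling count with one bucket per second. It assumes the caller supplies whole-second timestamps that never go backwards. RollingCounter is a hypothetical type.

```rust
use std::collections::VecDeque;

/// Counts events over the last `window_secs` seconds, one bucket per second.
pub struct RollingCounter {
    window_secs: u64,
    buckets: VecDeque<(u64, u64)>, // (second, count), oldest first
}

impl RollingCounter {
    pub fn new(window_secs: u64) -> Self {
        Self { window_secs, buckets: VecDeque::new() }
    }

    pub fn record(&mut self, now_sec: u64) {
        // Same second as the newest bucket: bump it; otherwise open a new bucket.
        let same_second = matches!(self.buckets.back(), Some(&(sec, _)) if sec == now_sec);
        if same_second {
            if let Some(last) = self.buckets.back_mut() {
                last.1 += 1;
            }
        } else {
            self.buckets.push_back((now_sec, 1));
        }
        self.evict(now_sec);
    }

    /// Sum of counts for seconds in the window (now_sec - window_secs, now_sec].
    pub fn total(&mut self, now_sec: u64) -> u64 {
        self.evict(now_sec);
        self.buckets.iter().map(|(_, c)| c).sum()
    }

    // Capacity boundary: drop buckets that fell out of the window, so retained
    // history stays within window_secs buckets while timestamps are non-decreasing.
    fn evict(&mut self, now_sec: u64) {
        while let Some(&(sec, _)) = self.buckets.front() {
            if sec + self.window_secs <= now_sec {
                self.buckets.pop_front();
            } else {
                break;
            }
        }
    }
}
```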

7. Connection Pool

  • resource reuse: reusable connections
  • capacity boundary: pool size
  • admission boundary: acquire or timeout
  • lifecycle state: available, acquired, broken

8. LRU Cache

  • identity: key
  • stored state: cached values
  • capacity boundary: maximum cache size
  • lifecycle state: recently used vs old
  • eviction: remove least recently used
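Here is a deliberately simple LRU sketch that assumes String keys and values. A HashMap holds the data and a VecDeque records recency. Touching a key costs O(n) here; the usual O(1) design pairs the map with a doubly linked list. LruCache and its methods are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};

/// Simple LRU cache: `values` holds data, `order` tracks recency
/// (front = least recently used, back = most recently used).
pub struct LruCache {
    capacity: usize,
    values: HashMap<String, String>,
    order: VecDeque<String>,
}

impl LruCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, values: HashMap::new(), order: VecDeque::new() }
    }

    // Moves `key` to the most-recently-used end (O(n) scan).
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }

    pub fn get(&mut self, key: &str) -> Option<&String> {
        if self.values.contains_key(key) {
            self.touch(key);
        }
        self.values.get(key)
    }

    pub fn put(&mut self, key: &str, value: &str) {
        if self.values.contains_key(key) {
            self.touch(key);
        } else {
            if self.values.len() == self.capacity {
                // Eviction: drop the least recently used key.
                if let Some(old) = self.order.pop_front() {
                    self.values.remove(&old);
                }
            }
            self.order.push_back(key.to_string());
        }
        self.values.insert(key.to_string(), value.to_string());
    }
}
```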

9. Token Bucket

  • identity: key
  • stored state: token count
  • time boundary: refill over time
  • admission boundary: consume token or reject
  • burst control: temporary spikes allowed within a bounded model
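Here is a sketch of a single token bucket. It assumes fractional tokens stored as f64 and a caller-supplied Instant. TokenBucket and try_acquire are hypothetical names. Per-identity limiting would keep one bucket per key, for example in a HashMap<String, TokenBucket>.

```rust
use std::time::{Duration, Instant};

/// Holds up to `capacity` tokens and refills at `refill_per_sec`.
/// Bursts up to `capacity` are allowed; the long-run rate is bounded by the refill rate.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts full, so an initial burst of `capacity` requests is admitted.
    pub fn new(capacity: f64, refill_per_sec: f64, now: Instant) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_refill: now }
    }

    /// Refills based on elapsed time, then tries to consume one token.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed: Duration = now.saturating_duration_since(self.last_refill);
        self.tokens = (self.tokens + elapsed.as_secs_f64() * self.refill_per_sec).min(self.capacity);
        self.last_refill = self.last_refill.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```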

10. Log Batcher / Buffered Writer

  • pending work: buffered log entries
  • delivery boundary: flush to sink
  • capacity boundary: batch size
  • time boundary: flush interval
  • throughput/latency trade-off: larger batches vs faster flush
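Here is a sketch of a log batcher that flushes on whichever boundary is hit first: batch size or flush interval. LogBatcher, log, tick, and flushed are hypothetical names, and the flushed Vec stands in for a real sink such as a file or network writer.

```rust
use std::time::{Duration, Instant};

/// Buffers log lines and flushes them when `max_batch` lines are pending or when
/// `flush_every` has passed since the previous flush.
pub struct LogBatcher {
    buffer: Vec<String>,
    max_batch: usize,
    flush_every: Duration,
    last_flush: Instant,
    pub flushed: Vec<Vec<String>>, // stand-in for a real sink
}

impl LogBatcher {
    pub fn new(max_batch: usize, flush_every: Duration, now: Instant) -> Self {
        Self { buffer: Vec::new(), max_batch, flush_every, last_flush: now, flushed: Vec::new() }
    }

    pub fn log(&mut self, line: &str, now: Instant) {
        self.buffer.push(line.to_string());
        // Capacity boundary: a full batch is flushed immediately.
        if self.buffer.len() >= self.max_batch {
            self.flush(now);
        }
    }

    /// Called periodically; flushes a partial batch once the interval has passed.
    pub fn tick(&mut self, now: Instant) {
        if !self.buffer.is_empty()
            && now.saturating_duration_since(self.last_flush) >= self.flush_every
        {
            self.flush(now);
        }
    }

    fn flush(&mut self, now: Instant) {
        let batch = std::mem::take(&mut self.buffer);
        self.flushed.push(batch);
        self.last_flush = now;
    }
}
```

A larger max_batch means fewer, bigger writes. A shorter flush_every limits how long a line can sit in the buffer when traffic is light.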

The Deep Compression

Most backend interview problems reduce to combinations of these:

  • identity
  • remembered state
  • pending work
  • execution
  • time
  • capacity
  • admission
  • delivery
  • reuse
  • lifecycle

That is a stronger unification than just data structures or code snippets.

How To Use This in an Interview

When you hear a new problem, ask:

  1. What is the identity here?
  2. What state must be remembered?
  3. Is there pending work?
  4. What is the execution model?
  5. What changes over time?
  6. What must stay bounded?
  7. What gets admitted, delayed, or rejected?
  8. How is work delivered?
  9. What should be reused?
  10. What lifecycle states exist?

If you answer those cleanly, the data structures usually follow naturally.

Very often the first Rust types that fall out of those answers are:

  • HashMap<K, V> for per-identity state
  • VecDeque<T> for pending ordered work
  • Sender<T> / Receiver<T> for delivery across workers or tasks
  • Instant and Duration for time-aware behavior
  • structs and enums for lifecycle and policy state

Short Interview Framing

I usually try to reduce backend problems to a few abstract roles: identity, stored state, pending work, execution, time boundaries, capacity boundaries, admission rules, delivery rules, reuse, and lifecycle state. Once those are clear, the implementation choices become much easier to reason about.