Synkti Fleet Experience Interview
This chapter collects interview-safe answers for the synkti-fleet project.
The goal is to describe the system in a way that is compact, concrete, and reusable in live interviews.
Core Project Framing
I built a Rust-based spot inference orchestration system for deploying OpenAI-compatible chat serving into AWS using GPU spot instances. The architecture used an always-on observer (orchestrator) node as the stable control point, while worker nodes were launched dynamically as spot instances and discovered through EC2 tags rather than registered behind a traditional load balancer. The observer reconciled the fleet toward a fixed target worker count, replaced interrupted workers, routed requests to healthy workers, and handled the operational complexity of dynamic infrastructure. A big part of the design was reducing worker cold-start pain: new workers bootstrapped themselves by installing the required dependencies, pulling the vLLM image, downloading model assets, and warming into a state where they could serve chat traffic through a stable interface.
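If the interviewer wants code-level concreteness on tag-based discovery, here is a minimal sketch of how an observer could list running workers with the aws-sdk-ec2 crate. The tag key `synkti:role`, the tag value, and the use of private IPs are illustrative assumptions, not the project's actual names, and accessor signatures vary slightly across SDK releases.

```rust
use aws_sdk_ec2::{types::Filter, Client};

/// List private IPs of running workers carrying the fleet's role tag.
/// The tag key "synkti:role" and value "worker" are illustrative.
async fn discover_workers(ec2: &Client) -> Result<Vec<String>, aws_sdk_ec2::Error> {
    let resp = ec2
        .describe_instances()
        .filters(Filter::builder().name("tag:synkti:role").values("worker").build())
        .filters(Filter::builder().name("instance-state-name").values("running").build())
        .send()
        .await?;
    let mut addrs = Vec::new();
    for reservation in resp.reservations() {
        for instance in reservation.instances() {
            if let Some(ip) = instance.private_ip_address() {
                addrs.push(ip.to_string());
            }
        }
    }
    Ok(addrs)
}

#[tokio::main]
async fn main() -> Result<(), aws_sdk_ec2::Error> {
    let config = aws_config::load_from_env().await;
    let ec2 = Client::new(&config);
    println!("workers: {:?}", discover_workers(&ec2).await?);
    Ok(())
}
```

The useful talking point here is that discovery is a read of visible AWS state: any process with the right IAM permissions can reconstruct the fleet view, with no registry to keep consistent.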
Tell Me About The Project
I built a Rust-based spot inference orchestration system for deploying OpenAI-compatible chat serving into AWS using GPU spot instances. The system used an always-on observer EC2 instance as the control plane and dynamic worker instances as the data plane. The observer reconciled the fleet toward a fixed worker count, discovered workers through EC2 tags, monitored spot interruptions, and routed requests to healthy workers. On the worker side, the bootstrap process installed the dependencies needed for serving, pulled the vLLM container, downloaded model assets, and brought each node to a ready state before it started taking traffic. The result was a system that could keep a spot-backed inference fleet stable enough to expose through a normal chat API.
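To make the routing path concrete, here is a hedged sketch of the stable endpoint forwarding an OpenAI-compatible chat request to a healthy worker, using axum and reqwest. The worker port 8000, the shared worker list, and the first-healthy selection are simplifying assumptions; a real control plane would also relay streaming responses and do smarter selection.

```rust
use std::sync::{Arc, RwLock};

use axum::{extract::State, http::StatusCode, routing::post, Json, Router};
use serde_json::Value;

#[derive(Clone)]
struct AppState {
    http: reqwest::Client,
    // Healthy worker addresses, refreshed by the reconciliation loop.
    workers: Arc<RwLock<Vec<String>>>,
}

async fn chat(
    State(state): State<AppState>,
    Json(body): Json<Value>,
) -> Result<Json<Value>, StatusCode> {
    // Pick a healthy worker; 503 when the fleet has no serving capacity.
    // Real selection would be round-robin or least-loaded, not "first".
    let worker = state
        .workers
        .read()
        .unwrap()
        .first()
        .cloned()
        .ok_or(StatusCode::SERVICE_UNAVAILABLE)?;
    let resp = state
        .http
        .post(format!("http://{worker}:8000/v1/chat/completions"))
        .json(&body)
        .send()
        .await
        .map_err(|_| StatusCode::BAD_GATEWAY)?;
    // Non-streaming pass-through; SSE streaming would need a body relay.
    let out: Value = resp.json().await.map_err(|_| StatusCode::BAD_GATEWAY)?;
    Ok(Json(out))
}

#[tokio::main]
async fn main() {
    let state = AppState {
        http: reqwest::Client::new(),
        workers: Arc::new(RwLock::new(Vec::new())),
    };
    let app = Router::new()
        .route("/v1/chat/completions", post(chat))
        .with_state(state);
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```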
What Was Your Contribution
My main contribution was the system architecture and orchestration logic. I built the Rust control path that handled fleet reconciliation, worker discovery, request routing, spot lifecycle management, and worker bootstrap behavior. I also shaped the operational model around explicit infrastructure configuration from SSM, tag-based discovery instead of hidden state, and a worker lifecycle that was concrete enough to reason about under spot churn.
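A minimal sketch of what "explicit infrastructure configuration from SSM" can look like, assuming the configuration lives as a single JSON document in Parameter Store; the path `/synkti/fleet/config` and the FleetConfig fields are hypothetical stand-ins for whatever the project actually stored.

```rust
use serde::Deserialize;

/// Typed fleet configuration; the fields and the parameter path are
/// hypothetical, standing in for whatever synkti-fleet actually stored.
#[derive(Debug, Deserialize)]
struct FleetConfig {
    target_workers: usize,
    instance_type: String,
    worker_role_tag: String,
    model_id: String,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let aws = aws_config::load_from_env().await;
    let ssm = aws_sdk_ssm::Client::new(&aws);
    let resp = ssm
        .get_parameter()
        .name("/synkti/fleet/config") // hypothetical Parameter Store path
        .with_decryption(true)
        .send()
        .await?;
    let raw = resp
        .parameter()
        .and_then(|p| p.value())
        .ok_or("parameter had no value")?;
    // Explicit, typed configuration: a malformed document fails loudly here
    // instead of surfacing later as mysterious fleet behavior.
    let cfg: FleetConfig = serde_json::from_str(raw)?;
    println!("{cfg:?}");
    Ok(())
}
```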
What Was Technically Difficult
The hardest part was that the system had to coordinate infrastructure, process lifecycle, and request serving at the same time. Spot workers have dynamic identities and can disappear at any point, so the observer had to keep the fleet near a target count, avoid routing into unhealthy or terminating workers, and treat worker state as something continuously reconciled rather than assumed. The other difficult part was cold start: a worker was not useful just because EC2 said it was running. It still had to install tooling, authenticate to ECR, pull containers, download model assets, and warm into a serving state before it could safely receive traffic.
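One way to encode "running is not ready" is to gate routing on an application-level probe rather than on instance state. The sketch below assumes vLLM's OpenAI-compatible server and probes its health endpoint; the port and timeout are illustrative.

```rust
use std::time::Duration;

/// A worker counts as Ready only when the serving process answers,
/// not when EC2 reports the instance as running.
async fn is_ready(http: &reqwest::Client, worker_addr: &str) -> bool {
    // vLLM's OpenAI-compatible server exposes a /health endpoint; any
    // connect error, timeout, or non-2xx status means "not ready yet".
    let url = format!("http://{worker_addr}:8000/health");
    match http.get(&url).timeout(Duration::from_secs(2)).send().await {
        Ok(resp) => resp.status().is_success(),
        Err(_) => false,
    }
}
```

During bootstrap this probe simply fails for minutes at a time, which is exactly the point: the routing layer never has to know which cold-start phase a worker is in, only whether it answers.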
Why Rust For This Project
Rust fit well because the problem was fundamentally about explicit state, failure boundaries, and systems coordination. I needed the control plane to model worker lifecycle, reconciliation, routing, and shutdown behavior clearly rather than as a tangle of scripts and hidden assumptions. Rust gave me a good balance of strong type and memory safety, explicit error handling, and enough systems-level control to make the infrastructure behavior legible.
How Did You Use Async Rust
I used async Rust for the observer and control-plane side of the system. That included AWS API calls, health and lifecycle monitoring, fleet reconciliation, and request forwarding from the stable orchestrator endpoint to dynamic workers. The async model was useful because the observer had to watch many independent things at once, including worker state, spot events, health checks, and incoming inference requests, without turning the control path into blocking orchestration.
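A sketch of the shape of that control loop: independent tick intervals multiplexed with tokio::select!, so discovery, health checking, and spot monitoring interleave on one task without blocking each other. The intervals and the handler bodies are illustrative, not the project's actual values.

```rust
use std::time::Duration;
use tokio::time::interval;

#[tokio::main]
async fn main() {
    let mut discovery = interval(Duration::from_secs(30)); // re-list workers by tag
    let mut health = interval(Duration::from_secs(5));     // probe worker readiness
    let mut spot = interval(Duration::from_secs(5));       // check interruption notices

    loop {
        tokio::select! {
            _ = discovery.tick() => {
                // refresh the worker set from EC2 tags (see discovery sketch)
            }
            _ = health.tick() => {
                // mark workers Ready/Unhealthy based on probe results
            }
            _ = spot.tick() => {
                // start draining any worker with a pending interruption
            }
        }
    }
}
```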
How Did You Handle Shared State And Coordination
The main coordination model was reconciliation: rather than assuming a fixed cluster state, the observer continuously compared desired capacity against discovered worker state and then launched, replaced, or drained workers as needed. I also used explicit tagging and typed infrastructure configuration so worker identity, cluster membership, and control-plane decisions were based on visible state rather than implicit conventions.
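A pure-function sketch of one reconciliation pass under those assumptions: the observer diffs desired capacity against the discovered, health-annotated worker set and emits actions rather than mutating anything in place. The types are illustrative, not the project's actual ones.

```rust
struct Worker {
    id: String,
    healthy: bool,
}

enum Action {
    Launch(usize),      // spin up N replacement spot workers
    Drain(Vec<String>), // stop routing to these, then terminate them
}

/// One reconciliation pass: compare the target count against the
/// worker set discovered from EC2 tags, and emit actions that close the gap.
fn reconcile(target: usize, discovered: &[Worker]) -> Vec<Action> {
    let healthy = discovered.iter().filter(|w| w.healthy).count();
    let sick: Vec<String> = discovered
        .iter()
        .filter(|w| !w.healthy)
        .map(|w| w.id.clone())
        .collect();
    let mut actions = Vec::new();
    // Anything unhealthy is drained explicitly, not routed around silently.
    if !sick.is_empty() {
        actions.push(Action::Drain(sick));
    }
    if healthy < target {
        actions.push(Action::Launch(target - healthy));
    }
    actions
}

fn main() {
    let fleet = vec![
        Worker { id: "i-aaa".into(), healthy: true },
        Worker { id: "i-bbb".into(), healthy: false },
    ];
    let plan = reconcile(3, &fleet);
    // Expect Drain(["i-bbb"]) plus Launch(2) to restore target capacity.
    println!("{} actions planned", plan.len());
}
```

Keeping the pass pure makes it easy to talk about in interviews: the decision logic is testable without AWS, and the side effects live at the edges.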
How Did You Keep The System Reliable
Reliability came from making the lifecycle explicit. The observer was always-on, workers tagged themselves for discovery, spot termination was monitored directly, and the routing path selected healthy workers rather than assuming a static pool. On the bootstrap side, I treated cold start as a real systems problem: the worker was only considered useful after the serving dependencies, container image, and model assets were in place. That kept the API-facing side of the system more stable even though the underlying compute was preemptible.
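For spot termination monitoring, one concrete mechanism is polling the instance metadata service: EC2 publishes a spot/instance-action document roughly two minutes before reclaiming a spot instance. The sketch below uses IMDSv2 with reqwest; whether synkti-fleet polled worker-side like this or consumed interruption events another way is an assumption.

```rust
/// Returns true if EC2 has scheduled this spot instance for interruption.
/// EC2 serves /spot/instance-action only once a stop/terminate is pending;
/// a 404 means no interruption is currently scheduled.
async fn interruption_pending(http: &reqwest::Client) -> Result<bool, reqwest::Error> {
    // IMDSv2: fetch a session token first.
    let token = http
        .put("http://169.254.169.254/latest/api/token")
        .header("X-aws-ec2-metadata-token-ttl-seconds", "21600")
        .send()
        .await?
        .text()
        .await?;
    let resp = http
        .get("http://169.254.169.254/latest/meta-data/spot/instance-action")
        .header("X-aws-ec2-metadata-token", &token)
        .send()
        .await?;
    Ok(resp.status().is_success())
}
```

The two-minute notice is the budget for draining: enough to stop accepting new requests and let in-flight ones finish, not enough to re-warm a replacement, which is why the reconciler launches replacements eagerly.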
What Was The Core Design Insight
The key design insight was separating stable control from unstable compute. The observer instance provided a stable coordination and routing layer, while the worker fleet remained disposable and replaceable. That let me embrace spot economics without pretending the workers themselves were stable infrastructure.
How Did You Improve The Original Design
The important improvement was moving from a vague idea of “run inference on spot” to an explicit control plane with clear responsibilities: reconciliation, discovery, failover, routing, and cold-start handling. Instead of treating instance launch as the end of the problem, I treated it as the beginning of a worker lifecycle that had to be managed all the way through readiness and interruption.
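One way to express "launch is the beginning of the lifecycle" is an explicit state machine the observer drives each worker through. A sketch, with states chosen to match the phases described above rather than copied from the project:

```rust
/// Worker lifecycle as an explicit state machine: instance launch is the
/// start of the lifecycle, not the end of the problem.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WorkerState {
    Launching,     // RunInstances accepted; instance not yet reachable
    Bootstrapping, // installing deps, pulling the vLLM image, fetching model
    Ready,         // health probe passing; eligible for routing
    Draining,      // interruption notice or scale-down; no new requests
    Terminated,    // gone; the reconciler may launch a replacement
}

/// Routing is gated on exactly one state, so "can this worker take
/// traffic?" is never an implicit judgment call scattered through the code.
fn may_route(state: WorkerState) -> bool {
    matches!(state, WorkerState::Ready)
}

fn main() {
    assert!(may_route(WorkerState::Ready));
    assert!(!may_route(WorkerState::Bootstrapping));
}
```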
What Did You Learn From It
The project taught me that scalable infrastructure is often a state-machine problem before it is a throughput problem. In this system, the real challenge was not just getting a model to answer. It was making worker states, warm-up phases, routing boundaries, and interruption handling explicit enough that the system stayed understandable under change. It also taught me to think more concretely about the economics of infrastructure, because part of the outcome was validating where spot-backed serving was operationally sound and where it stopped making economic sense.
How Would You Describe It In One Sentence
I built a Rust control plane for GPU spot inference that used an always-on observer node to keep a dynamic worker fleet at target capacity, warm workers into readiness, and route OpenAI-compatible chat traffic to healthy instances.
Short HR-Friendly Version
I built a Rust-based backend system that used AWS spot instances to serve AI inference more efficiently, while an always-on observer node kept the worker fleet stable, replaced interrupted instances, and exposed a reliable chat interface to clients.
Short Technical Version
I built a Tokio-based orchestration system that reconciled GPU spot workers to a fixed target count, discovered workers via EC2 tags, monitored interruption events, warmed workers by installing serving dependencies and downloading model assets, and forwarded OpenAI-compatible chat requests to healthy vLLM nodes.
If Asked What You Are Proud Of
What I am most proud of is that I treated the system as a real infrastructure control problem instead of just an inference demo. The project had a clear model for worker lifecycle, failure handling, request routing, and readiness, and that made it possible to reason about spot churn, warm-up cost, and operational tradeoffs much more concretely.