Async

Module 1.1 - Build your first LLM service

The whole course is about making an unpredictable component reliable, and you cannot make something reliable until it exists as a running thing you can poke, measure, and break. Before any theory, you stand up a real service: an HTTP endpoint that takes a request, calls a model, and returns a response. Everything else this week - prompting, guardrails, cost, model choice - is a way of improving this one living thing.

What "a service" actually means

A model call in a notebook is an experiment. A service is software other things can depend on: it has a defined input, a defined output, and it runs somewhere reachable. The moment you wrap a model behind an endpoint, you inherit every concern of real software - what happens on bad input, on a timeout, on a malformed response - and that is exactly the point. The reliability work only becomes real once there is a contract to keep.
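To make the three concerns above concrete, here is a minimal sketch that uses only the standard library. It assumes a hypothetical `model_call` function that takes a question and returns a JSON string with an `answer` key; the function name, the error labels, and the five-second default are illustrative choices, not part of any particular SDK.

```python
import json
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


def answer_question(
    question: str,
    model_call: Callable[[str], str],  # hypothetical: returns a JSON string
    timeout_s: float = 5.0,            # illustrative default
) -> dict:
    """Wrap a model call so each failure mode maps to a defined output."""
    # Bad input: refuse before spending a model call.
    if not question.strip():
        return {"ok": False, "error": "empty_question"}

    # Timeout: run the call in a worker thread and stop waiting after timeout_s.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(model_call, question)
    try:
        raw = future.result(timeout=timeout_s)
    except FutureTimeout:
        return {"ok": False, "error": "timeout"}
    finally:
        # Do not block on the worker; it finishes in the background.
        pool.shutdown(wait=False)

    # Malformed response: we assumed JSON with an "answer" key.
    try:
        data = json.loads(raw)
        return {"ok": True, "answer": str(data["answer"])}
    except (json.JSONDecodeError, KeyError, TypeError):
        return {"ok": False, "error": "malformed_response"}
```

Whatever goes wrong, the caller gets back a dictionary with the same keys to check. That fixed shape is the contract.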

The stack, and why FastAPI

You will use FastAPI to expose the endpoint. It matters here for one reason above all: typed request and response models. In FastAPI you declare the shape of what comes in and what goes out (using Pydantic models), and the framework validates against those shapes automatically. That is your first line of reliability - the boundaries of your service are defined and enforced, not assumed. A request that does not match the shape is rejected with a 422 validation error before it ever reaches your model call, and a response is validated against its declared response model before it leaves.
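As a rough illustration of what "declaring the shape" looks like, the sketch below defines a request model and a response model with Pydantic. The class names `AskRequest` and `AskResponse`, the field names, and the helper `is_valid_request` are assumptions made for this example; your own service may use different ones.

```python
from pydantic import BaseModel, Field, ValidationError


class AskRequest(BaseModel):
    # Hypothetical request shape: one required, non-empty question.
    question: str = Field(min_length=1)


class AskResponse(BaseModel):
    # Hypothetical response shape: the answer plus the tokens it cost.
    answer: str
    tokens_used: int = Field(ge=0)


def is_valid_request(payload: dict) -> bool:
    """Return True if the payload matches AskRequest, False otherwise."""
    try:
        AskRequest(**payload)
        return True
    except ValidationError:
        return False
```

Here `is_valid_request({"question": "What is FastAPI?"})` passes, while `{}` and `{"question": ""}` both fail. FastAPI runs the same kind of check for you whenever an endpoint parameter is typed with one of these models.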

The coding-agent-first workflow

You will not hand-type the boilerplate. You direct your coding agent (Cursor or Claude Code) to scaffold the FastAPI app, the endpoint, the model client, and the environment handling - and then you read every line and own it. The agent writes the code; you supply the judgment. The skill you are building is not "can I remember FastAPI syntax" - it is "can I direct a build and verify it is correct." Read the generated code, understand each part, and change what you do not agree with.

What good looks like by the end

A running `POST` endpoint that accepts a question, calls a model, and returns a structured response including the tokens used. It handles a missing field gracefully (a clear validation error, not a stack trace), it does not crash on an empty question, and you can call it from your terminal and get a predictable shape back every time. That predictability is the thing you just engineered - and it is the foundation the rest of Week 1 builds on.

Watch out

Common mistakes

  • Treating the endpoint as the achievement (it is the starting line).
  • Skipping the typed response model (you lose your first guardrail).
  • Letting the agent's code go unread (you cannot own what you do not understand).