Async

Module 1.4 - Token economics and context budgets

Every model call has a price measured in tokens, and tokens are both money and time. A token is roughly a few characters of text; the model reads your entire prompt as input tokens and writes its answer as output tokens, and you pay for both, on every single call. The context window is not free space to fill - it is a budget you spend.

Why it compounds

The choices from the previous modules all have a token price. A long system prompt is billed in full on every request, for as long as it stays in place. Three few-shot examples add their full length to the input of every call. Verbose output costs output tokens and adds latency, because the model generates its answer one token at a time. In a notebook this is invisible; at ten thousand calls a day it is the difference between a viable product and one that loses money on every request.
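To see how a per-call choice compounds, here is a minimal sketch of the arithmetic. The prices and token counts are made-up assumptions for illustration, not any provider's real rates; substitute the figures from your own pricing page and usage data.

```python
def cost_per_call(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one call in USD, given prices quoted per million tokens."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


def daily_cost(per_call_usd: float, calls_per_day: int) -> float:
    """Scale a per-call cost up to a day's traffic."""
    return per_call_usd * calls_per_day


# Hypothetical prices (USD per million tokens); not real rates.
PRICE_IN, PRICE_OUT = 3.0, 15.0

# Hypothetical prompt: 500 input tokens without examples,
# plus 3 few-shot examples of about 150 tokens each.
lean = cost_per_call(500, 300, PRICE_IN, PRICE_OUT)
with_examples = cost_per_call(500 + 3 * 150, 300, PRICE_IN, PRICE_OUT)

calls = 10_000
print(f"lean:          ${daily_cost(lean, calls):.2f} per day")
print(f"with examples: ${daily_cost(with_examples, calls):.2f} per day")
```

The absolute numbers matter less than the shape: anything added to the prompt is multiplied by the call volume, every day.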

Making cost legible

The habit this module installs is reading the token usage your endpoint already returns and connecting it directly to your choices. If you add two few-shot examples and the input tokens jump, that jump is a real recurring cost you can now weigh against the reliability the examples bought. Senior engineers design against the budget from the start rather than discovering the bill in month two.
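A minimal sketch of that habit follows. It assumes the endpoint's response, once parsed, is a dictionary with a "usage" entry holding "input_tokens" and "output_tokens"; those field names are an assumption and differ between providers, so check your own endpoint's response format. The two responses here are hand-written stand-ins, not real API output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    input_tokens: int
    output_tokens: int


def read_usage(response: dict) -> Usage:
    """Pull token counts out of a parsed response.

    Assumes a hypothetical shape: {"usage": {"input_tokens": ..., "output_tokens": ...}}.
    """
    usage = response["usage"]
    return Usage(usage["input_tokens"], usage["output_tokens"])


def input_token_delta(baseline: dict, variant: dict) -> int:
    """Extra input tokens the variant prompt costs on every call."""
    return read_usage(variant).input_tokens - read_usage(baseline).input_tokens


# Stand-in responses: the same request before and after adding two examples.
before = {"usage": {"input_tokens": 420, "output_tokens": 180}}
after = {"usage": {"input_tokens": 730, "output_tokens": 175}}

extra = input_token_delta(before, after)
print(f"The two examples add {extra} input tokens to every call.")
```

Logging this delta next to each prompt change is what turns "the prompt got longer" into a number you can weigh against the reliability gain.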

Context as a scarce resource

There is a second dimension beyond cost: the window has a hard limit, and everything competes for it - your instructions, your examples, retrieved documents, conversation history. Treating context as scarce is what leads to good design later: retrieve only the relevant chunks rather than stuffing everything, summarize old history rather than carrying it forever. This module plants that mindset; Token Optimisation turns it into concrete technique.
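One way to treat the window as a budget is to give conversation history a fixed allowance and keep only the newest turns that fit. The sketch below illustrates that idea and is not a technique prescribed by this module. The four-characters-per-token estimate is a crude assumption standing in for a real tokenizer, and the function names are hypothetical.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a real tokenizer."""
    return max(1, len(text) // 4)


def keep_recent_turns(turns: List[str], budget: int,
                      count: Callable[[str], int] = rough_token_count) -> List[str]:
    """Return the newest turns whose combined estimated tokens fit in `budget`.

    Walks backwards from the latest turn and stops at the first turn that
    would exceed the budget, so the kept turns are always contiguous.
    """
    kept: List[str] = []
    used = 0
    for turn in reversed(turns):
        cost = count(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 estimated tokens each
trimmed = keep_recent_turns(history, budget=25)
print(len(trimmed), "of", len(history), "turns kept")
```

The turns that fall outside the budget are the candidates for summarizing rather than carrying verbatim. The same fixed-allowance pattern applies to retrieved chunks, ranked by relevance instead of recency.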

Watch out

Common mistakes

  • Ignoring cost until it is a problem.
  • Carrying a bloated system prompt on every call.
  • Stuffing the whole document set into context.
  • Treating the window as free rather than as a budget with a hard ceiling.