Week 4 (Friday, shared with the Builders cohort): TRACE Evals

6 lessons

What you keep

The full TRACE loop as a system you build - Trace, Read, Analyze, Codify, Enforce.

You ship

An eval suite your capstone runs against.

Lessons

Live

The vibe-check trap

"It works on the examples I tried" is not evaluation - and for non-deterministic systems it is dangerous.

Live

Trace and Read: error analysis

Capture full records, then read traces by hand - qualitative research on real failures.
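
As a rough illustration of what a "full record" might contain, here is a minimal sketch. The `TraceRecord` fields and the JSONL helpers are assumptions made for this example, not the course's actual schema; the point is only that each record keeps the input, the intermediate steps, and the final output together, with room for a hand-written note.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class TraceRecord:
    # Hypothetical schema: adapt the fields to whatever your system emits.
    trace_id: str
    user_input: str
    final_output: str
    steps: list = field(default_factory=list)  # tool calls, retrieved docs, etc.
    notes: str = ""  # free-text observation written while reading by hand


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


def from_jsonl(text):
    """Load records back from JSONL text, skipping blank lines."""
    return [TraceRecord(**json.loads(line)) for line in text.splitlines() if line.strip()]


records = [
    TraceRecord("t1", "Refund order 123", "Refund issued.",
                steps=[{"tool": "lookup_order", "args": {"id": 123}}]),
    TraceRecord("t2", "Cancel my plan", "I cannot help with that.",
                notes="Refused a supported action."),
]
round_trip = from_jsonl(to_jsonl(records))
```

One line per trace keeps the file easy to append to and easy to page through while annotating.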

Live

Analyze: cluster, count, prioritize

Binary pass/fail judgments, ranked by frequency and impact - the spec for your eval tooling.
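
One way to turn hand-labeled traces into a ranked list is sketched below. The labels, the impact weights, and the choice to score each failure mode as count times impact are all assumptions for illustration; other ways of combining frequency and impact are equally reasonable.

```python
from collections import Counter

# Hypothetical hand-labeled traces: a binary verdict plus one failure-mode label.
labeled_traces = [
    {"trace_id": "t1", "passed": False, "failure_mode": "wrong_tool"},
    {"trace_id": "t2", "passed": True, "failure_mode": None},
    {"trace_id": "t3", "passed": False, "failure_mode": "hallucinated_citation"},
    {"trace_id": "t4", "passed": False, "failure_mode": "wrong_tool"},
    {"trace_id": "t5", "passed": False, "failure_mode": "wrong_tool"},
    {"trace_id": "t6", "passed": False, "failure_mode": "hallucinated_citation"},
]

# Assumed impact weights (1 = minor, 3 = severe), set by your own judgment.
impact = {"wrong_tool": 1, "hallucinated_citation": 3}


def prioritize(traces, impact_weights):
    """Count failures per mode and rank by count * impact (an assumed score)."""
    counts = Counter(t["failure_mode"] for t in traces if not t["passed"])
    ranked = [(mode, n, n * impact_weights.get(mode, 1)) for mode, n in counts.items()]
    return sorted(ranked, key=lambda row: row[2], reverse=True)


ranking = prioritize(labeled_traces, impact)
for mode, count, score in ranking:
    print(f"{mode}: {count} failures, priority score {score}")
```

In this toy data the less frequent failure mode ranks first because its assumed impact is higher, which is the kind of trade-off the ranking is meant to surface.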

Live

Codify: build the eval suite

Code-based assertions plus validated LLM-as-judge - an unvalidated judge is just another vibe check.
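
The two halves of this lesson can be sketched as follows. The first function is an example of a code-based assertion; the second shows one common way to validate a judge, by comparing its pass/fail verdicts with human labels on the same traces. The function names, the required keys, and the sample labels are hypothetical, and the judge's verdicts are given as plain booleans so that no model call is needed.

```python
import json


def assert_valid_json_with_keys(output, required_keys):
    """Code-based assertion: output must be a JSON object containing required_keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and set(required_keys).issubset(data.keys())


def judge_agreement(human_labels, judge_labels):
    """Compare judge verdicts with human verdicts (True = pass).

    Returns the true positive rate and true negative rate, or None for a
    rate whose denominator is zero.
    """
    pairs = list(zip(human_labels, judge_labels))
    tp = sum(1 for h, j in pairs if h and j)
    tn = sum(1 for h, j in pairs if not h and not j)
    positives = sum(1 for h, _ in pairs if h)
    negatives = len(pairs) - positives
    return {
        "tpr": tp / positives if positives else None,
        "tnr": tn / negatives if negatives else None,
    }


# Hypothetical labels for five traces.
human = [True, True, False, False, True]
judge = [True, False, False, True, True]
agreement = judge_agreement(human, judge)
```

Reporting the two rates separately shows whether the judge errs by passing bad outputs or by failing good ones, which a single accuracy number would hide.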

Live

Enforce: evals in the loop

Wire evals into the development loop so every change gets measured automatically.
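
A minimal sketch of such a gate, assuming the eval suite yields a list of pass/fail booleans. The `gate` function and the 0.9 default threshold are hypothetical; the idea is that the returned exit code lets a pre-commit hook or CI job fail when the pass rate drops.

```python
def pass_rate(results):
    """Fraction of passing evals; an empty run counts as 0.0."""
    return sum(results) / len(results) if results else 0.0


def gate(results, threshold=0.9):
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    rate = pass_rate(results)
    print(f"eval pass rate: {rate:.1%} (threshold {threshold:.0%})")
    return 0 if rate >= threshold else 1


# Hypothetical results from one run of the suite.
exit_code = gate([True, True, True, False], threshold=0.9)
```

In a script run by a hook or CI job, passing the return value to `sys.exit` would make the job fail whenever the suite falls below the threshold.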

Assignment

Build and run TRACE on your capstone

Trace, read, analyze, codify, and wire evals to run on demand - show a metric moving after a fix.
