Week 4 (Friday, shared with the Builders cohort): TRACE Evals
6 lessons
What you keep
The full TRACE loop as a system you build - Trace, Read, Analyze, Codify, Enforce.
You ship
An eval suite your capstone runs against.
Lessons
The vibe-check trap
"It works on the examples I tried" is not evaluation - and for non-deterministic systems it is dangerous.
Trace and Read: error analysis
Capture full records, then read traces by hand - qualitative research on real failures.
Analyze: cluster, count, prioritize
Binary pass/fail judgments on each trace, with failure modes clustered and ranked by frequency and impact - the spec for your eval tooling.
Codify: build the eval suite
Code-based assertions plus validated LLM-as-judge - an unvalidated judge is just another vibe check.
Enforce: evals in the loop
Wire evals into the development loop so every change gets measured automatically.
Build and run TRACE on your capstone
Trace, read, analyze, codify, and wire evals to run on demand - show a metric moving after a fix.
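As a rough illustration of the Analyze and Codify lessons above, here is a minimal Python sketch. It is not the course's tooling. The `LabeledTrace` record shape, the failure-mode labels, the impact weights, and both function names are hypothetical. Scoring a cluster as count times impact weight is one simple assumption about how to combine frequency and impact. Comparing judge verdicts against human pass/fail labels is likewise an assumed way to validate an LLM-as-judge.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledTrace:
    """One hand-read trace with a binary verdict (hypothetical record shape)."""
    trace_id: str
    passed: bool
    failure_mode: Optional[str] = None  # cluster label assigned while reading


def rank_failure_modes(traces, impact):
    """Count failures per cluster and rank them by count * impact weight.

    `impact` maps a failure-mode label to a weight; unlisted modes get 1.
    Returns (mode, count, score) tuples, highest score first.
    """
    counts = Counter(
        t.failure_mode for t in traces if not t.passed and t.failure_mode
    )
    scored = [(mode, n, n * impact.get(mode, 1)) for mode, n in counts.items()]
    # Sort by score descending, then by name so ties are deterministic.
    return sorted(scored, key=lambda row: (-row[2], row[0]))


def judge_agreement(human, judge):
    """Compare judge verdicts with human labels (True means pass).

    Returns (pass_agreement, fail_agreement): the share of human-passed
    traces the judge also passed, and the share of human-failed traces
    the judge also failed. A rate is None when that class is empty.
    """
    if len(human) != len(judge):
        raise ValueError("human and judge label lists must be the same length")
    n_pass = sum(1 for h in human if h)
    n_fail = len(human) - n_pass
    pass_hits = sum(1 for h, j in zip(human, judge) if h and j)
    fail_hits = sum(1 for h, j in zip(human, judge) if not h and not j)
    pass_rate = pass_hits / n_pass if n_pass else None
    fail_rate = fail_hits / n_fail if n_fail else None
    return pass_rate, fail_rate


if __name__ == "__main__":
    traces = [
        LabeledTrace("t1", True),
        LabeledTrace("t2", False, "wrong_tool"),
        LabeledTrace("t3", False, "hallucinated_fact"),
        LabeledTrace("t4", False, "wrong_tool"),
    ]
    # Illustrative weights only: higher means a worse outcome for users.
    impact = {"hallucinated_fact": 5, "wrong_tool": 2}
    for mode, count, score in rank_failure_modes(traces, impact):
        print(mode, count, score)
```

Reporting the two agreement rates separately, rather than as one accuracy number, keeps a judge that passes almost everything from looking reliable when most traces pass.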