Assignment
Build and run TRACE on your capstone
Trace and read real outputs, analyze and rank the failures you find, codify them as a suite of code-based checks and validated LLM-as-judge checks, and wire the suite to run on demand.
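The analyze-and-rank step and the judge-validation step can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the trace records, the failure labels, and the helper names `rank_failures` and `judge_agreement` are all hypothetical, and "validated" is assumed here to mean that the judge's verdicts are compared against human labels on the same examples.

```python
from collections import Counter

# Hypothetical hand-labeled traces (made-up data for illustration).
# "failure" is the failure mode noted while reading the output;
# None means the output looked fine.
labeled_traces = [
    {"id": 1, "failure": "missing_citation"},
    {"id": 2, "failure": None},
    {"id": 3, "failure": "wrong_format"},
    {"id": 4, "failure": "missing_citation"},
    {"id": 5, "failure": "missing_citation"},
    {"id": 6, "failure": "wrong_format"},
]


def rank_failures(traces):
    """Return (failure_mode, count) pairs, most frequent first."""
    counts = Counter(t["failure"] for t in traces if t["failure"] is not None)
    return counts.most_common()


def judge_agreement(judge_verdicts, human_verdicts):
    """Fraction of examples where the LLM judge matches the human label."""
    if len(judge_verdicts) != len(human_verdicts) or not human_verdicts:
        raise ValueError("verdict lists must be non-empty and equal in length")
    matches = sum(j == h for j, h in zip(judge_verdicts, human_verdicts))
    return matches / len(human_verdicts)


ranked = rank_failures(labeled_traces)
agreement = judge_agreement([True, False, True, True], [True, False, False, True])
```

With this sample data, `ranked` puts `missing_citation` first, so that failure mode would be the first candidate for a check. What agreement level counts as "validated" is a judgment call for your capstone; the sketch only computes the number.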
Done when
- Your capstone has an eval suite tied to its real failure modes
- You can show a metric moving after a fix
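One way to show a metric moving is to run the same code-based checks over outputs captured before and after a fix and compare pass rates. The sketch below assumes a capstone that is supposed to return JSON with an `answer` field; that output contract, the check names, and the sample outputs are hypothetical stand-ins for your own failure modes.

```python
import json


def is_valid_json(output: str) -> bool:
    """Code-based check: the output parses as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


def has_answer_field(output: str) -> bool:
    """Code-based check: the output is a JSON object with an 'answer' key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "answer" in data


# Hypothetical suite; LLM-as-judge checks would be added here once validated.
CHECKS = [is_valid_json, has_answer_field]


def pass_rate(outputs):
    """Fraction of outputs that pass every check in CHECKS."""
    if not outputs:
        raise ValueError("no outputs to evaluate")
    passed = sum(all(check(o) for check in CHECKS) for o in outputs)
    return passed / len(outputs)


# Made-up outputs for the same four inputs, before and after a prompt fix.
before_fix = [
    '{"answer": "42"}',
    "Sure! Here is the answer: 42",
    '{"result": "42"}',
    '{"answer": "7"}',
]
after_fix = [
    '{"answer": "42"}',
    '{"answer": "42"}',
    '{"result": "42"}',
    '{"answer": "7"}',
]

print(f"before: {pass_rate(before_fix):.2f}  after: {pass_rate(after_fix):.2f}")
```

Wrapping `pass_rate` in a script or test command is one simple way to make the suite runnable on demand.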