Assignment

Build and run TRACE on your capstone

Trace and read real outputs, analyze and rank the failures, codify them as a suite of code-based checks and validated LLM-as-judge checks, and wire the suite to run on demand.
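
As a rough illustration of the "codify" and "run on demand" steps, here is a minimal sketch of a suite of code-based checks. Everything in it is an assumption made for illustration: the `Trace` shape, the two checks, and the `run_suite` helper are hypothetical names, not part of any particular framework, and your own checks should come from the failures you actually ranked.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    """One recorded input/output pair from the capstone (hypothetical shape)."""
    prompt: str
    output: str


# A check takes one trace and returns True when the trace passes.
Check = Callable[[Trace], bool]


def not_empty(trace: Trace) -> bool:
    """Fail traces whose output is blank or whitespace only."""
    return bool(trace.output.strip())


def under_length_limit(trace: Trace, limit: int = 200) -> bool:
    """Fail traces whose output exceeds `limit` characters (illustrative threshold)."""
    return len(trace.output) <= limit


# Hypothetical registry: each entry should map to a failure mode you observed.
CHECKS: dict[str, Check] = {
    "not_empty": not_empty,
    "under_length_limit": under_length_limit,
}


def run_suite(traces: list[Trace], checks: dict[str, Check]) -> dict[str, float]:
    """Return the pass rate of each check over the given traces."""
    if not traces:
        return {name: 0.0 for name in checks}
    return {
        name: sum(check(t) for t in traces) / len(traces)
        for name, check in checks.items()
    }


if __name__ == "__main__":
    # Placeholder traces; in practice, load the real outputs you traced.
    sample = [Trace("What is TRACE?", "A short answer."), Trace("Summarize.", "")]
    for name, rate in run_suite(sample, CHECKS).items():
        print(f"{name}: {rate:.0%}")
```

An LLM-as-judge check could plug into the same registry as another callable with the same signature, once its verdicts have been validated against labels you trust. Running the file as a script is one simple way to make the suite available on demand.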

Done when

  • Your capstone has an eval suite tied to its real failure modes
  • You can show a metric moving after a fix
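
One way to show a metric moving is to run the same check on the same examples before and after the fix and report the change in pass rate. The sketch below assumes each run yields a list of pass/fail booleans; `pass_rate` and `report_delta` are hypothetical helper names, and the sample values in the usage note are made up purely to show the output format.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of passing results; 0.0 for an empty run."""
    return sum(results) / len(results) if results else 0.0


def report_delta(name: str, before: list[bool], after: list[bool]) -> str:
    """Format the before and after pass rates of one check, plus the change."""
    b, a = pass_rate(before), pass_rate(after)
    return f"{name}: {b:.0%} -> {a:.0%} ({a - b:+.0%})"
```

For example, `report_delta("cites_source", [True, False, False, True], [True, True, True, False])` returns `"cites_source: 50% -> 75% (+25%)"`. Keeping the example set fixed between the two runs makes it more plausible that the change comes from the fix and not from different inputs.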