Write Acceptance Criteria as AI Evals

Every AI feature has acceptance criteria written somewhere. They live in the Jira ticket, the PRD, the Notion doc. They say things like "the response must cite a source," "the output must not exceed 200 words," and "the recommended item must be from the active product catalog." These are not soft guidelines. They are testable contracts. The only thing missing is the code that verifies them automatically.

The Translation Pattern

Each acceptance criterion maps to a single eval function. The function takes one response as input and returns a boolean: did this response pass the criterion? That's it. No frameworks, no infrastructure, no complex setup. Just a Python function per criterion.

Here is the translation pattern in plain form: take the criterion statement, identify what can be measured programmatically or with a second LLM call, and write a function that returns True for passing and False for failing.
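For criteria that need the "second LLM call" route, the same one-function-per-criterion shape still applies. The following is a minimal sketch, not a definitive implementation: `call_llm` is a hypothetical callable (prompt in, completion text out) that you would wire to your own model client, and `JUDGE_TEMPLATE`, `make_judge_check`, and the PASS/FAIL convention are illustrative assumptions.

```python
from typing import Callable

# Hypothetical judge prompt; adjust the wording to your own criteria.
JUDGE_TEMPLATE = (
    "You are grading one response against one acceptance criterion.\n"
    "Criterion: {criterion}\n"
    "Response:\n{response}\n"
    "Answer with exactly one word: PASS or FAIL."
)


def make_judge_check(
    criterion: str,
    call_llm: Callable[[str], str],  # assumption: prompt -> completion text
    name: str,
) -> Callable[[str], bool]:
    """Build an eval function that asks a judge model about one criterion."""

    def check(response: str) -> bool:
        prompt = JUDGE_TEMPLATE.format(criterion=criterion, response=response)
        verdict = call_llm(prompt).strip().upper()
        # Anything other than a clear PASS is treated as a failure.
        return verdict.startswith("PASS")

    # Give the function a readable name so it shows up clearly in reports.
    check.__name__ = name
    return check
```

Because the judge is passed in as an argument, the function can be exercised with a stub in unit tests before any real model is involved.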

Prompt

"Convert each of the following acceptance criteria into a single Python function. Each function should take a response string as input and return True if the criterion is met, False otherwise. Use only the standard library and simple string operations where possible. If a criterion requires semantic understanding, describe a judge prompt that could be used instead."

Writing Eval Functions in Python

Here is a set of eval functions translated directly from typical acceptance criteria:

import re
import json

# AC: "The response must not exceed 200 words."
def check_word_count(response: str, max_words: int = 200) -> bool:
    return len(response.split()) <= max_words

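# AC: "The response must include a citation in [Source: ...] format."
def check_has_citation(response: str) -> bool:
    # Brackets are escaped so they match literally; the source text must be non-empty.
    return bool(re.search(r'\[Source:\s*[^\]\s][^\]]*\]', response))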

# AC: "The output must be valid JSON with a 'product_id' field."
def check_json_schema(response: str) -> bool:
    try:
        data = json.loads(response)
    except (json.JSONDecodeError, TypeError):
        return False
    # Require a JSON object; a bare string or list would make `in` misleading.
    return isinstance(data, dict) and "product_id" in data

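# AC: "The response must not include pricing information."
def check_no_pricing(response: str) -> bool:
    # Deliberately broad heuristics: a dollar amount, an amount with a currency code,
    # or the words "price" / "cost(s)". Expect some false positives.
    pricing_patterns = [r'\$\d+', r'\d+\s*(USD|EUR|GBP)', r'price', r'costs?']
    return not any(re.search(p, response, re.IGNORECASE) for p in pricing_patterns)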

EVAL_SUITE = [check_word_count, check_has_citation, check_json_schema, check_no_pricing]
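Some criteria need more than the response string. The catalog criterion from the introduction ("the recommended item must be from the active product catalog") is one example. A minimal sketch, assuming the response is a JSON object with a `product_id` field as in the schema check above; `ACTIVE_IDS` is a hypothetical catalog snapshot and both function names are illustrative:

```python
import json

# Hypothetical snapshot of active catalog IDs; load this from your own source.
ACTIVE_IDS = frozenset({"sku-101", "sku-202"})


def check_in_catalog(response: str, active_ids: frozenset) -> bool:
    """AC: "The recommended item must be from the active product catalog." """
    try:
        data = json.loads(response)
    except (json.JSONDecodeError, TypeError):
        return False
    product_id = data.get("product_id") if isinstance(data, dict) else None
    # Only string IDs are accepted; anything else fails the criterion.
    return isinstance(product_id, str) and product_id in active_ids


def check_active_product(response: str) -> bool:
    # Thin wrapper that binds the catalog, so the function keeps the
    # one-argument shape the rest of the suite expects.
    return check_in_catalog(response, ACTIVE_IDS)
```

The wrapper keeps the extra context out of the suite runner: `check_active_product` can be appended to the suite list like any other eval function.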

Running and Reporting Your Eval Suite

With functions in hand, running the suite against a test set is straightforward. The report shows, per criterion, how many responses passed. This is exactly the format a PM can read and act on.

def run_eval_suite(responses: list, suite: list) -> dict:
    report = {fn.__name__: {"passed": 0, "failed": 0} for fn in suite}

    for response in responses:
        for fn in suite:
            if fn(response):
                report[fn.__name__]["passed"] += 1
            else:
                report[fn.__name__]["failed"] += 1

    total = len(responses)
    for name in report:
        p = report[name]["passed"]
        report[name]["pass_rate"] = round(p / total, 3) if total > 0 else 0.0

    return report

# Run it
responses = generate_test_responses()  # your function
results = run_eval_suite(responses, EVAL_SUITE)
for name, data in results.items():
    print(f"{name}: {data['pass_rate']:.0%} ({data['passed']}/{len(responses)})")

Keep this script in your repo next to the feature it tests. Run it in CI. When a new acceptance criterion is written, translate it into a function before closing the ticket. Within a few sprints, you will have a suite that automatically checks every product guarantee your team has translated into a function.
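To make "run it in CI" concrete, one option is to turn the report into an exit code. This is a rough sketch under stated assumptions: the report has the shape returned by `run_eval_suite` above, and `THRESHOLDS`, `find_regressions`, and `gate` are hypothetical names with made-up minimum pass rates.

```python
import sys

# Hypothetical minimum pass rates; criteria not listed here must pass 100%.
THRESHOLDS = {"check_word_count": 0.95, "check_has_citation": 0.90}


def find_regressions(report: dict, thresholds: dict, default: float = 1.0) -> list:
    """Return the names of criteria whose pass rate is below the required minimum."""
    return [
        name
        for name, data in report.items()
        if data["pass_rate"] < thresholds.get(name, default)
    ]


def gate(report: dict, thresholds: dict) -> int:
    """Print failing criteria to stderr and return an exit code (0 = ok, 1 = fail)."""
    failing = find_regressions(report, thresholds)
    for name in failing:
        rate = report[name]["pass_rate"]
        print(f"FAIL {name}: pass rate {rate:.1%}", file=sys.stderr)
    return 1 if failing else 0


# In a CI script, after building the report:
#     sys.exit(gate(results, THRESHOLDS))
```

A nonzero exit code is what most CI systems treat as a failed step, so the build stops when a criterion drops below its agreed pass rate.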

Aki Wijesundara
