Engineering

Write Acceptance Criteria as AI Evals

Your product acceptance criteria already describe what good AI output looks like, and converting them into automated Python checks takes less time than writing a Jira ticket.

June 26, 2026
5 min read
Aki Wijesundara
#Evals #Acceptance Criteria #Python

Every AI feature has acceptance criteria written somewhere. They live in the Jira ticket, the PRD, the Notion doc. They say things like "the response must cite a source," "the output must not exceed 200 words," and "the recommended item must be from the active product catalog." These are not soft guidelines. They are testable contracts. The only thing missing is the code that verifies them automatically.

The Translation Pattern

Each acceptance criterion maps to a single eval function. The function takes one response as input and returns a boolean: did this response pass the criterion? That's it. No frameworks, no infrastructure, no complex setup. Just a Python function per criterion.

Here is the translation pattern in plain form: take the criterion statement, identify what can be measured programmatically or with a second LLM call, and write a function that returns True for passing and False for failing.
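For criteria that need a second LLM call, one possible shape is a small factory that wraps a judge prompt in the same one-argument, boolean-returning interface. This is a sketch, not a definitive implementation: `ask_llm`, `JUDGE_PROMPT`, `make_judge_check`, and the politeness criterion are all hypothetical names introduced for illustration, and `fake_llm` is a stand-in so the example runs without a model provider.

```python
from typing import Callable

# Hypothetical judge prompt template; adjust the wording for your own model.
JUDGE_PROMPT = (
    "You are grading an AI response against one acceptance criterion.\n"
    "Criterion: {criterion}\n"
    "Response: {response}\n"
    "Answer with exactly one word: PASS or FAIL."
)

def make_judge_check(
    name: str, criterion: str, ask_llm: Callable[[str], str]
) -> Callable[[str], bool]:
    """Build a one-argument eval function backed by a second LLM call.

    ask_llm is an assumption: any callable that sends a prompt to your
    model provider and returns the reply text.
    """
    def check(response: str) -> bool:
        prompt = JUDGE_PROMPT.format(criterion=criterion, response=response)
        verdict = ask_llm(prompt).strip().upper()
        # Anything other than a clear PASS counts as a failure.
        return verdict.startswith("PASS")

    # Give each judge check its own name so per-criterion reports stay distinct.
    check.__name__ = name
    return check

# Stand-in judge for local runs; replace with a real model call.
def fake_llm(prompt: str) -> str:
    return "FAIL" if "stupid" in prompt.lower() else "PASS"

check_is_polite = make_judge_check(
    "check_is_polite", "The response must be polite.", fake_llm
)
```

Because the result is still a function from one response string to a boolean, it can sit in the same list as the purely programmatic checks.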

Prompt

"Convert each of the following acceptance criteria into a single Python function. Each function should take a response string as input and return True if the criterion is met, False otherwise. Use only the standard library and simple string operations where possible. If a criterion requires semantic understanding, describe a judge prompt that could be used instead."

Writing Eval Functions in Python

Here is a set of eval functions translated directly from typical acceptance criteria:

import re
import json

# AC: "The response must not exceed 200 words."
def check_word_count(response: str, max_words: int = 200) -> bool:
    return len(response.split()) <= max_words

# AC: "The response must include a citation in [Source: ...] format."
def check_has_citation(response: str) -> bool:
    # Escape the brackets so they match literally, and require a closing "]".
    return bool(re.search(r'\[Source:\s*[^\]]+\]', response))

# AC: "The output must be valid JSON with a 'product_id' field."
def check_json_schema(response: str) -> bool:
    try:
        data = json.loads(response)
        # Require a JSON object; a bare string or list must not pass.
        return isinstance(data, dict) and "product_id" in data
    except (json.JSONDecodeError, TypeError):
        return False

# AC: "The response must not include pricing information."
def check_no_pricing(response: str) -> bool:
    # Currency amounts ($12, 12 USD) plus broad keyword matches.
    pricing_patterns = [r'\$\d+', r'\d+\s*(USD|EUR|GBP)', r'price', r'costs?']
    return not any(re.search(p, response, re.IGNORECASE) for p in pricing_patterns)

EVAL_SUITE = [check_word_count, check_has_citation, check_json_schema, check_no_pricing]
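Some criteria depend on data outside the response, such as "the recommended item must be from the active product catalog." One way to keep the one-argument signature is to bind that data in a closure. The sketch below assumes the response is a JSON object with a string `product_id` field; `make_catalog_check` and the sample SKU values are hypothetical.

```python
import json
from typing import Callable, Iterable

def make_catalog_check(active_ids: Iterable[str]) -> Callable[[str], bool]:
    """Return an eval function that checks product_id against the active catalog."""
    # Snapshot the IDs so later changes to the caller's collection don't leak in.
    active = frozenset(active_ids)

    def check_item_in_catalog(response: str) -> bool:
        try:
            data = json.loads(response)
        except (json.JSONDecodeError, TypeError):
            return False
        if not isinstance(data, dict):
            return False
        product_id = data.get("product_id")
        # Only a string ID that appears in the active catalog passes.
        return isinstance(product_id, str) and product_id in active

    return check_item_in_catalog

# Hypothetical catalog; in practice, load the IDs from your product database.
check_item_in_catalog = make_catalog_check({"sku-101", "sku-202"})
```

The returned function keeps the name `check_item_in_catalog`, so it reports under that name when added to a suite.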

Running and Reporting Your Eval Suite

With functions in hand, running the suite against a test set is straightforward. The report shows, per criterion, how many responses passed. This is exactly the format a PM can read and act on.

def run_eval_suite(responses: list, suite: list) -> dict:
    report = {fn.__name__: {"passed": 0, "failed": 0} for fn in suite}

    for response in responses:
        for fn in suite:
            if fn(response):
                report[fn.__name__]["passed"] += 1
            else:
                report[fn.__name__]["failed"] += 1

    total = len(responses)
    for name in report:
        p = report[name]["passed"]
        report[name]["pass_rate"] = round(p / total, 3) if total > 0 else 0.0

    return report

# Run it
responses = generate_test_responses()  # your function
results = run_eval_suite(responses, EVAL_SUITE)
for name, data in results.items():
    print(f"{name}: {data['pass_rate']:.0%} ({data['passed']}/{len(responses)})")

Keep this script in your repo next to the feature it tests. Run it in CI. When a new acceptance criterion is written, translate it into a function before closing the ticket. Within a few sprints, you will have a suite that automatically enforces every product guarantee your team has ever committed to.
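To make the CI run actually block a merge, the script needs to exit with a non-zero status when a criterion falls below an agreed pass rate. A minimal sketch, assuming a report shaped like the one `run_eval_suite` returns; `THRESHOLDS`, `failing_criteria`, and `exit_code` are hypothetical names, and the threshold values are placeholders.

```python
# Hypothetical minimum pass rates per criterion; tune to your own guarantees.
THRESHOLDS = {"check_word_count": 1.0, "check_has_citation": 0.95}

def failing_criteria(report: dict, thresholds: dict, default: float = 1.0) -> list:
    """Return the names of criteria whose pass_rate is below its threshold."""
    return sorted(
        name
        for name, data in report.items()
        if data["pass_rate"] < thresholds.get(name, default)
    )

def exit_code(report: dict, thresholds: dict = THRESHOLDS) -> int:
    """Print each failing criterion and return a process exit code."""
    failures = failing_criteria(report, thresholds)
    for name in failures:
        print(f"FAILED {name}: pass rate {report[name]['pass_rate']:.1%}")
    # A non-zero exit code is what makes a CI job fail.
    return 1 if failures else 0

# In the eval script, after computing results:
#   import sys
#   sys.exit(exit_code(results))
```

Criteria without an explicit entry default to a required pass rate of 1.0, so a newly added check is strict until someone deliberately relaxes it.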
