Build a Visual QA Agent That Checks Screenshots

Why Automated Visual QA Keeps Failing

Automated tests catch logic bugs but tend to miss visual regressions: a button that shifted 20px, a dropdown that renders behind another element, a page that looks fine in desktop Chrome but breaks at a mobile viewport. Traditional screenshot diffing tools compare pixels, so they flood you with noise from anti-aliasing and font rendering differences.

What you actually want is a test that understands the page the way a human QA reviewer does. That is what a vision-based QA agent gives you: Claude looks at a screenshot and evaluates it against your acceptance criteria in plain English.

The Architecture: Playwright Plus Vision

The pipeline has three steps: capture screenshots with Playwright, send them to Claude with a structured checklist, and report failures. You can run this on every pull request or as a nightly regression job.

import anthropic
import base64
import json
from playwright.sync_api import sync_playwright

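def capture_screenshot(url, output_path):
    # Render the page in headless Chromium at a fixed desktop viewport.
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 1280, "height": 800})
        page.goto(url)
        # Wait until network activity settles so late-loading content is rendered.
        page.wait_for_load_state("networkidle")
        # full_page=True captures the whole scrollable page, not just the viewport.
        page.screenshot(path=output_path, full_page=True)
        browser.close()
    return output_path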

def check_screenshot(image_path, criteria):
    client = anthropic.Anthropic()
    with open(image_path, "rb") as f:
        image_data = base64.standard_b64encode(f.read()).decode("utf-8")

    criteria_text = "\n".join(f"- {c}" for c in criteria)
    message = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": (
                        "Check this screenshot against these acceptance criteria:\n"
                        f"{criteria_text}\n\n"
                        "For each criterion, respond PASS or FAIL with a one-sentence reason. "
                        "Return only a JSON array."
                    )
                }
            ]
        }]
    )
    return json.loads(message.content[0].text)
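
The third step, reporting failures, could look something like the sketch below. It assumes each verdict in the returned array is an object with "criterion", "status", and "reason" keys; that shape is an illustration, not something the prompt above guarantees, so you may want to spell it out in the prompt. The helper names parse_verdicts, collect_failures, and format_report are hypothetical. parse_verdicts also tolerates replies where the JSON array is wrapped in extra text, which a bare json.loads would reject.

```python
import json
import re


def parse_verdicts(raw_text):
    """Pull the JSON array out of a model reply, ignoring any surrounding text."""
    # Greedy match from the first "[" to the last "]" in the reply.
    match = re.search(r"\[.*\]", raw_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in model reply")
    verdicts = json.loads(match.group(0))
    if not isinstance(verdicts, list):
        raise ValueError("expected a JSON array of verdicts")
    return verdicts


def collect_failures(verdicts):
    """Return every verdict whose status is not PASS (key names are assumed)."""
    return [v for v in verdicts if str(v.get("status", "")).upper() != "PASS"]


def format_report(url, verdicts):
    """Build a plain-text summary and return it with the failure count."""
    failures = collect_failures(verdicts)
    passed = len(verdicts) - len(failures)
    lines = [f"{url}: {passed}/{len(verdicts)} criteria passed"]
    for v in failures:
        criterion = v.get("criterion", "?")
        reason = v.get("reason", "no reason given")
        lines.append(f"  FAIL: {criterion} ({reason})")
    return "\n".join(lines), len(failures)
```

In a pull request job, one option is to print the report and exit with a non-zero status whenever the failure count is above zero, so the check shows up as failed.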

Writing Acceptance Criteria That Work

The quality of the agent's verdicts depends largely on how you write the criteria. Vague inputs produce vague outputs. Be specific about what should be visible, where, and in what state.

Prompt

"Write QA acceptance criteria for a checkout page. Include: cart summary is visible with at least one item, a Place Order button is present and not greyed out, the total price appears near the button, no error messages are visible, and the page does not appear to be loading or blank."

Claude turns these into a structured JSON list your agent can evaluate against every screenshot. Store criteria in version control alongside your tests and update them when the UI changes. You now have a QA spec that is both human-readable and machine-executable.
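
One way to keep criteria in version control is a small JSON file that maps each page name to its list of criteria. The sketch below assumes that layout; the file name qa_criteria.json and the load_criteria helper are hypothetical, not part of any library. It validates the entries up front so a typo in the spec fails loudly instead of producing a confusing verdict.

```python
import json
from pathlib import Path


def load_criteria(path, page_name):
    """Load the criteria list for one page from a JSON file kept in the repo."""
    # Assumed layout: {"checkout": ["criterion 1", "criterion 2", ...], ...}
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    criteria = data.get(page_name)
    if not isinstance(criteria, list) or not criteria:
        raise ValueError(f"no criteria defined for page {page_name!r}")
    # Reject blank or non-string entries before they reach the model.
    for item in criteria:
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"invalid criterion for {page_name!r}: {item!r}")
    return [item.strip() for item in criteria]
```

The returned list can be passed straight to check_screenshot as its criteria argument, for example load_criteria("qa_criteria.json", "checkout").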

Aki Wijesundara
