Prompt injection, where guardrails get dangerous

Live

Prompt injection, where guardrails get dangerous

The guardrails from Week 1 return, now with real teeth. Once an agent uses tools and reads outside content, a malicious instruction hidden in a web page or document can hijack it into taking actions you never intended - that is prompt injection, and it is the defining security problem of agentic systems.

This lesson covers how it works, why tool-using agents are uniquely exposed, and the practical defenses: constraining what tools can do, validating between steps, and never trusting retrieved content as instructions.

Go deeper (optional)

Anthropic, Building Effective Agents