Designing AI agents to resist prompt injection

OpenAI
AI agents are vulnerable to prompt injection attacks, which are increasingly sophisticated and resemble social engineering, requiring defenses beyond simple input filtering.

Summary

AI agents capable of web browsing and action-taking are susceptible to prompt injection attacks, in which malicious instructions are embedded in external content the agent reads. These attacks have evolved from simple prompt overrides to more complex social engineering tactics, making them difficult to detect. Defending against them therefore requires not only identifying malicious inputs but also designing systems that limit the impact of a successful manipulation.

The authors advocate viewing prompt injection through the lens of social engineering risk management, much like protecting human customer service agents. This means implementing safeguards such as limiting agent capabilities, flagging suspicious activity, and requiring confirmation before potentially dangerous actions, such as transmitting sensitive information. Techniques like "safe URL" checks are used to detect and mitigate unauthorized data transmission. The core principle is that potentially dangerous actions are never performed silently, and that the agent operates under the same controls a human agent would have in a similar situation.
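The layered safeguards described above can be sketched in code. This is a minimal illustration, not OpenAI's actual implementation: the allowlist, the `is_safe_url` helper, and the `guarded_action` gate are all hypothetical names chosen for this example. The idea is that an outbound destination is first checked against an explicit allowlist (a "safe URL" check), and any remaining potentially dangerous action still requires explicit user confirmation rather than executing silently.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the agent may contact (assumption for
# this sketch; a real system would manage this centrally).
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}


def is_safe_url(url: str) -> bool:
    """Allow only http(s) URLs whose host is on the allowlist, so an
    injected instruction cannot exfiltrate data to an arbitrary host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_HOSTS


def guarded_action(action: str, target_url: str, confirm) -> str:
    """Block unsafe destinations outright, then require explicit user
    confirmation before performing the action -- never act silently."""
    if not is_safe_url(target_url):
        return "blocked: destination not on allowlist"
    if not confirm(f"Agent wants to {action} via {target_url}. Proceed?"):
        return "cancelled by user"
    # Only here would the agent actually perform the side effect.
    return f"performed: {action}"
```

An injected instruction pointing at an attacker-controlled host is stopped by the URL check before the user is even prompted, while legitimate actions still surface a confirmation step, mirroring how a human agent would escalate before a sensitive transaction.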

(Source: OpenAI)