Additional Coverage:
AI Browsers Face Persistent Hacking Threats, OpenAI Admits
San Francisco, CA – The promise of AI agents seamlessly navigating the web for users is facing a significant hurdle, as OpenAI acknowledges that certain attack methods against AI browsers like its own ChatGPT Atlas are likely here to stay. This admission raises critical questions about the long-term safety of AI agents operating across the open internet.
The primary culprit is a tactic known as “prompt injection.” This sophisticated attack involves hackers embedding malicious instructions within seemingly innocuous websites, documents, or emails.
These hidden commands can then trick an AI agent into performing harmful actions, such as sharing a user’s emails or even draining bank accounts, by overriding legitimate user instructions. These commands can be cleverly concealed, for instance, in text invisible to the human eye but easily interpreted by an AI.
Following the October launch of OpenAI’s ChatGPT Atlas browser, security researchers quickly demonstrated the vulnerability. A few words hidden in a Google Doc or a clipboard link proved sufficient to manipulate the AI agent’s behavior. Brave, an open-source browser company, echoed these concerns, publishing research that warned all AI-powered browsers are susceptible to indirect prompt injection attacks, a flaw they had previously identified in Perplexity’s Comet browser.
“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,'” OpenAI stated in a recent blog post, adding that the “agent mode” in ChatGPT Atlas “expands the security threat surface.”
Despite this challenge, OpenAI expressed its goal for users to “be able to trust a ChatGPT agent.” Dane Stuckey, Chief Information Security Officer, outlined the company’s strategy: “investing heavily in automated red teaming, reinforcement learning, and rapid response loops to stay ahead of our adversaries.” The company remains “optimistic that a proactive, highly responsive rapid response loop can continue to materially reduce real-world risk over time.”
Fighting AI with AI
OpenAI’s defense strategy involves deploying its own AI-powered attacker. This bot, trained through reinforcement learning, is designed to emulate a hacker, constantly seeking ways to infiltrate AI agents with malicious instructions. This AI attacker can simulate attacks, analyze the target AI’s responses, and then refine its approach through repeated trials.
OpenAI highlighted the effectiveness of this method, stating, “Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps.” The company also noted the discovery of “novel attack strategies that did not appear in our human red teaming campaign or external reports.”
However, some cybersecurity experts remain skeptical about this approach’s ability to fundamentally resolve the issue. Charlie Eriksen, a security researcher at Aikido Security, voiced his concern to Fortune, stating, “What concerns me is that we’re trying to retrofit one of the most security-sensitive pieces of consumer software with a technology that’s still probabilistic, opaque, and easy to steer in subtle ways.”
Eriksen believes that while “red-teaming and AI-based vulnerability hunting can catch obvious failures,” they don’t alter the underlying dynamic. He argues that “until we have much clearer boundaries around what these systems are allowed to do and whose instructions they should listen to, it’s reasonable to be skeptical that the tradeoff makes sense for everyday users right now.” He concluded, “I think prompt injection will remain a long-term problem… You could even argue that this is a feature, not a bug.”
A Persistent Cat-and-Mouse Game
Previous discussions with security researchers have highlighted that while many cybersecurity risks are an ongoing “cat-and-mouse game,” the deep access required by AI agents-such as user passwords and permission to act on a user’s behalf-presents an exceptionally vulnerable threat. It remains unclear whether the advantages of these agents outweigh such significant risks.
George Chalhoub, an assistant professor at UCL Interaction Centre, emphasized the severity of the risk, explaining that prompt injection “collapses the boundary between the data and the instructions.” This can transform an AI agent “from a helpful tool to a potential attack vector against the user,” enabling it to extract emails, steal personal data, or access passwords.
“That’s what makes AI browsers fundamentally risky,” Eriksen added. “We’re delegating authority to a system that wasn’t designed with strong isolation or a clear permission model.
Traditional browsers treat the web as untrusted by default. Agentic browsers blur that line by allowing content to shape behavior, not just be displayed.”
To mitigate these risks, OpenAI recommends that users provide agents with specific instructions rather than broad, vague commands like “take whatever action is needed.” The ChatGPT Atlas browser also incorporates additional security features, including “logged out mode,” which allows users to operate without sharing passwords, and “Watch mode,” which requires explicit user confirmation for sensitive actions like sending messages or making payments.
“Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” OpenAI advised in its blog post.