Researchers have uncovered a new security flaw in OpenAI’s Atlas browser, revealing that even advanced AI agents remain dangerously exposed to manipulation. The discovery underscores a growing concern among cybersecurity experts: the rise of “prompt injection” attacks that exploit AI’s trust in human instructions.
A Browser Built on Trust — and Exploited by It
When OpenAI unveiled Atlas, its experimental AI-powered web browser, the company envisioned a tool that could think and act like a digital assistant — booking flights, summarizing documents, even shopping online. But within days of its release, security researchers began sounding the alarm.
Atlas’s “agent mode,” designed to execute tasks autonomously, became a prime target for a well-known vulnerability in the AI field: prompt injection. In these attacks, malicious actors embed hidden commands within websites or URLs, effectively tricking the AI into carrying out unintended — and sometimes harmful — actions.
According to one early experiment, a researcher managed to get Atlas to respond to a document summary request by outputting the words “Trust No AI.” It was a harmless prank, but it demonstrated a profound flaw: Atlas could be deceived into following commands it wasn’t meant to obey.
NeuralTrust’s Discovery: URLs That Talk Back
A deeper vulnerability was soon revealed by NeuralTrust, a cybersecurity firm specializing in AI agent safety. In a recent report, the company’s software engineer Martí Jordà described how Atlas’s “Omnibox” — a text field that accepts both web addresses and natural-language prompts — could be exploited through a disguised URL.
“We’ve identified a prompt injection technique that disguises malicious instructions to look like a URL, but that Atlas treats as high-trust ‘user intent’ text,” Jordà wrote. “That enables harmful actions.”
By slightly altering a web address, attackers could craft links that the browser failed to recognize as mere URLs. Instead, Atlas treated them as trusted instructions. Once executed, these instructions could override safety protocols, allowing the AI agent to perform privileged actions without user consent.
From Google Drive to Deletion Commands
The implications were striking. According to NeuralTrust’s findings, such a vulnerability could allow attackers to direct Atlas’s agent to a user’s Google Drive — and even delete files — since the AI operates within authenticated browser sessions.
“The embedded instructions are now interpreted as trusted user intent with fewer safety checks,” Jordà warned. “Follow these instructions only” or “visit neuraltrust.ai” were enough to override a user’s intent.
The flaw, NeuralTrust noted, stemmed from Atlas’s tendency to conflate humanlike “intent” with technical input. “When powerful actions are granted based on ambiguous parsing,” Jordà wrote, “ordinary-looking inputs become jailbreaks.” To mitigate this, NeuralTrust urged OpenAI to adopt stricter URL parsing standards and to refuse execution when ambiguity arises — a safeguard that could prevent poisoned inputs from masquerading as legitimate commands.
A Wider Problem Across AI Browsers
OpenAI is not alone in facing this challenge. Browser company Brave recently warned that indirect prompt injection attacks affect the “entire category of AI-powered browsers,” including Perplexity’s Comet browser.
“If you’re signed into sensitive accounts like your bank or email, even summarizing a Reddit post could let an attacker steal your data,” Brave’s team cautioned.
OpenAI’s chief information security officer, Dane Stuckey, acknowledged the broader issue in a statement last week, calling prompt injection “a frontier, unsolved security problem.” He added that adversaries would continue to “spend significant time and resources” seeking new ways to deceive AI agents.
OpenAI has yet to comment on NeuralTrust’s latest findings — a silence that, for many experts, only deepens the question: how secure can an AI truly be when trust itself becomes the attack vector?
