Microsoft Threat Intelligence Exposes Prompt Injection Flaw In Anthropic Claude Code Action

As artificial intelligence-powered coding tools become increasingly integrated into software development workflows, Microsoft researchers have disclosed a significant security vulnerability in Anthropic’s AI coding agent, Claude Code, that could have allowed attackers to steal sensitive developer credentials and access tokens. Although the flaw has since been patched following responsible disclosure, cybersecurity experts say the incident highlights the growing risks associated with autonomous AI agents operating in high-privilege environments.

Contents

The Vulnerability in the Automated CI/CD Toolchain Exploiting the Unsandboxed File Read Primitives Laundering Output to Evade Safety Filters Responsible Disclosure and Infrastructure Hardening

The discovery emphasizes an urgent requirement for engineering teams to rethink trust boundaries when deploying agentic automation across public repositories.

Registration Begins for FutureCrime Summit 2026, India’s Largest Cybercrime Conference

The Vulnerability in the Automated CI/CD Toolchain

According to Microsoft’s security research team, the vulnerability was discovered in the Claude Code GitHub Action. GitHub Actions are widely used by developers to automate software development, testing, and deployment processes. These environments frequently contain highly sensitive information, including API keys, cloud access credentials, database tokens, and production secrets required for application operations.

Claude Code, launched by Anthropic in October 2025, is an AI-powered coding assistant designed to help developers write code, troubleshoot problems, review changes, and accelerate software development processes. Under the hood, the GitHub Action operates as an automated wrapper around the Claude Agent SDK, fetching issue context, pull request parameters, and repository diffs to generate automated code reviews or label triages.

Exploiting the Unsandboxed File Read Primitives

Researchers explained that the attack relied on a technique known as prompt injection, which has emerged as one of the most significant threats facing AI agents. In such attacks, malicious actors embed hidden instructions within GitHub issues, pull requests, comments, documentation, or other content processed by an AI system. If the AI agent follows those hidden instructions, it may perform actions unintended by the user or system administrator.

Microsoft’s Threat Intelligence unit observed that while the Claude Code Action supported strict environment scrubbing for subprocess execution paths like Bash, its internal “Read” tool was not subject to the same sandboxing restrictions. By planting a malicious prompt inside an issue body, an attacker could trick the AI agent into reading highly restricted runner files, such as /proc/self/environ, which contains the active credentials used to obtain cloud OpenID Connect (OIDC) tokens and workspace secrets.

Laundering Output to Evade Safety Filters

To demonstrate the vulnerability, Microsoft researchers created a controlled GitHub workflow and simulated an attacker’s behavior. They concealed malicious instructions behind content hosted on a domain under their control. According to the researchers, this approach allowed them to bypass certain safety mechanisms and influence the AI agent’s decision-making process.

To bypass Claude’s safety and system-prompt refusal layers—which would normally block the printing or exfiltration of any string resembling a sensitive API credential—the injection framed the file retrieval as a routine compliance review. The instruction commanded the model to alter the text strings, such as cutting the first seven characters of an authentication key. This laundered the output to evade internal filters and trick the agent into posting the exposed secrets into public workflow logs or repository comments.

Responsible Disclosure and Infrastructure Hardening

Microsoft officially disclosed the prompt-injection vulnerability to Anthropic through the HackerOne responsible disclosure program on April 29. Following a rapid internal investigation, Anthropic released Claude Code Version 2.1.128 on May 5, mitigating the flaw by forcing the Read tool to unconditionally reject and block access to sensitive procfs system files.

Security specialists at Algoritha Security note that as AI development agents gain high-level primitives to modify files and interact with third-party APIs via the Model Context Protocol (MCP), they essentially function as active code-execution environments. Security architects are urging enterprises to implement the principle of least privilege across all CI/CD pipelines, ensuring that tokens tied to automated workflows possess narrow scopes that prevent lateral repository takeovers even if an upstream agent faces a prompt-injection compromise.

📲 Join Our WhatsApp Channel