AI Agent Security: Test What Your Agents Can Reach

TL;DR

Coding agents and MCP integrations often receive far wider permissions than anyone intends.
A prompt instruction like "never delete the database" is not a security control. An agent can write a script that does exactly that.
Work from one assumption: anything an agent can read or touch, it eventually will, even when no one asked it to.
Real incidents have already happened: wiped databases, unintended mass emails, and secret keys leaked in client-side code.
Human-led testing of AI agents, MCP servers, and their integrations finds these gaps before an attacker or an accident does.

Teams are connecting AI agents to their operations at speed: coding assistants, MCP servers, and automated workflows with access to a CRM, a code repository, and sometimes a production database. The productivity gain is real, and so is the new attack surface, which almost no one is testing. If you have wired an agent into systems that matter, the question is simple: do you actually know what it can reach, and what happens when it does something you never asked for? That gap is exactly what our AI and agentic systems penetration testing covers, and it sits inside our broader penetration testing services.


A prompt is not a permission layer

The most common mistake is treating instructions as guardrails. Telling an agent never to wipe a database does not stop it. If you block delete commands, the agent can still write a script that deletes, then run the script. That is two steps, not zero. The control has to live in the permission model, not in the wording of a prompt. Scoped keys, restricted tokens, and explicit allow-lists are controls. A sentence in a system prompt is a suggestion.
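To make the difference concrete, here is a minimal sketch of an allow-list enforced outside the model. The names `AgentPolicy`, `dispatch`, and the handlers are hypothetical and chosen for illustration only; they do not describe any particular agent framework.

```python
from dataclasses import dataclass


class ToolNotAllowed(Exception):
    """Raised when the agent requests a tool that is not on its allow-list."""


@dataclass(frozen=True)
class AgentPolicy:
    # Hypothetical policy: the complete set of tools this agent may call.
    allowed_tools: frozenset


def dispatch(policy, handlers, tool_name, **kwargs):
    """Run a tool only if the policy lists it. The prompt plays no part here."""
    if tool_name not in policy.allowed_tools:
        raise ToolNotAllowed(tool_name)
    return handlers[tool_name](**kwargs)


# Hypothetical handlers. There is deliberately no generic "run_script" or
# "run_sql" tool: a handler like that would let the agent walk around the list.
HANDLERS = {
    "read_customer": lambda customer_id: {"id": customer_id},
    "delete_customer": lambda customer_id: f"deleted {customer_id}",
}

# This agent is scoped to reads only, whatever its prompt says.
READ_ONLY = AgentPolicy(allowed_tools=frozenset({"read_customer"}))
```

An allow-list like this only holds if the credentials behind the handlers are scoped too. If the read handler connects with a key that can also delete, the path still exists.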

This is the same logic our ethical hackers apply to any system. You do not assume good behaviour. You assume the path exists and test whether it can be walked.

The assume-touch principle

Here is the assumption that prevents the worst outcomes: anything an agent can read or touch, it will. Not because it is malicious, but because agents act on partial context and misread their own task lists. One real example doing the rounds: an agent misinterpreted a task and sent a discount email that was never meant to go out to an entire mailing list. No one asked it to. It had the access, so it used it.

When you scope an agent, start from that assumption. Every credential it holds, every file it can read, every tool it can call is something it might use at the worst possible moment. If that thought makes you uncomfortable about a specific permission, that discomfort is the finding.
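One simple way to apply the assumption is to list what the agent has been granted next to what the workflow actually needs, and treat the difference as the review list. The sketch below assumes permissions can be written as plain strings; the scope names are invented for illustration.

```python
def excess_reach(granted, needed):
    """Return everything the agent can touch that the workflow never uses."""
    return sorted(set(granted) - set(needed))


# Hypothetical scopes for a coding agent wired into a CRM and a repository.
granted = {"crm:read", "crm:write", "repo:read", "prod-db:write", "mail:send-bulk"}
needed = {"crm:read", "repo:read"}

# Each entry here is access the agent holds but does not need.
to_review = excess_reach(granted, needed)
```

Every item in `to_review` is something the agent might use at the worst possible moment, so each one should either be removed or justified.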

Where the real gaps sit

In practice the gaps cluster in a few places. MCP server permissions that grant far more than the workflow needs. Secret keys and tokens exposed in client-side JavaScript by fast, unvalidated builds. Hooks and validation steps that look like security but can be bypassed in two moves. Test environments treated as safe that are still connected to real data. And classic prompt injection, both direct and indirect, used to smuggle instructions in through content the agent reads. None of these show up if you only read the agent's output. They show up when someone actively tries to abuse the access.
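As an illustration of one of these gaps, the sketch below scans the text of a client-side bundle for strings that look like secrets. The three patterns are well-known key shapes picked as examples, not a complete rule set, and a match is only a candidate that a person still has to confirm.

```python
import re

# Illustrative patterns only. Real scanners use far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "stripe_live_secret": re.compile(r"sk_live_[0-9a-zA-Z]{24,}"),
}


def find_secrets(bundle_text):
    """Return (line_number, pattern_name) for each suspicious line."""
    findings = []
    for line_no, line in enumerate(bundle_text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


# The key below is the placeholder value used in AWS documentation.
sample_bundle = 'const region = "eu-west-1";\nconst key = "AKIAIOSFODNN7EXAMPLE";\n'
findings = find_secrets(sample_bundle)
```

A scan like this catches only the obvious cases. It says nothing about what a leaked key can reach, which is the part that needs active testing.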

What we test in an agentic SaaS environment

When an AI assistant acts on a user's behalf inside a multi-tenant SaaS platform, the highest-impact issues are about authority, not language. We test tenant isolation, authorization and rights checks inside the assistant, privilege escalation through the AI layer, excessive agency, unauthorized or unintended tool calls, and resource-consumption abuse that could affect platform stability. The principle we verify is straightforward: the AI should only ever act within the permissions of the user invoking it, and that boundary should hold under active attack.
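The boundary described above can be sketched as a check that runs on every tool call, before anything executes. All names here (`User`, `ToolCall`, `TOOL_PERMISSIONS`, `authorize`) are hypothetical, and the sketch assumes the required permission for each tool is defined server-side rather than supplied by the model.

```python
from dataclasses import dataclass

# Hypothetical server-side mapping: which permission each tool requires.
TOOL_PERMISSIONS = {
    "list_invoices": "invoices:read",
    "refund_invoice": "invoices:refund",
}


@dataclass(frozen=True)
class User:
    user_id: str
    tenant_id: str
    permissions: frozenset


@dataclass(frozen=True)
class ToolCall:
    tool: str
    # In a real system the tenant should be resolved server-side from the
    # target resource, never trusted from model output.
    target_tenant_id: str


def authorize(user, call):
    """Allow a tool call only within the invoking user's tenant and rights."""
    required = TOOL_PERMISSIONS.get(call.tool)
    if required is None:
        return False  # unknown tools are denied by default
    if call.target_tenant_id != user.tenant_id:
        return False  # tenant isolation
    return required in user.permissions


viewer = User("u1", "tenant-a", frozenset({"invoices:read"}))
```

With this shape, a prompt-injected request to refund an invoice or to read another tenant's data fails the same check a direct API call would, which is the behaviour a test then tries to break.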

We work from the OWASP Top 10 for LLM Applications and the OWASP Top 10 for Agentic Applications (2026), extended with context-specific attack chains for your architecture. The phases are the same ones we run on any engagement: reconnaissance, scanning, exploitation in both unauthenticated and authenticated contexts, post-exploitation, and reporting with a debrief.

Validation stays human

AI accelerates the work. It does not validate it. A coding agent can reach a high pass rate with good verification checks in place, and a far lower one without them. It never reaches certainty on its own. The last layer, confirming that a finding is real and that a fix actually closes the gap, is human work. That is the line we hold on every engagement: AI can assist the workflow, but a human ethical hacker validates the result. No AI-only scanning.

Frequently Asked Questions

What is AI agent penetration testing?

It is a security assessment of the AI agents, models, and integrations your organisation runs, focused on what those agents can access and how that access could be abused. It covers permission scoping, MCP server configuration, secret exposure, tool-call abuse, and controls that can be bypassed.

Is a prompt instruction enough to control an AI agent?

No. A prompt is a suggestion, not a security boundary. An agent can route around an instruction, including by writing and running a script to do something it was told not to do directly. Real controls live in scoped permissions and restricted credentials.

What are the most common AI agent security gaps?

Over-broad MCP server permissions, secret keys leaked in client-side code, controls that can be bypassed in a few steps, test environments still connected to real data, and prompt injection used to smuggle instructions through content the agent reads.

Can you test AI agents in a multi-tenant SaaS platform?

Yes. We focus on tenant isolation, authorization and rights checks inside the assistant, privilege escalation through the AI layer, excessive agency, unauthorized tool calls, and resource-consumption abuse. You can read more on our AI and agentic systems pentest page.

Can AI test its own security?

AI can surface candidate issues and edge cases, which is useful. It cannot reliably confirm whether a finding is real or whether a fix holds. That validation is human work, and it is the core of how we operate.

Related services and resources

This work sits within our penetration testing services, with a dedicated focus on AI and agentic systems. If you want the background on one of the techniques mentioned above, our guide to prompt injection explains how attackers manipulate AI systems in detail. If you are rolling out AI agents across your operations, an assessment now is far cheaper than an incident later.
