Case Study: AI Penetration Testing

Your application passed the web pentest. Your infrastructure is patched. But you added an AI assistant to your platform six months ago, and nobody has tested it. That is the gap most security teams are not thinking about yet.

In a recent engagement, Zerotak assessed the security of an LLM component integrated into a SaaS compliance platform operating in the APAC region. The architecture was multi-tenant. The AI assistant was read-only. The client’s team had done their homework on the traditional web layer.

We still found critical vulnerabilities, and we exploited them from a standard, non-privileged user account using nothing but prompt manipulation.

What Is AI Penetration Testing?

AI penetration testing is a structured evaluation of the attack surface introduced by integrating a large language model into your product. It is not the same as a traditional web application pentest.

LLM vulnerabilities live in language: in the way models interpret instructions, prioritize context, and respond to adversarial input. The OWASP Top 10 for LLM Applications defines the threat categories. MITRE ATLAS maps the techniques. NIST AI RMF provides the risk management framework. The Zerotak methodology covers all three, combined with hands-on adversarial testing specific to each integration.

Finding 1: Prompt Injection to Cross-Tenant Data Exfiltration

The platform served multiple enterprise customers, each in isolated tenant environments. The LLM was configured to retrieve information only from the authenticated user’s tenant.

By crafting a sequence of prompts, the Zerotak team forced the chatbot to reveal content from tenant environments it had no authorization to access. The technique combined four elements:

Complex prompts structured to override the model’s system-level instructions
Iterative prompting that expanded the context boundary disclosed by the model
Exploitation of weak tenant isolation enforcement at the AI layer
Role-framing that shifted the model’s operational context mid-session

The result: sensitive cross-environment data returned in the chat interface.

The critical distinction is that tenant isolation was intact at every other layer. The database enforced it. The API enforced it. The AI layer did not, because it was never designed to, and no one had tested before the push to production whether the LLM could be manipulated into ignoring that isolation.
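
One way to close this kind of gap is to keep the tenant decision out of the model’s hands entirely. The Python sketch below is illustrative only and is not the client’s implementation: `Session`, `DOCUMENTS`, and `search_documents` are hypothetical names, and the in-memory list stands in for whatever store a real platform uses. It assumes the retrieval tool receives the authenticated session from the web layer and filters on that session’s tenant, so a prompt that names another tenant has nothing to act on.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Session:
    """Authenticated session established by the web layer (hypothetical)."""
    user_id: str
    tenant_id: str


# Hypothetical in-memory store standing in for the real document backend.
DOCUMENTS: List[Dict[str, str]] = [
    {"tenant_id": "tenant-a", "title": "Audit policy", "body": "Policy for A"},
    {"tenant_id": "tenant-b", "title": "Vendor list", "body": "Vendors for B"},
]


def search_documents(
    session: Session, query: str, requested_tenant: Optional[str] = None
) -> List[Dict[str, str]]:
    """Retrieval tool exposed to the model.

    The tenant filter always comes from the session. A tenant named by the
    model (and so, potentially, by an attacker's prompt) is never trusted.
    """
    if requested_tenant is not None and requested_tenant != session.tenant_id:
        raise PermissionError("cross-tenant retrieval is not permitted")
    needle = query.lower()
    return [
        doc
        for doc in DOCUMENTS
        if doc["tenant_id"] == session.tenant_id
        and (needle in doc["title"].lower() or needle in doc["body"].lower())
    ]
```

The design choice here is that the model can influence the query text but not the scope of the search. Whatever the conversation contains, the filter is applied in code after the model has produced its tool call.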

Finding 2: Administrator Impersonation via LLM

From a standard user account, the team induced the LLM to behave as if it were responding to an administrator. This was accomplished entirely through prompt manipulation.

Once the model adopted the administrator persona, it disclosed:

Tenant configuration settings
Private tokens from third-party platform integrations
A list of all users within the tenant
Information surfaced from privileged modules not accessible to standard users

This class of attack exploits a fundamental property of LLMs: they are trained to be helpful and to follow instructions embedded in conversation. When an adversary constructs a prompt that convincingly reframes the model’s role, the model often complies.
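
A common defensive pattern against this class of attack is to make the authorization decision from the verified session and never from anything said in the conversation. The sketch below assumes a hypothetical tool-calling integration; `REQUIRED_ROLE`, `ROLE_RANK`, `Session`, and `authorize_tool_call` are invented for illustration and do not describe the assessed platform.

```python
from dataclasses import dataclass

# Hypothetical mapping from tool name to the minimum role allowed to call it.
REQUIRED_ROLE = {
    "get_tenant_config": "admin",
    "list_users": "admin",
    "search_documents": "user",
}

# Hypothetical role ordering: a higher rank includes the lower one.
ROLE_RANK = {"user": 1, "admin": 2}


@dataclass(frozen=True)
class Session:
    user_id: str
    role: str  # assumed to be set at login by the identity layer, not by chat text


def authorize_tool_call(session: Session, tool_name: str) -> bool:
    """Return True only if the session's verified role permits the tool.

    The conversation is deliberately not an input. Whatever persona the
    model has been talked into, the decision does not change.
    """
    required = REQUIRED_ROLE.get(tool_name)
    if required is None:
        return False  # unknown tools are denied by default
    return ROLE_RANK.get(session.role, 0) >= ROLE_RANK[required]
```

For example, `authorize_tool_call(Session("u1", "user"), "list_users")` returns `False` even if the model has been persuaded that it is talking to an administrator, because the persona never reaches the check.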

Why These Findings Are Worse Together

The attack chain: cross-tenant exfiltration breaks the isolation between tenants, pulling data out of environments the account was never scoped to reach. Layer administrator impersonation on top, and the LLM starts handing over administrative data across every tenant in the platform. All of it through a chat interface. All of it from a single non-privileged account.

This is not a theoretical attack. It was demonstrated on a live platform, under controlled conditions. The vulnerabilities were reported, remediation guidance was delivered, and the client addressed the findings before any real-world exploitation occurred.

The AI Attack Surface Is Real and Underestimated

Most security teams are still treating the AI layer as a product feature, not an attack surface. It is both.

When you integrate an LLM into your platform, you introduce a new trust boundary: between the model’s system instructions and user-supplied input. That boundary is not enforced by code alone. It also depends on language: on the model’s interpretation of context, on the quality of your prompt engineering, and on the architectural controls you place around the model.

The attack categories that matter most in production LLM integrations:

Prompt Injection (Direct and Indirect): Malicious instructions embedded in user input or in data the model retrieves. Direct injection happens in the chat. Indirect injection happens when the model reads attacker-controlled content and executes embedded instructions.

Impersonation and Jailbreaking: Convincing the model to operate outside its defined role, either by claiming false authority or by bypassing guardrails through adversarial prompt structures.

Information Disclosure and Metadata Leakage: LLMs can reveal system prompt content, configuration details, internal architecture information, and data the user should not access.
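
One simple way to check for system prompt leakage in a test environment is a canary marker: a random string placed in the system prompt and then searched for in every response. The sketch below is a generic illustration, not a description of the Zerotak methodology; `make_canary`, `build_system_prompt`, and `leaked` are hypothetical helper names. It only catches verbatim leaks, so a paraphrased or encoded disclosure would not be detected.

```python
import secrets


def make_canary() -> str:
    """Generate a random marker that is very unlikely to occur by chance."""
    return "CANARY-" + secrets.token_hex(8)


def build_system_prompt(base_prompt: str, canary: str) -> str:
    """Embed the canary in the system prompt (intended for test environments)."""
    return f"{base_prompt}\n[internal marker: {canary}]"


def leaked(response_text: str, canary: str) -> bool:
    """True if the reply contains the canary, meaning prompt content leaked."""
    return canary.lower() in response_text.lower()
```

In use, a tester would send adversarial prompts to the assistant and call `leaked` on each reply. A single hit is evidence that system prompt content can be extracted through the chat interface.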

Insecure Output Handling: Model output rendered in interfaces without sanitization. If the model generates content that is then executed, rendered as HTML, or passed to downstream systems, an attacker who can steer the model’s output can control what happens downstream.
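
For the HTML case, the usual mitigation is to treat model output like any other untrusted input and escape it before rendering. The sketch below uses Python’s standard `html.escape`; the `render_model_output` function and the `assistant-message` class name are hypothetical, and a real front end may rely on its templating engine’s auto-escaping instead.

```python
import html


def render_model_output(text: str) -> str:
    """Escape model output before placing it inside an HTML page."""
    escaped = html.escape(text, quote=True)  # converts <, >, &, " and '
    return '<div class="assistant-message">' + escaped + "</div>"
```

With this in place, a reply containing `<script>alert(1)</script>` is displayed as literal text rather than executed by the browser. Escaping covers only the rendering path; output passed to shells, SQL, or other downstream systems needs its own context-appropriate handling.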

Broken Access Controls at the AI Layer: Authorization enforced everywhere except the model itself. The model retrieves data based on what it is asked, not based on what the authenticated user is permitted to see.

What a Zerotak AI Pentest Covers

The engagement described in this article was delivered remotely over two weeks, targeting exclusively the AI layer: no traditional web application or infrastructure testing was in scope.

The test cases covered the full attack surface of a production LLM integration: reconnaissance to understand model capabilities and response boundaries, all major injection categories, impersonation and jailbreak scenarios, storage and sandboxing escape attempts, output handling, access control validation for AI-specific configuration, and guardrails bypass techniques.

Test your AI layer before it becomes a liability.

Zerotak’s AI penetration testing service covers the full LLM attack surface. Delivered remotely, with zero disruption to your production environment.

If your product has an AI component, it has an attack surface. Let’s find the holes before someone else does. Contact us at contact@zerotak.com.
