Last week, Jer Crane posted on X about how an AI coding agent—Cursor running Claude Opus 4.6—deleted his production database and all volume-level backups in a single API call. Nine seconds. Customer data for rental businesses across the country—some of them five-year subscribers—gone.
The agent then produced a written confession enumerating the safety rules it had violated.
I’ve been there. Early in building Path of Progress, I watched ChatGPT nuke my development database. When I asked what happened, it responded: “At least you’ll always have the memories of what you built.” Fortunately, I had backups. Jer wasn’t so lucky—Railway stores volume backups in the same volume as the data, so when the volume was deleted, the backups went with it.
These aren’t isolated incidents. They’re the predictable result of an industry that’s wiring AI agents to production infrastructure faster than it’s building the safety architecture to make those connections safe. The Model Context Protocol (MCP) that Anthropic introduced is being marketed as “USB-C for AI”—a universal connector for models to interact with external tools. But a 2025 analysis of thousands of open-source MCP implementations found systemic reliance on insecure practices: static API keys, unencrypted local network exposure, and tokens with blanket permissions.
This post lays out the defense-in-depth architecture that should exist before any AI agent touches your database.
The Fundamental Problem: Non-Deterministic Execution
Traditional APIs were built for deterministic communication between structured systems. You know exactly what each endpoint does because a human wrote the code that calls it.
AI agents introduce non-deterministic reasoning into the execution path. The risk isn’t just unauthorized access—it’s “Excessive Agency,” where a model’s inherent flexibility leads it to perform actions that, while technically authorized, deviate from the user’s intent or the organization’s safety policies.
In Jer’s case, the agent was working on a routine task in staging. It encountered a credential mismatch and decided—entirely on its own initiative—to “fix” the problem by deleting a Railway volume. To execute the deletion, it went looking for an API token, found one in an unrelated file, and discovered that token had blanket authority across Railway’s entire GraphQL API.
The agent did exactly what it was technically authorized to do. That’s the problem.
Identity: Treating AI Agents as Non-Human Entities
The first layer of defense is treating AI agents as first-class non-human identities (NHIs) within your identity lifecycle. This means moving beyond static credentials to dynamic, workload-centric authentication.
OAuth 2.1 and PKCE Are Non-Negotiable
OAuth 2.1, still an IETF draft at the time of writing, has become the de facto baseline for both APIs and MCP-based servers. It consolidates OAuth 2.0 best practice: insecure legacy methods (the implicit flow, the resource owner password grant) are dropped, and Proof Key for Code Exchange (PKCE) is required for the authorization code flow.
PKCE is critical for AI agents operating in public clients like CLI tools or IDE extensions, where secure storage of a client secret is impossible. The client sends a hashed challenge with the authorization request and reveals the matching secret verifier only when it redeems the code, so an intercepted authorization code can’t be used by a malicious actor.
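Here is a minimal sketch of the PKCE exchange using the S256 method from RFC 7636. The function names are mine, and a real client would get this from its OAuth library rather than hand-rolling it.

```python
import base64
import hashlib
import secrets


def _s256(code_verifier: str) -> str:
    """SHA-256 the verifier and base64url-encode it without padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Client side: return (code_verifier, code_challenge)."""
    # 32 random bytes become a 43-character URL-safe string,
    # the minimum verifier length RFC 7636 allows.
    code_verifier = secrets.token_urlsafe(32)
    return code_verifier, _s256(code_verifier)


def verify_pkce(code_verifier: str, stored_challenge: str) -> bool:
    """Server side: recompute the challenge when the code is redeemed."""
    return secrets.compare_digest(_s256(code_verifier), stored_challenge)
```

The challenge travels with the authorization request and the verifier only with the token request, so someone who intercepts the code alone has nothing to redeem it with.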
The June 2025 revision of the MCP specification formalized the separation of Authorization Server from Resource Server. This allows MCP servers to act as dedicated OAuth Resource Servers that validate tokens issued by a central Identity Provider. Resource Indicators (RFC 8707) can bind specific tokens to specific servers—preventing a token issued for a benign tool from being replayed against a sensitive database server.
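The audience binding can be sketched in a few lines. This assumes the token's signature and expiry have already been verified and its claims decoded into a dict; the server URLs are hypothetical.

```python
def token_is_for_this_server(claims: dict, this_server: str) -> bool:
    """Accept a token only if its `aud` claim names this resource server."""
    aud = claims.get("aud")
    # `aud` may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    return this_server in audiences


# Hypothetical identifiers for two MCP servers.
DOCS_SERVER = "https://mcp.example.com/docs"
DB_SERVER = "https://mcp.example.com/database"
```

A token minted for `DOCS_SERVER` fails this check at `DB_SERVER`, which is the replay the paragraph above describes.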
Beyond RBAC: Attribute-Based Access Control
Role-Based Access Control assigns permissions to static roles (“Developer”), but an AI agent might only need access to a specific database table during a specific time window to fulfill a single user request.
Attribute-Based Access Control (ABAC) evaluates dynamic attributes at the moment of the request:
- Subject attributes: The specific agent and the user it represents
- Resource attributes: The sensitivity classification of the data
- Environmental attributes: Time of day, network origin, the specific task being executed
A financial analyst agent might be granted read access to payroll data only if the request originates from a secure VPC, occurs during business hours, and is tied to a pre-approved audit ticket. When the task concludes, access rights drop automatically.
This prevents “identity drift”—the phenomenon where an agent accumulates permissions as it moves through different sub-tasks.
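The payroll example can be sketched as a single policy function. Everything specific here is an assumption for illustration: the VPC range, the business hours, the ticket ID, and the field names. A production system would normally delegate this to a policy engine.

```python
from dataclasses import dataclass
from datetime import datetime, time
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class AccessRequest:
    agent_id: str                  # subject: the agent
    on_behalf_of: str              # subject: the user it represents
    resource_classification: str   # resource attribute
    action: str
    source_ip: str                 # environmental attribute
    timestamp: datetime            # environmental attribute
    ticket_id: Optional[str]       # the task being executed


SECURE_VPC = ip_network("10.20.0.0/16")      # hypothetical range
BUSINESS_HOURS = (time(9, 0), time(17, 0))   # hypothetical window
APPROVED_TICKETS = {"AUDIT-1042"}            # hypothetical ticket


def is_allowed(req: AccessRequest) -> bool:
    """Evaluate every attribute at request time; deny by default."""
    if req.resource_classification != "payroll" or req.action != "read":
        return False  # this policy only ever grants payroll reads
    in_vpc = ip_address(req.source_ip) in SECURE_VPC
    in_hours = BUSINESS_HOURS[0] <= req.timestamp.time() < BUSINESS_HOURS[1]
    has_ticket = req.ticket_id in APPROVED_TICKETS
    return in_vpc and in_hours and has_ticket
```

Because nothing is stored on the agent's identity, access disappears the moment the ticket is closed or the request leaves the approved window.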
MCP Security: Scope Everything, Trust Nothing
If you’re running MCP servers, security depends on the relationship between three components: the Host (the AI application), the Client (the connector inside the host that maintains the connection to one server), and the Server (the tool provider).
Tool-Level Scopes
Rather than granting a model access to an entire MCP server, tokens should be scoped to specific functions. Grant read:transactions but explicitly deny delete:records. The Jer incident happened because a CLI token created for adding domains had the same volumeDelete permission as any other token—Railway’s tokens are effectively root.
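A tool-level scope check can be as small as this sketch. The tool names and the mapping are hypothetical; the scope strings follow the ones used above.

```python
# Hypothetical mapping from each exposed tool to the one scope it requires.
REQUIRED_SCOPE = {
    "list_transactions": "read:transactions",
    "delete_record": "delete:records",
}


def authorize_tool_call(tool: str, granted_scopes: set[str]) -> bool:
    """Allow a call only if the token carries that tool's scope.

    Tools missing from the mapping are denied, so a newly added
    tool is unreachable until someone assigns it a scope.
    """
    required = REQUIRED_SCOPE.get(tool)
    return required is not None and required in granted_scopes
```

A token carrying only `read:transactions` can list transactions and nothing else, which is the property Railway's tokens lacked.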
No Token Passthrough
The MCP specification explicitly forbids “token passthrough,” where an MCP server forwards a user’s original OAuth token to a downstream database API. Passthrough creates accountability gaps, because the downstream system can’t distinguish a direct user action from an automated agent action, and it bypasses rate limiting and traffic monitoring.
Use token exchange (RFC 8693) instead. Every request gets tied to both the initiating user and the specific agentic intermediary. Full traceability across the execution chain.
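As a rough sketch, this is the form body an MCP server might send to its authorization server for an RFC 8693 exchange. The parameter names and URNs come from the RFC; the token values, resource URL, and function name are placeholders, and the HTTP call and client authentication are left out.

```python
from urllib.parse import urlencode

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(
    user_token: str, agent_token: str, resource: str, scope: str
) -> str:
    """Build the x-www-form-urlencoded body for a token exchange request."""
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        # subject_token: the user the request is made on behalf of.
        "subject_token": user_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        # actor_token: the agent actually performing the call.
        "actor_token": agent_token,
        "actor_token_type": ACCESS_TOKEN_TYPE,
        # Narrow the new token to one downstream API and one scope.
        "resource": resource,
        "scope": scope,
    }
    return urlencode(params)
```

The downstream API receives a fresh, narrowly scoped token that names both parties instead of the user's original one.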
Local Server Security
A growing risk in the MCP ecosystem is the exposure of local servers. Developers build MCP servers for local use within IDEs like Cursor, often bound to 0.0.0.0 or local ports without authentication. Researchers have identified “NeighborJack” attacks where an attacker on the same network—in a co-working space, for example—can connect directly to an unauthenticated MCP server and execute tools with the developer’s privileges.
Implement “Per-Client Consent” registries. Before an AI host can call a tool on a new server, the user must provide explicit approval, bound to a specific client ID. Use stdio transport for local communication when possible—it prevents network-based access by design.
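One way to picture a per-client consent registry is the in-memory sketch below. The class and function names are invented for illustration, and a real host would persist approvals and prompt the user through its own UI.

```python
from typing import Callable


class ConsentRegistry:
    """Tracks which (client_id, server_id) pairs the user has approved."""

    def __init__(self) -> None:
        self._approved: set[tuple[str, str]] = set()

    def approve(self, client_id: str, server_id: str) -> None:
        self._approved.add((client_id, server_id))

    def revoke(self, client_id: str, server_id: str) -> None:
        self._approved.discard((client_id, server_id))

    def is_approved(self, client_id: str, server_id: str) -> bool:
        return (client_id, server_id) in self._approved


def call_tool(
    registry: ConsentRegistry,
    client_id: str,
    server_id: str,
    tool: Callable[[], str],
) -> str:
    """Run a tool only if this exact client was approved for this server."""
    if not registry.is_approved(client_id, server_id):
        raise PermissionError(f"{client_id} has no consent for {server_id}")
    return tool()
```

Binding consent to the pair rather than to the server alone means an approval given to one client can't be reused by another process that reaches the same server.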
Data-Centric Security: Never Show the Model What It Doesn’t Need
When AI models interact with databases, the risk of sensitive data exposure—intentional extraction or unintentional “hallucination”—becomes primary. The solution is ensuring the LLM never sees raw sensitive data in the first place.
PII Redaction at the Infrastructure Layer
AI Gateways and proxy layers can implement a redaction lifecycle that scrubs PII, secrets, and credentials from both database query results and model responses. A customer support agent querying a database might receive a record where the customer’s name is masked as CUSTOMER_A and the credit card number is replaced with a non-sensitive token.
Dynamic masking preserves semantic context—the model understands relationships between entities—without exposing linkable information. Handle redaction at the infrastructure level, not the application level, to ensure consistent policy enforcement across all AI-driven services.
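To make the masking concrete, here is a small sketch of a redaction step a proxy might apply to query results before the model sees them. The regular expressions are deliberately simple, and the record layout and placeholder strings are assumptions; real gateways use far more thorough detectors.

```python
import re
from string import ascii_uppercase

# Simplified patterns for illustration only.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Redactor:
    """Replaces customer names with stable aliases and scrubs card/email text."""

    def __init__(self) -> None:
        self._aliases: dict[str, str] = {}

    def _alias(self, name: str) -> str:
        # The same name always maps to the same alias, so the model can
        # still tell that two records refer to the same customer.
        if name not in self._aliases:
            n = len(self._aliases)
            suffix = ascii_uppercase[n] if n < 26 else str(n)
            self._aliases[name] = f"CUSTOMER_{suffix}"
        return self._aliases[name]

    def redact_record(self, record: dict) -> dict:
        """Return a copy with the `name` field aliased and other values scrubbed.

        Non-name values are converted to strings before scrubbing.
        """
        name = record.get("name", "")
        alias = self._alias(name) if name else ""
        clean = {}
        for key, value in record.items():
            if key == "name":
                clean[key] = alias
                continue
            text = str(value)
            if name:
                text = text.replace(name, alias)
            text = CARD_RE.sub("[CARD_TOKEN]", text)
            text = EMAIL_RE.sub("[EMAIL]", text)
            clean[key] = text
        return clean
```

Running this in the gateway rather than in each application is what keeps the policy consistent: every agent gets the same aliases and the same scrubbing.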
Differential Privacy for RAG Systems
In Retrieval-Augmented Generation pipelines, data is typically stored as high-dimensional vector embeddings. A common misconception is that embeddings are inherently “anonymized” because they’re mathematical representations. They’re not—inversion attacks can reconstruct the original sensitive text from embeddings with high accuracy.
Differential privacy adds calibrated random noise to the data before embedding generation. The probability of any given output then stays nearly the same whether or not any one individual’s record is included, which provides a quantifiable guarantee against re-identification.
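For reference, the guarantee is usually stated as follows. This is the standard definition of pure ε-differential privacy, not anything specific to an embedding pipeline: a randomized mechanism M satisfies it if, for every pair of datasets D and D' that differ in one individual's record and every set of outputs S,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

A smaller ε means the two probabilities are closer together, so the output reveals less about whether any one record was present, at the cost of more noise.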
Network Architecture: Assume the Agent Is Compromised
If an AI agent is compromised via prompt injection, its ability to leak data depends entirely on its network reachability. Build your architecture assuming compromise.
VPC Isolation and PrivateLink
AI workloads should be isolated within a Virtual Private Cloud with no direct internet access. When using managed AI services like Amazon Bedrock or Google Vertex AI, use PrivateLink (AWS) or Private Service Connect (GCP) to establish private network paths. Traffic between the agent, database, and AI API should never traverse the public internet.
A Service Perimeter (GCP VPC Service Controls) blocks all public access to the AI API, even with valid credentials. For necessary external access—documentation, GitHub repos—implement a secure egress path using a proxy server and Cloud NAT with a “default-deny” egress policy.
Layer 7 Egress Filtering
Traditional IP-based firewall rules are ineffective for AI agents that need to access various web services, since those services often sit behind shared or changing IP addresses. Egress filtering must happen at Layer 7 using domain-based allowlists. Inspect the Server Name Indication (SNI) field in the TLS handshake to ensure agents only connect to vetted domains, and drop connections to unknown endpoints.
Watch for advanced exfiltration that encodes secrets into URL paths or subdomains of attacker-controlled nameservers (BASE64_DATA.attacker.com). Content inspection firewalls should decode Base64, Hex, and URL encoding in real-time, scanning for patterns like AWS access keys, GitHub tokens, or private keys.
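A toy version of that check is sketched below: an allowlist on the hostname, then a scan of each URL segment in raw, URL-decoded, Base64-decoded, and hex-decoded form. The allowlist entries are hypothetical and the secret patterns are a small sample; a real content-inspection firewall also looks at bodies and headers.

```python
import base64
import binascii
import re
from urllib.parse import unquote, urlsplit

ALLOWED_DOMAINS = {"api.github.com", "docs.python.org"}  # hypothetical allowlist

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub personal access token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def _decodings(chunk: str):
    """Yield the chunk as-is and under each encoding we try to undo."""
    yield chunk
    yield unquote(chunk)
    try:
        padded = chunk + "=" * (-len(chunk) % 4)
        yield base64.urlsafe_b64decode(padded).decode("utf-8", "ignore")
    except (binascii.Error, ValueError):
        pass  # not valid Base64
    try:
        yield bytes.fromhex(chunk).decode("utf-8", "ignore")
    except ValueError:
        pass  # not valid hex


def egress_allowed(url: str) -> bool:
    """Default-deny: the host must be allowlisted and no segment may hide a secret."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_DOMAINS:
        return False
    chunks = re.split(r"[/?&=.]", parts.path + "?" + parts.query)
    for chunk in filter(None, chunks):
        for text in _decodings(chunk):
            if any(p.search(text) for p in SECRET_PATTERNS):
                return False
    return True
```

The allowlist does most of the work here. The decoding pass exists for the case where an allowed domain itself becomes the exfiltration channel.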
Database-Native Guardrails: Don’t Trust the Agent’s SQL
Relying on the AI model to generate safe queries is a security anti-pattern. Enforce database security at the engine level, treating the agent as an untrusted client.
The Action-Selector Pattern
One of the most critical risks is tools like execute_sql that grant the model broad freedom to craft queries—a direct path for SQL injection via prompt manipulation.
Use the Action-Selector pattern instead: the model can only translate a user request into a small, pre-defined set of hard-coded functions (get_order_history(user_id)). Parameter validation and user identity filtering happen outside the agent’s control. The agent can’t be tricked into reading another user’s data because it never constructs the query.
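A minimal sketch of the pattern, using SQLite as a stand-in database. The `orders` schema and the `run_action` dispatcher are assumptions for illustration; `get_order_history` is the function named above.

```python
import sqlite3


def get_order_history(conn: sqlite3.Connection, user_id: int) -> list[tuple]:
    """Hard-coded, parameterized query. The model never writes SQL."""
    return conn.execute(
        "SELECT id, item FROM orders WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()


# The complete set of things the agent is able to do.
ACTIONS = {"get_order_history": get_order_history}


def run_action(
    conn: sqlite3.Connection, session_user_id: int, action: str
) -> list[tuple]:
    """Dispatch an action name chosen by the model.

    The user ID comes from the authenticated session, never from
    model output, so the model cannot ask for another user's rows.
    """
    handler = ACTIONS.get(action)
    if handler is None:
        raise PermissionError(f"unknown action: {action!r}")
    return handler(conn, session_user_id)
```

The model's only degree of freedom is which key of `ACTIONS` to pick. Anything else, including a string of SQL, is rejected before it reaches the database.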
Row-Level Security
Databases should leverage native row-level security (RLS) and column-level access control. In Google Cloud SQL or AlloyDB, database-level roles can prevent execution of destructive commands like DROP TABLE. BigQuery can use policy tags to restrict access to sensitive columns. Even if an agent’s IAM token is compromised, damage scope is limited by the database’s internal permission structure.
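To show what engine-level enforcement feels like from the agent's side, here is a sketch using SQLite's authorizer hook as a stand-in. This is not row-level security and not how Cloud SQL, AlloyDB, or BigQuery implement their controls; it only demonstrates the database itself refusing destructive statements no matter what SQL arrives.

```python
import sqlite3

# Authorizer action codes this connection is never allowed to perform.
DESTRUCTIVE = {sqlite3.SQLITE_DROP_TABLE, sqlite3.SQLITE_DELETE}


def deny_destructive(action, arg1, arg2, db_name, source):
    """Called by SQLite while compiling each statement."""
    if action in DESTRUCTIVE:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


def harden(conn: sqlite3.Connection) -> None:
    """Install the authorizer on a connection handed to an agent."""
    conn.set_authorizer(deny_destructive)
```

After `harden(conn)`, a `DROP TABLE` or `DELETE` raises `sqlite3.DatabaseError` while reads and inserts keep working. In a server database the equivalent is a role that simply lacks those privileges.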
Dual-LLM Guardrails
For high-risk operations, implement the Dual-LLM pattern: a primary “Action LLM” performs the task while a secondary, tightly constrained “Guardrail LLM” pre-screens the user prompt for malicious intent and post-screens the Action LLM’s output for unauthorized commands. This creates a “model-in-the-loop” checkpoint for data deletions or bulk exports.
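The control flow of the pattern is simple enough to sketch. Both models are passed in as plain callables, and `keyword_guardrail` is a crude rule-based stand-in for the second model so the example runs on its own; none of these names come from a real framework.

```python
from typing import Callable

ActionLLM = Callable[[str], str]   # prompt -> proposed action
Guardrail = Callable[[str], bool]  # text -> True if it looks safe


def guarded_run(user_prompt: str, action_llm: ActionLLM, guardrail: Guardrail) -> str:
    """Screen the prompt, let the action model propose, then screen the proposal."""
    if not guardrail(user_prompt):
        raise PermissionError("guardrail rejected the prompt")
    proposed = action_llm(user_prompt)
    if not guardrail(proposed):
        raise PermissionError("guardrail rejected the proposed action")
    return proposed


def keyword_guardrail(text: str) -> bool:
    """Stand-in for the guardrail model: flags a few destructive phrases."""
    banned = ("drop table", "delete from", "volumedelete")
    lowered = text.lower()
    return not any(term in lowered for term in banned)
```

Note that the proposal is returned, not executed. Whatever runs it should still sit behind the deterministic controls described in the rest of this section, since a second model can be fooled too.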
Immutable Backups: The Last Line of Defense
Here’s where Jer’s incident becomes instructive about architecture, not just agent behavior.
Railway stores volume-level backups in the same volume. Their own documentation says “wiping a volume deletes all backups.” That’s not backups. That’s a snapshot in the same blast radius as the original. When the volume goes, both go.
WORM Technology
Immutable backups use Write-Once-Read-Many (WORM) technology to create recovery points that cannot be edited, deleted, or overwritten—even by users with administrative credentials. This operates at the storage or kernel layer, not the permission layer. A compromised admin credential can’t purge the historical record.
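The math is simple: unrecoverable data loss requires production and backups to fail together. If the two failures are independent, the probability of losing both is the product of the two individual probabilities. So while the probability of a production compromise may remain high, WORM enforcement drives the probability of backup compromise toward zero, and the product goes with it.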
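The semantics are easy to model in memory, as in this sketch. Real WORM enforcement lives in the storage system, not in application code like this; the class and its method names are invented purely to show the rules.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class WormStore:
    """Toy write-once store: no overwrites, no deletes before retention ends."""

    def __init__(self, retention: timedelta) -> None:
        self._retention = retention
        # key -> (data, time at which deletion becomes legal)
        self._objects: dict[str, tuple[bytes, datetime]] = {}

    def put(self, key: str, data: bytes, now: Optional[datetime] = None) -> None:
        now = now or datetime.now(timezone.utc)
        if key in self._objects:
            raise PermissionError("write-once: key already exists")
        self._objects[key] = (data, now + self._retention)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def delete(self, key: str, now: Optional[datetime] = None) -> None:
        now = now or datetime.now(timezone.utc)
        if now < self._objects[key][1]:
            raise PermissionError("retention lock still active")
        del self._objects[key]
```

There is deliberately no admin flag or force parameter. The point of WORM is that the question "who is asking?" never enters the decision.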
The Permanence Layer
Modern data platforms like Snowflake have advanced this concept with “Retention Locks.” When a backup policy is created with WITH RETENTION LOCK, the settings become irreversible—data is locked against all changes for the policy duration, often seven years for financial compliance.
Key features:
- Zero-Copy Restores: Recovery uses metadata operations, not physical data movement. A 100TB database restores nearly instantaneously.
- Incremental Efficiency: You only pay for changed data, not full external clones.
- Native Governance: Immutability integrates with existing RBAC and security policies.
If you’re running production data on any infrastructure, audit your backup architecture today. Ask specifically: can an API call delete both my data and my backups? If yes, you don’t have backups. You have a false sense of security.
Real-Time Monitoring: Catch Anomalies Before They Cascade
Passive logging is inadequate for the high-velocity, subtle attacks characteristic of AI environments.
Database Activity Monitoring
Modern DAM systems use machine learning to establish a baseline of “normal” behavior for each service account and user. When an AI agent deviates—querying tables it’s never accessed before, pulling unusually large data volumes—the system triggers an automated alert or circuit breaker.
Advanced implementations operate at the kernel layer, observing system calls and network interactions independently of the database engine’s logging configuration. This provides visibility even under high throughput and prevents attackers from blinding monitoring by disabling database-native logs.
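A stripped-down version of the baseline idea is sketched below. It uses a plain mean and standard deviation rather than machine learning, and the thresholds, class name, and alert labels are all assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


class AgentBaseline:
    """Learns each agent's usual tables and row counts, then flags deviations."""

    def __init__(self, z_threshold: float = 3.0, min_history: int = 5) -> None:
        self.z_threshold = z_threshold
        self.min_history = min_history
        self._tables: dict[str, set[str]] = defaultdict(set)
        self._rows: dict[str, list[int]] = defaultdict(list)

    def check(self, agent: str, table: str, rows: int) -> list[str]:
        """Return alert labels for this query; an empty list means normal."""
        alerts: list[str] = []
        history = self._rows[agent]
        # Only judge once there is enough history to call anything "normal".
        if len(history) >= self.min_history:
            if table not in self._tables[agent]:
                alerts.append("new_table")
            mu, sigma = mean(history), pstdev(history)
            # Floor sigma at 1 row so a perfectly flat history doesn't
            # turn every tiny increase into an alert.
            if rows > mu + self.z_threshold * max(sigma, 1.0):
                alerts.append("row_volume")
        if not alerts:
            # Only normal-looking queries are folded into the baseline.
            self._tables[agent].add(table)
            history.append(rows)
        return alerts
```

A caller would treat a non-empty result as the trigger for the alert or circuit breaker, for example by pausing the agent's credentials until a human reviews the query.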
Log Anomaly Detection
AI-powered observability platforms combine causal analysis and statistical algorithms to detect anomalies in log streams in near real time, correlating logs with metrics and traces across the entire execution chain from initial prompt to final database transaction.
For real-time streaming, unsupervised learning algorithms like k-means clustering identify outliers without pre-labeled training data. This catches “low-and-slow” exfiltration where small pieces of sensitive data leak across thousands of seemingly innocent requests.
The Five Controls That Would Have Saved Jer’s Database
Let’s apply this framework to the actual incident:
- Scoped tokens: If Railway’s CLI token had been scoped to domains:write only, not blanket GraphQL access, the deletion would have been unauthorized.
- Action-Selector pattern: If the agent could only call pre-defined functions rather than arbitrary API mutations, volumeDelete wouldn’t have been in its vocabulary.
- Destructive operation confirmation: A “type DELETE to confirm” or out-of-band approval for volumeDelete would have required human intervention.
- Separate backup blast radius: If backups existed in a different storage tier with WORM protection, the data would be recoverable regardless of volume deletion.
- Behavioral anomaly detection: An agent calling volumeDelete for the first time would have triggered an alert, potentially stopping the action before completion.
None of these are exotic. They’re standard security controls that should exist before any vendor markets MCP or agent integration with destructive-capable APIs.
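As one example of how small these controls are, here is a sketch of a confirmation gate for destructive operations. The operation set, function name, and the idea that the confirmation string arrives through a separate human-only channel are assumptions for illustration.

```python
from typing import Callable, Optional

DESTRUCTIVE_OPS = {"volumeDelete"}  # operations that need a human


def execute(
    op: str,
    resource_name: str,
    run: Callable[[], str],
    confirmation: Optional[str] = None,
) -> str:
    """Run an operation, requiring a typed resource name for destructive ones.

    `confirmation` must be supplied by a person through a channel the
    agent cannot write to (a CLI prompt, an approval page, an SMS reply).
    """
    if op in DESTRUCTIVE_OPS and confirmation != resource_name:
        raise PermissionError(
            f"{op} on {resource_name!r} requires typing the resource name to confirm"
        )
    return run()
```

The gate is only as strong as the separation of channels: if the agent can fill in `confirmation` itself, this degrades back to an authenticated POST.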
What Needs to Change Industry-Wide
The minimum that should exist before any vendor markets AI-agent integration with production infrastructure:
- Destructive operations require confirmation that can’t be auto-completed by an agent. Type the resource name. Out-of-band approval. SMS. Email. Anything. An authenticated POST that nukes production is indefensible in 2026.
- API tokens must be scopable by operation, environment, and resource. Effectively-root tokens are a 2015-era oversight with no excuse in an AI-agent era.
- Volume backups can’t live in the same volume as the data they back up. Calling that “backups” is misleading marketing. Real backups live in a different blast radius.
- Recovery SLAs need to exist and be published. “We’re investigating” 30 hours into a production-data event isn’t a recovery story.
- AI-agent vendor system prompts can’t be the only safety layer. Cursor’s “don’t run destructive operations” rule was violated by their own agent against their own marketed guardrail. System prompts are advisory, not enforcing. The enforcement layer has to live in the integrations themselves: at the API gateway, in the token system, in destructive-op handlers.
Conclusion: Bounded Agency Through Deterministic Guardrails
Protecting databases in the AI era requires treating agent autonomy as something to be bounded, not trusted. The architecture described here—identity sovereignty, data-centric defense, network isolation, database-native guardrails, and immutable backups—creates layers where any single failure doesn’t cascade to catastrophe.
Jer’s agent confessed in writing to violating every principle it was given. That’s not a model failure—it’s an integration failure. The agent did what it was technically authorized to do. The authorization model was wrong.
The future of AI-integrated systems lies not in blocking agents but in building environments where their agency is always bounded by deterministic, immutable, and observable guardrails. System prompts are suggestions. Architecture is enforcement.
If you’re connecting AI agents to production systems today, audit your token scopes, evaluate whether your backups share a blast radius with your data, and ask hard questions about what an autonomous API call can destroy. The 9-second deletion is coming for everyone who doesn’t.