Most AI agent security problems are not discovered during development. They are discovered in production, usually after something has already gone wrong.
A developer builds a customer service agent. It can look up orders, update accounts, send emails, escalate tickets. Tests pass. It ships. Two weeks later, a misinterpreted instruction triggers a bulk update across every account in the database. Thousands of records corrupted. The agent did nothing wrong. It did exactly what it was designed to do.
The question that was never asked: what is the worst thing this agent can do if something goes wrong?
That is the question the Vaultak AI Agent Risk Scanner answers. This post explains how it works, what the five risk dimensions measure, and how to act on your score.
There is a consistent gap in how AI agents get shipped to production. Developers test for functional correctness: does the agent do what it is supposed to do? They rarely test for risk exposure: what is the worst-case outcome if something goes wrong?
These are different questions, and they require different tooling. The scanner is that tooling. Describe your agent in plain English, select the capabilities that apply, and get a composite risk score in 10 seconds with specific remediation recommendations.
No account. No credit card. Free at vaultak.com/scan.
The action capability dimension evaluates what the agent can actually do. Read-only operations score low. Write operations, delete operations, code execution, and outbound messaging score progressively higher. An agent that reads documents is categorically different from one that executes shell commands.
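As a rough illustration of that ordering, the sketch below scores an agent by its riskiest capability. The capability names and weights are invented for this example and are not the scanner's actual values; only the relative ordering comes from the description above.

```python
# Hypothetical weights: only the ordering (read < write < delete <
# code execution < outbound messaging) is taken from the text.
ACTION_WEIGHTS = {
    "read": 1,
    "write": 2,
    "delete": 3,
    "execute_code": 4,
    "send_message": 5,
}


def action_capability_score(capabilities):
    """Return a 0-100 score driven by the riskiest capability present."""
    unknown = set(capabilities) - set(ACTION_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    if not capabilities:
        return 0
    worst = max(ACTION_WEIGHTS[c] for c in capabilities)
    return round(100 * worst / max(ACTION_WEIGHTS.values()))


doc_reader = action_capability_score(["read"])
shell_agent = action_capability_score(["read", "execute_code"])
print(doc_reader, shell_agent)
```

Taking the maximum rather than the sum is a design choice in this sketch: one dangerous capability should dominate the score no matter how many harmless ones sit beside it.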
An agent's action capabilities and the data it touches are separate concerns. An agent with narrow actions can still score high on resource sensitivity if it is pointed at production databases, API credentials, PII, payment records, or health data. This dimension isolates that exposure.
Blast radius asks: if this agent makes a mistake, how many records or systems are affected? An agent that processes one record at a time has a small blast radius. An agent that runs bulk operations across all users has an enormous one.
This is the dimension most frequently missed before production deployment, and the one most commonly implicated in real incidents. The account corruption scenario described above is a blast radius failure, not a code failure.
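One way to make blast radius explicit in your own code is to cap how many records a single action may touch. The guard below is a minimal sketch, not part of any Vaultak API; the function name, the exception, and the limit of 10 are all assumptions chosen for illustration.

```python
class BlastRadiusExceeded(Exception):
    """Raised when one action would touch more records than allowed."""


def guard_bulk_update(record_ids, max_records=10):
    """Return the record IDs as a list, or raise if there are too many.

    max_records is an arbitrary illustrative limit, not a recommendation.
    """
    ids = list(record_ids)
    if len(ids) > max_records:
        raise BlastRadiusExceeded(
            f"action targets {len(ids)} records, limit is {max_records}"
        )
    return ids


# A single-record update passes through unchanged.
print(guard_bulk_update(["acct_1"]))

# A bulk update across thousands of accounts is refused before it runs.
try:
    guard_bulk_update(f"acct_{i}" for i in range(5000))
except BlastRadiusExceeded as exc:
    print("blocked:", exc)
```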
The behavioral deviation dimension looks for signals that the agent might operate unpredictably: unlimited autonomy, absent confirmation steps, self-modification capabilities, the ability to override its own instructions. High behavioral deviation scores correlate strongly with production surprises.
The final dimension considers whether the agent runs continuously or only on demand. A continuously running background agent with broad permissions is functionally different from one that runs once when a human explicitly triggers it. Always-on agents compound every other risk dimension. This is easy to overlook and important to isolate.
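To see what "compound" could mean numerically, here is a sketch of a composite score in which always-on operation multiplies the average of the other four dimensions. The scanner's real formula is not described here, so the field names, the equal weighting, and the 1.25 multiplier are all assumptions.

```python
from dataclasses import dataclass

# Assumed multiplier applied when the agent runs continuously.
ALWAYS_ON_MULTIPLIER = 1.25


@dataclass(frozen=True)
class AgentProfile:
    """Per-dimension scores on a 0-100 scale (field names are illustrative)."""
    action_capability: int
    resource_sensitivity: int
    blast_radius: int
    behavioral_deviation: int
    always_on: bool


def composite_score(profile):
    """Average four dimensions, scale up if always-on, cap at 100."""
    base = (
        profile.action_capability
        + profile.resource_sensitivity
        + profile.blast_radius
        + profile.behavioral_deviation
    ) / 4
    if profile.always_on:
        base *= ALWAYS_ON_MULTIPLIER
    return min(100, round(base))


on_demand = AgentProfile(40, 60, 80, 20, always_on=False)
background = AgentProfile(40, 60, 80, 20, always_on=True)
print(composite_score(on_demand), composite_score(background))
```

A multiplier, rather than a fifth term in the average, reflects the idea that continuous operation amplifies whatever risk already exists instead of adding an independent one.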
Navigate to vaultak.com/scan. You will see a text area and a set of capability tags.
In the text area, describe what your agent does in plain English. For example:
A customer service agent that can look up customer orders, update account information, send emails to customers, and escalate issues by creating support tickets.
Then select the capability tags that apply. In this case: write database, send emails, call external APIs. Hit Scan Agent.
The scanner evaluates your agent across the five dimensions and returns a composite score with specific recommendations. For the customer service agent above, a typical result is around 58 out of 100, which is moderate risk.
The recommendations are not generic. They are tied to your agent's specific capability profile.
| Score | Risk Level | What it typically means |
|---|---|---|
| 0 - 30 | Low | Read-only or tightly scoped agent, well-contained |
| 31 - 60 | Moderate | Write access or broad data scope, targeted fixes needed |
| 61 - 80 | High | Multiple elevated dimensions, guardrails required before production |
| 81 - 100 | Critical | Should not be in production without serious controls in place |
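If you want to act on scores programmatically, for example to fail a deployment pipeline above a threshold, the bands in the table translate directly into a lookup. The function name is hypothetical; the thresholds are the ones in the table.

```python
def risk_level(score):
    """Map a 0-100 composite score to the risk bands in the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score <= 30:
        return "Low"
    if score <= 60:
        return "Moderate"
    if score <= 80:
        return "High"
    return "Critical"


# The customer service agent example scored around 58.
print(risk_level(58))
```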
The lowest score we have seen in production use is 12, a read-only research agent with no write access and a single human trigger. The highest is 94, an autonomous deployment agent with production database write access, shell execution, and outbound email. That one was flagged and stopped immediately.
The scanner tells you your risk exposure. The Vaultak Core SDK addresses it. Five lines of code and your agent's actions are monitored in real time, policies are enforced at execution, and the rollback engine snapshots state before high-risk actions.
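```shell
pip install vaultak
```

```python
from vaultak import VaultakClient

client = VaultakClient(api_key="your_api_key")


class PolicyViolationError(Exception):
    """Defined locally so the snippet is self-contained."""


# Define a blast radius policy: block updates that touch more than 10 records
client.policies.create({
    "name": "blast_radius_limit",
    "rule": "block",
    "condition": {
        "action": "update",
        "record_count_exceeds": 10
    }
})

# Placeholder inputs; in a real agent these come from the current task
user_id = "user_123"
fields = {"email": "new@example.com"}

# Pre-execution check
result = client.check(
    action="update",
    resource="users.accounts",
    payload={"user_id": user_id, "fields": fields}
)
if result.blocked:
    raise PolicyViolationError(result.reason)

# Your agent action executes here
```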
If an action is blocked by policy, the agent receives a structured response and continues to the next step. The action never executes. No rollback required.
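The block-and-continue behavior can be pictured with a small local stand-in. Nothing below calls the Vaultak SDK: `CheckResult`, `local_check`, and `run_plan` are hypothetical names, and the response shape is an assumption made only to show the control flow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    """Stand-in for a structured policy response (assumed shape)."""
    blocked: bool
    reason: Optional[str] = None


def local_check(action, record_count, limit=10):
    """Toy policy: block updates that touch more than `limit` records."""
    if action == "update" and record_count > limit:
        return CheckResult(True, f"update touches {record_count} records")
    return CheckResult(False)


def run_plan(steps):
    """Run each step unless blocked; a blocked step is logged and skipped."""
    log = []
    for step in steps:
        result = local_check(step["action"], step["record_count"])
        if result.blocked:
            log.append({"step": step["name"], "status": "blocked",
                        "reason": result.reason})
            continue  # the action never executes, so nothing to roll back
        step["run"]()
        log.append({"step": step["name"], "status": "executed"})
    return log


plan = [
    {"name": "fix_one_account", "action": "update", "record_count": 1,
     "run": lambda: None},
    {"name": "bulk_update", "action": "update", "record_count": 5000,
     "run": lambda: None},
    {"name": "notify_customer", "action": "send_email", "record_count": 1,
     "run": lambda: None},
]
for entry in run_plan(plan):
    print(entry)
```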
Vaultak works with LangChain, CrewAI, AutoGPT, LangGraph, and the OpenAI Assistants API. It operates at the action layer, not the model layer, so it is model-agnostic across Claude, GPT-4, Gemini, and others.