How to Score Your AI Agent's Security Risk

April 14, 2026 · 7 min read · By Samuel Oladji

Most AI agent security problems are not discovered during development. They are discovered in production, usually after something has already gone wrong.

A developer builds a customer service agent. It can look up orders, update accounts, send emails, escalate tickets. Tests pass. It ships. Two weeks later, a misinterpreted instruction triggers a bulk update across every account in the database. Thousands of records corrupted. The agent did nothing wrong. It did exactly what it was designed to do.

The question that was never asked: what is the worst thing this agent can do if something goes wrong?

That is the question the Vaultak AI Agent Risk Scanner answers. This post explains how it works, what the five risk dimensions measure, and how to act on your score.

The problem the scanner solves

There is a consistent gap in how AI agents get shipped to production. Developers test for functional correctness: does the agent do what it is supposed to do? They rarely test for risk exposure: what is the worst-case outcome if something goes wrong?

These are different questions, and they require different tooling. The scanner is that tooling. Describe your agent in plain English, select the capabilities that apply, and get a composite risk score in 10 seconds with specific remediation recommendations.

No account. No credit card. Free at vaultak.com/scan.

The five dimensions of agent risk

1. Action Type

This dimension evaluates what the agent can actually do. Read-only operations score low. Write operations, delete operations, code execution, and outbound messaging score progressively higher. An agent that reads documents is categorically different from one that executes shell commands.
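To make the ordering concrete, here is a minimal sketch of how an action-type score could be computed. It assumes, hypothetically, that each action class gets a rank in the order the paragraph lists them, and that an agent is scored by its single riskiest capability. The action names, ranks, and 0 to 100 scaling are illustrative assumptions, not the scanner's actual weights.

```python
# Hypothetical ranks, following the order in which the text lists action types:
# read-only lowest, then write, delete, code execution, outbound messaging.
ACTION_RANK = {
    "read": 0,
    "write": 1,
    "delete": 2,
    "execute_code": 3,
    "send_message": 4,
}
MAX_RANK = max(ACTION_RANK.values())


def action_type_score(actions: set[str]) -> int:
    """Score an agent 0-100 by its single riskiest action (worst case wins)."""
    if not actions:
        return 0
    worst = max(ACTION_RANK[action] for action in actions)
    return round(100 * worst / MAX_RANK)


print(action_type_score({"read"}))                  # 0
print(action_type_score({"read", "write"}))         # 25
print(action_type_score({"read", "execute_code"}))  # 75
```

Taking the maximum rather than an average reflects the worst-case framing of the post: adding a read capability to a shell-executing agent should not make it look safer.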

2. Resource Sensitivity

An agent's action capabilities and the data it touches are separate concerns. An agent with narrow actions can still score high on resource sensitivity if it is pointed at production databases, API credentials, PII, payment records, or health data. This dimension isolates that exposure.
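The separation matters because the two scores come from different inputs. The sketch below assumes a hypothetical sensitivity table keyed by the resource categories the paragraph names (plus two low-sensitivity examples, `public_docs` and `internal_docs`, added for contrast). It deliberately ignores what the agent can do, so a read-only agent still scores high when it can reach payment records.

```python
# Hypothetical sensitivity weights (0-100); the categories follow the text,
# but the numbers are illustrative, not the scanner's.
SENSITIVITY = {
    "public_docs": 10,
    "internal_docs": 30,
    "pii": 70,
    "production_database": 80,
    "api_credentials": 90,
    "payment_records": 90,
    "health_data": 95,
}


def resource_sensitivity_score(resources: set[str]) -> int:
    """Score 0-100 by the most sensitive resource the agent can touch.

    Action capabilities are intentionally not considered here; that is the
    action-type dimension's job.
    """
    return max((SENSITIVITY[r] for r in resources), default=0)


# A read-only agent can still be highly exposed.
print(resource_sensitivity_score({"public_docs"}))                     # 10
print(resource_sensitivity_score({"public_docs", "payment_records"}))  # 90
```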

3. Blast Radius

Blast radius asks: if this agent makes a mistake, how many records or systems are affected? An agent that processes one record at a time has a small blast radius. An agent that runs bulk operations across all users has an enormous one.

This is the dimension most frequently missed before production deployment, and the one most commonly implicated in real incidents. The account corruption scenario described above is a blast radius failure, not a code failure.
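One way to turn "how many records or systems are affected" into a number is a logarithmic scale, so that going from one record to ten counts as much as going from ten thousand to a hundred thousand. The sketch assumes 20 points per order of magnitude, capped at 100; both the scale and the cap are hypothetical choices for illustration.

```python
import math


def blast_radius_score(records_affected: int) -> int:
    """Map the worst-case number of affected records to a 0-100 score.

    Hypothetical scale: 20 points per order of magnitude, so 1 record -> 0,
    10 -> 20, 1,000 -> 60, and 100,000 or more -> 100 (capped).
    """
    if records_affected < 1:
        raise ValueError("records_affected must be at least 1")
    return min(100, round(20 * math.log10(records_affected)))


print(blast_radius_score(1))        # 0   (one record at a time)
print(blast_radius_score(1000))     # 60
print(blast_radius_score(250_000))  # 100 (bulk update across all users)
```

A logarithmic curve keeps single-record agents near zero while still separating "a few hundred rows" from "the whole table", which a linear scale would compress.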

4. Behavioral Deviation

This dimension looks for signals that the agent might operate unpredictably: unlimited autonomy, absent confirmation steps, self-modification capabilities, the ability to override its own instructions. High behavioral deviation scores correlate strongly with production surprises.
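Because these signals are yes/no properties of an agent's configuration, a checklist is a natural way to sketch them. The flag names below mirror the four signals in the paragraph; the `AutonomyProfile` type and the equal 25-point weighting are illustrative assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class AutonomyProfile:
    """Hypothetical yes/no signals that an agent may behave unpredictably."""
    unlimited_autonomy: bool = False
    no_confirmation_step: bool = False
    can_self_modify: bool = False
    can_override_instructions: bool = False


def behavioral_deviation_score(profile: AutonomyProfile) -> int:
    """Equal-weight checklist: 25 points per signal present, max 100."""
    present = sum(getattr(profile, f.name) for f in fields(profile))
    return 25 * present


print(behavioral_deviation_score(AutonomyProfile()))  # 0
print(behavioral_deviation_score(
    AutonomyProfile(unlimited_autonomy=True, no_confirmation_step=True)
))  # 50
```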

5. Time Pattern

A continuously running background agent with broad permissions is functionally different from one that runs once when a human explicitly triggers it. Always-on agents compound every other risk dimension. This is easy to overlook and important to isolate.
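One way to model "always-on agents compound every other risk dimension" is to treat the time pattern as a multiplier on the other dimension scores. The sketch assumes a hypothetical 1.25x factor for continuously running agents, capped at 100, and leaves human-triggered agents unchanged; the factor is illustrative only.

```python
ALWAYS_ON_FACTOR = 1.25  # hypothetical amplification for background agents


def apply_time_pattern(scores: dict[str, int], always_on: bool) -> dict[str, int]:
    """Amplify every other dimension score when the agent runs continuously."""
    factor = ALWAYS_ON_FACTOR if always_on else 1.0
    return {name: min(100, round(score * factor)) for name, score in scores.items()}


base = {"action_type": 40, "resource_sensitivity": 80, "blast_radius": 60}
print(apply_time_pattern(base, always_on=False))
# {'action_type': 40, 'resource_sensitivity': 80, 'blast_radius': 60}
print(apply_time_pattern(base, always_on=True))
# {'action_type': 50, 'resource_sensitivity': 100, 'blast_radius': 75}
```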

How to use the scanner

Navigate to vaultak.com/scan. You will see a text area and a set of capability tags.

In the text area, describe what your agent does in plain English. For example:

A customer service agent that can look up customer orders, update account information, send emails to customers, and escalate issues by creating support tickets.

Then select the capability tags that apply. In this case: write database, send emails, call external APIs. Hit Scan Agent.

The scanner evaluates your agent across the five dimensions and returns a composite score with specific recommendations. For the customer service agent above, a typical result is around 58 out of 100, which falls in the moderate-risk band, along with a set of remediation recommendations.

These are not generic. They are tied to your agent's specific capability profile.
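The post does not spell out how the five dimension scores are combined. Purely as an illustration, the sketch below uses a weighted average with equal default weights, assuming each dimension has already been scored from 0 to 100. The weights and example inputs are hypothetical and are not meant to reproduce the scanner's score for any real agent.

```python
from typing import Optional

DIMENSIONS = (
    "action_type",
    "resource_sensitivity",
    "blast_radius",
    "behavioral_deviation",
    "time_pattern",
)


def composite_score(
    scores: dict[str, float], weights: Optional[dict[str, float]] = None
) -> int:
    """Weighted average of the five dimension scores, rounded to 0-100."""
    weights = weights or {name: 1.0 for name in DIMENSIONS}
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(weights[name] for name in DIMENSIONS)
    weighted = sum(scores[name] * weights[name] for name in DIMENSIONS)
    return round(weighted / total_weight)


example = {
    "action_type": 50,
    "resource_sensitivity": 70,
    "blast_radius": 60,
    "behavioral_deviation": 50,
    "time_pattern": 20,
}
print(composite_score(example))  # 50
```

Passing a custom `weights` dict lets you emphasize one dimension, for example doubling `blast_radius` if bulk operations are your main concern.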

Score ranges in practice

| Score | Risk level | What it typically means |
|---|---|---|
| 0–30 | Low | Read-only or tightly scoped agent, well contained |
| 31–60 | Moderate | Write access or broad data scope; targeted fixes needed |
| 61–80 | High | Multiple elevated dimensions; guardrails required before production |
| 81–100 | Critical | Should not be in production without serious controls in place |
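The score bands translate directly into a lookup. This sketch assumes integer scores from 0 to 100, as the ranges suggest; the function name `risk_level` is illustrative.

```python
def risk_level(score: int) -> str:
    """Map a 0-100 composite score to the Low/Moderate/High/Critical bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0-100, got {score}")
    if score <= 30:
        return "Low"
    if score <= 60:
        return "Moderate"
    if score <= 80:
        return "High"
    return "Critical"


print(risk_level(12))  # Low       (read-only research agent)
print(risk_level(58))  # Moderate  (customer service agent)
print(risk_level(94))  # Critical  (autonomous deployment agent)
```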

The lowest score we have seen in production use is 12, a read-only research agent with no write access and a single human trigger. The highest is 94, an autonomous deployment agent with production database write access, shell execution, and outbound email. That one was flagged and stopped immediately.

Integrating Vaultak Core after scanning

The scanner tells you your risk exposure. The Vaultak Core SDK addresses it. Five lines of code and your agent's actions are monitored in real time, policies are enforced at execution, and the rollback engine snapshots state before high-risk actions.

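```shell
pip install vaultak
```

```python
from vaultak import VaultakClient


# The original snippet raises PolicyViolationError without defining it.
# It is defined locally here as an assumption; use the SDK's own exception
# type instead if it provides one.
class PolicyViolationError(Exception):
    """Raised when a pre-execution check blocks an agent action."""


client = VaultakClient(api_key="your_api_key")

# Define a blast radius policy: block any update that touches more than 10 records
client.policies.create({
    "name": "blast_radius_limit",
    "rule": "block",
    "condition": {
        "action": "update",
        "record_count_exceeds": 10
    }
})

# Placeholder inputs for the action the agent is about to take
user_id = "user_123"
fields = {"email": "customer@example.com"}

# Pre-execution check: ask Vaultak whether this action is allowed
result = client.check(
    action="update",
    resource="users.accounts",
    payload={"user_id": user_id, "fields": fields}
)

if result.blocked:
    raise PolicyViolationError(result.reason)

# Your agent action executes here
```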

If an action is blocked by policy, the agent receives a structured response and continues to the next step. The action never executes. No rollback required.
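The block-and-continue behavior can be illustrated without the SDK. The code below is a hypothetical local stand-in, not part of the Vaultak API: `Decision`, `check_action`, and `run_steps` are invented for this sketch. It applies the same kind of rule as the `blast_radius_limit` policy (block updates touching more than 10 records) and returns a structured decision instead of raising, so the agent loop records the refusal and moves on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Structured result of a pre-execution check (hypothetical shape)."""
    blocked: bool
    reason: str = ""


def check_action(action: str, record_count: int, limit: int = 10) -> Decision:
    """Block bulk updates above `limit` records, mirroring the example policy."""
    if action == "update" and record_count > limit:
        return Decision(True, f"update touches {record_count} records (limit {limit})")
    return Decision(False)


def run_steps(steps: list[tuple[str, int]]) -> list[str]:
    """Execute allowed steps; skip blocked ones and continue to the next step."""
    log = []
    for action, record_count in steps:
        decision = check_action(action, record_count)
        if decision.blocked:
            log.append(f"skipped {action}: {decision.reason}")
            continue  # the blocked action never runs, so nothing to roll back
        log.append(f"executed {action} on {record_count} record(s)")
    return log


for line in run_steps([("update", 1), ("update", 5000), ("read", 200)]):
    print(line)
# executed update on 1 record(s)
# skipped update: update touches 5000 records (limit 10)
# executed read on 200 record(s)
```

Checking before execution, rather than detecting damage afterwards, is what makes the blocked path free of side effects.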

Vaultak works with LangChain, CrewAI, AutoGPT, LangGraph, and the OpenAI Assistants API. It operates at the action layer, not the model layer, so it is model-agnostic across Claude, GPT-4, Gemini, and others.

Frequently asked questions

Is the scanner actually free?
Yes. The scanner is free and will remain free. No account, no credit card, no time limit. It exists because awareness has to come before tooling adoption, and there should be no barrier to understanding your risk exposure.
What frameworks does Vaultak support?
LangChain, CrewAI, AutoGPT, LangGraph, and the OpenAI Assistants API. Vaultak operates at the action layer rather than the model layer, which makes it framework-agnostic and model-agnostic by design.
What does a high score mean, specifically?
Scores above 80 indicate critical risk: agents that should not be in production without serious controls in place. Scores from 61 to 80 indicate high risk, requiring guardrails across multiple dimensions. Scores from 31 to 60 indicate moderate risk with targeted remediation needs. Scores of 30 or below are low risk, typically read-only or tightly scoped agents.
What is the rollback engine?
The Vaultak rollback engine captures a state snapshot before each high-risk action. If an action produces unintended consequences, the relevant state can be restored instantly from the dashboard or via API. Available in Vaultak Core, not the free scanner.
How does SIEM integration work?
Vaultak integrates natively with Splunk, Datadog, and Microsoft Sentinel for security monitoring, and with Slack and PagerDuty for alerting. Agent activity events stream into your existing security tooling in real time, with no additional instrumentation required.
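The rollback engine's snapshot-before-action idea can be illustrated with an in-memory store. This is a hypothetical sketch of the general pattern, not Vaultak's implementation: `SnapshotStore` is invented for illustration, and it simply deep-copies the affected state before a high-risk action and restores it on request.

```python
import copy
from typing import Any


class SnapshotStore:
    """Minimal in-memory snapshot/restore for a dict of records (illustrative)."""

    def __init__(self) -> None:
        self._snapshots: dict[str, dict[str, Any]] = {}

    def snapshot(self, snapshot_id: str, state: dict[str, Any]) -> None:
        # Deep copy so later mutations of `state` cannot alter the snapshot.
        self._snapshots[snapshot_id] = copy.deepcopy(state)

    def restore(self, snapshot_id: str, state: dict[str, Any]) -> None:
        # Replace the live state in place with a copy of the saved snapshot.
        state.clear()
        state.update(copy.deepcopy(self._snapshots[snapshot_id]))


accounts = {"u1": {"plan": "pro"}, "u2": {"plan": "free"}}
store = SnapshotStore()

store.snapshot("before-bulk-update", accounts)  # taken before the risky action
for record in accounts.values():                # a mistaken bulk update
    record["plan"] = "cancelled"

store.restore("before-bulk-update", accounts)
print(accounts)  # {'u1': {'plan': 'pro'}, 'u2': {'plan': 'free'}}
```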
Run your agent through the scanner
Free, no account required. A risk score in 10 seconds.