Benchmarks
Kaizen is measured against public agent-security benchmarks and our own adversarial corpus.
Every number here is regenerated by the open harness in evals/; none is typed in by hand.
Attack cases measure detection; benign cases measure false positives,
because a tool that blocks everything is useless.
Last run: 2026-06-27 · model: Claude Sonnet 4.6 on Amazon Bedrock (Kaizen runs on your own model)
| Benchmark | Type | Cases | Detection (TPR) | False-positive (FPR) | F1 |
|---|---|---|---|---|---|
| agent-egress-bench | external | 193 | 100% | 10.6% | 0.98 |
| InjecAgent | external | 240 | 100% | 0.0% | 1.00 |
| AgentDojo | external | 28 | 100% | 0.0% | 1.00 |
| CyberSecEval (prompt injection) | external | 251 | 86% | 0.8% | 0.92 |
| Memory integrity & drift | Kaizen corpus | 20 | 100% | 0.0% | 1.00 |
| Overall | 5 benchmarks | 912 | 94.6% | 1.6% | n/a |
Across 912 cases, Kaizen detects 94.6% of attacks at a 1.6% false-positive rate. It is strongest at the action and egress layer, where it is designed to operate; the input-screening and memory results are reported alongside.
How to read this
agent-egress-bench
197-case egress-security corpus that tests the security tool, not the model
- Detection (TPR): 100%
- False-positive (FPR): 10.6%
- Precision / F1: 97% / 0.98
- OWASP LLM Top 10: LLM02 Sensitive Information Disclosure, LLM01 Prompt Injection
InjecAgent
1,054-case indirect prompt-injection benchmark (tool-integrated agents)
- Detection (TPR): 100%
- False-positive (FPR): 0.0%
- Precision / F1: 100% / 1.00
- OWASP LLM Top 10: LLM01 Prompt Injection, LLM06 Excessive Agency
AgentDojo
ETH Zürich prompt-injection attacks across banking/workspace/travel/slack
- Detection (TPR): 100%
- False-positive (FPR): 0.0%
- Precision / F1: 100% / 1.00
- OWASP LLM Top 10: LLM01 Prompt Injection, LLM06 Excessive Agency
CyberSecEval (prompt injection)
Meta PurpleLlama input-side prompt-injection set (complementary screen)
- Detection (TPR): 86%
- False-positive (FPR): 0.8%
- Precision / F1: 99% / 0.92
- OWASP LLM Top 10: LLM01 Prompt Injection
Memory integrity & drift
Kaizen adversarial corpus: memory poisoning + baseline deviation (ASB-aligned)
- Detection (TPR): 100%
- False-positive (FPR): 0.0%
- Precision / F1: 100% / 1.00
- OWASP LLM Top 10: LLM08 Vector and Embedding Weaknesses, LLM06 Excessive Agency
Methodology
Each benchmark scenario is converted into Kaizen's action/egress format and judged by the real in-sandbox detector logic with the shipping detection skills, with no per-case tuning. Attack cases measure detection (TPR); benign cases measure false positives (FPR). All numbers are regenerated by this harness.
Kaizen runs on the customer's own model, so results vary with model strength (a smaller model raises the false-positive rate). External benchmarks are pinned to their upstream commits and cited; the memory-integrity set is our own adversarial corpus and is labeled as such.
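The four reported metrics follow from the attack/benign split described above. Here is a minimal sketch of that arithmetic, assuming each case reduces to an `(is_attack, flagged)` pair. The names `Counts`, `tally`, and `metrics` are illustrative and are not taken from the harness, and the sample data is a toy set, not benchmark output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # attack cases that were flagged
    fn: int  # attack cases that were missed
    fp: int  # benign cases that were flagged
    tn: int  # benign cases that were allowed


def tally(cases):
    """Count outcomes from an iterable of (is_attack, flagged) booleans."""
    tp = fn = fp = tn = 0
    for is_attack, flagged in cases:
        if is_attack and flagged:
            tp += 1
        elif is_attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return Counts(tp, fn, fp, tn)


def _ratio(numerator, denominator):
    # Return 0.0 for an empty denominator instead of raising.
    return numerator / denominator if denominator else 0.0


def metrics(c):
    """Derive TPR, FPR, precision, and F1 from outcome counts."""
    tpr = _ratio(c.tp, c.tp + c.fn)        # detection rate over attack cases
    fpr = _ratio(c.fp, c.fp + c.tn)        # false-positive rate over benign cases
    precision = _ratio(c.tp, c.tp + c.fp)  # share of flags that were real attacks
    f1 = _ratio(2 * precision * tpr, precision + tpr)
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Toy data: 8 attacks (all flagged) and 4 benign cases (1 flagged).
    toy = [(True, True)] * 8 + [(False, True)] + [(False, False)] * 3
    print(metrics(tally(toy)))
```

TPR and FPR are each computed over their own population (attack cases and benign cases respectively), while precision and F1 depend on the ratio of attack to benign cases in the benchmark.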
Reproduce it
Detection runs in your own Kaizen tenant: the /v1/score endpoint scores each case server-side
with the detection skills and the model you bring, so you reproduce against the product, not a
copy of the detector.
# 1. harness + upstream benchmarks (pinned)
git clone https://github.com/getkaizen/kaizen-evals && cd kaizen-evals
./setup.sh # clones agent-egress-bench, InjecAgent, AgentDojo into ./benchmarks
# 2. point at Kaizen (free signup) and set your model in the console Settings (bring your own key)
export KAIZEN_API_KEY=kz_live_... # from app.getkaizen.io
# 3. run and aggregate
python run_egress_bench.py
python run_injecagent.py
python run_agentdojo.py
KZ_EVAL_FEED=content python run_cyberseceval.py
python run_memory_integrity.py
python aggregate.py # regenerates results/results.json
With the same model, your numbers should match these; a stronger or weaker model will shift them.
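One way to check a local run against the published table is sketched below. The `PUBLISHED` values are copied from the table on this page. The function name `drift` and the `tolerance` default are assumptions for illustration, as is the shape of the `local` mapping. The layout of results/results.json is not documented here, so loading it into that shape is left to the reader.

```python
# Published figures from the table on this page: benchmark -> (TPR, FPR).
PUBLISHED = {
    "agent-egress-bench": (1.00, 0.106),
    "InjecAgent": (1.00, 0.000),
    "AgentDojo": (1.00, 0.000),
    "CyberSecEval (prompt injection)": (0.86, 0.008),
    "Memory integrity & drift": (1.00, 0.000),
}


def drift(local, published=PUBLISHED, tolerance=0.005):
    """Report benchmarks whose local TPR or FPR differs from the published one.

    `local` maps benchmark name -> (tpr, fpr) as fractions in [0, 1].
    `tolerance` is a hypothetical allowance for the rounding in the table.
    Returns a dict of benchmark name -> human-readable difference.
    """
    report = {}
    for name, (pub_tpr, pub_fpr) in published.items():
        if name not in local:
            report[name] = "missing from local run"
            continue
        tpr, fpr = local[name]
        if abs(tpr - pub_tpr) > tolerance or abs(fpr - pub_fpr) > tolerance:
            report[name] = (
                f"TPR {tpr:.3f} vs {pub_tpr:.3f}, "
                f"FPR {fpr:.3f} vs {pub_fpr:.3f}"
            )
    return report


if __name__ == "__main__":
    # Hypothetical local run in which one benchmark detects fewer attacks.
    mine = dict(PUBLISHED)
    mine["AgentDojo"] = (0.95, 0.0)
    for benchmark, difference in drift(mine).items():
        print(f"{benchmark}: {difference}")
```

An empty report means every benchmark landed within the tolerance. A non-empty one is most likely explained by a different model, since results vary with model strength.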