
Benchmarks

Kaizen is measured against public agent-security benchmarks and our own adversarial corpus. Every number here is regenerated by the open harness in evals/; none are typed in by hand. Attack cases measure detection; benign cases measure false positives, because a tool that blocks everything is useless.

Last run: 2026-06-27 · model: Claude Sonnet 4.6 on Amazon Bedrock (Kaizen runs on your own model)

| Benchmark | Type | Cases | Detection (TPR) | False-positive (FPR) | F1 |
|---|---|---|---|---|---|
| agent-egress-bench | external | 193 | 100% | 10.6% | 0.98 |
| InjecAgent | external | 240 | 100% | 0.0% | 1.00 |
| AgentDojo | external | 28 | 100% | 0.0% | 1.00 |
| CyberSecEval (prompt injection) | external | 251 | 86% | 0.8% | 0.92 |
| Memory integrity & drift | Kaizen corpus | 20 | 100% | 0.0% | 1.00 |
| Overall | 5 benchmarks | 912 | 94.6% | 1.6% | n/a |

Across 912 cases, Kaizen detects 94.6% of attacks at a 1.6% false-positive rate. It is strongest where it is designed to be: the action and egress layer. The input-screening and memory results are reported alongside, unfiltered.
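The unweighted mean of the five TPR rows is 97.2%, which differs from the reported 94.6%, so the Overall row appears to weight benchmarks by case count rather than average the rows. The source does not say how it is aggregated; the sketch below assumes pooling (summing cases across benchmarks before dividing) and uses made-up counts purely to show how pooled rates and a mean of per-benchmark rates can diverge. The `Counts` type and both function names are hypothetical, not part of the harness.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Per-benchmark tallies (hypothetical structure, not the harness schema)."""
    attacks: int    # number of attack cases
    detected: int   # attack cases flagged by the detector
    benign: int     # number of benign cases
    flagged: int    # benign cases wrongly flagged


def pooled_rates(rows):
    """Sum cases across benchmarks first, then divide (micro-average)."""
    tpr = sum(r.detected for r in rows) / sum(r.attacks for r in rows)
    fpr = sum(r.flagged for r in rows) / sum(r.benign for r in rows)
    return tpr, fpr


def mean_tpr(rows):
    """Average the per-benchmark TPRs, ignoring benchmark size (macro-average)."""
    return sum(r.detected / r.attacks for r in rows) / len(rows)


# Illustrative counts only; these are NOT the real benchmark splits.
rows = [
    Counts(attacks=100, detected=100, benign=50, flagged=5),
    Counts(attacks=300, detected=240, benign=150, flagged=0),
]

pooled_tpr, pooled_fpr = pooled_rates(rows)
print(f"pooled TPR={pooled_tpr:.3f}  pooled FPR={pooled_fpr:.3f}  mean TPR={mean_tpr(rows):.3f}")
```

With these illustrative counts the larger benchmark dominates the pooled TPR, which is why a pooled figure can sit below most of the individual rows.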

How to read this

agent-egress-bench

197-case egress-security corpus that tests the security tool, not the model

  • Detection (TPR): 100%
  • False-positive (FPR): 10.6%
  • Precision / F1: 97% / 0.98
  • OWASP LLM Top 10: LLM02 Sensitive Information Disclosure, LLM01 Prompt Injection

InjecAgent

1,054-case indirect prompt-injection benchmark (tool-integrated agents)

  • Detection (TPR): 100%
  • False-positive (FPR): 0.0%
  • Precision / F1: 100% / 1.00
  • OWASP LLM Top 10: LLM01 Prompt Injection, LLM06 Excessive Agency

AgentDojo

ETH Zürich prompt-injection attacks across banking/workspace/travel/slack

  • Detection (TPR): 100%
  • False-positive (FPR): 0.0%
  • Precision / F1: 100% / 1.00
  • OWASP LLM Top 10: LLM01 Prompt Injection, LLM06 Excessive Agency

CyberSecEval (prompt injection)

Meta PurpleLlama input-side prompt-injection set (complementary screen)

  • Detection (TPR): 86%
  • False-positive (FPR): 0.8%
  • Precision / F1: 99% / 0.92
  • OWASP LLM Top 10: LLM01 Prompt Injection

Memory integrity & drift

Kaizen adversarial corpus: memory poisoning + baseline deviation (ASB-aligned)

  • Detection (TPR): 100%
  • False-positive (FPR): 0.0%
  • Precision / F1: 100% / 1.00
  • OWASP LLM Top 10: LLM08 Vector and Embedding Weaknesses, LLM06 Excessive Agency

Methodology

Each benchmark scenario is converted into Kaizen's action/egress format and judged by the real in-sandbox detector logic with the shipping detection skills, with no per-case tuning. Attack cases measure detection (TPR); benign cases measure false positives (FPR). External academic benchmarks and the one Kaizen adversarial corpus are labeled distinctly. All numbers are regenerated from this harness.

Kaizen runs on the customer's own model; results scale with model strength (a smaller model raises the false-positive rate). External benchmarks are pinned to their upstream commits and cited; the memory-integrity set is our own adversarial corpus, labeled as such.
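The rates used throughout this page follow the standard confusion-matrix definitions, sketched below. The `rates` function is illustrative and not taken from the harness. The example counts are an assumption: the source does not state the attack/benign split, and 146 attack plus 47 benign cases is simply one split of 193 cases that reproduces the agent-egress-bench figures.

```python
def rates(tp, fn, fp, tn):
    """Standard detection metrics from confusion-matrix counts.

    tp: attack cases flagged      fn: attack cases missed
    fp: benign cases flagged      tn: benign cases passed
    """
    tpr = tp / (tp + fn)          # detection rate (recall) over attack cases
    fpr = fp / (fp + tn)          # false-positive rate over benign cases
    precision = tp / (tp + fp)    # share of flagged cases that were real attacks
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "f1": f1}


# Assumed split (not stated in the source): 146 attacks all detected,
# 47 benign cases of which 5 are wrongly flagged.
example = rates(tp=146, fn=0, fp=5, tn=42)
print({name: round(value, 3) for name, value in example.items()})
```

Under that assumed split the values round to 100% TPR, 10.6% FPR, 97% precision, and 0.98 F1, matching the reported row. It also shows why a benchmark with a non-zero FPR can still report a high F1: F1 depends on precision and recall, not on the benign-side rate directly.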

Reproduce it

Detection runs in your own Kaizen tenant: the /v1/score endpoint scores each case server-side, using the detection skills and your bring-your-own model. You therefore reproduce against the product, not a copy of the detector.

# 1. harness + upstream benchmarks (pinned)
git clone https://github.com/getkaizen/kaizen-evals && cd kaizen-evals
./setup.sh                                  # clones agent-egress-bench, InjecAgent, AgentDojo into ./benchmarks

# 2. point at Kaizen (free signup) and set your model in the console Settings (bring your own key)
export KAIZEN_API_KEY=kz_live_...           # from app.getkaizen.io

# 3. run and aggregate
python run_egress_bench.py
python run_injecagent.py
python run_agentdojo.py
KZ_EVAL_FEED=content python run_cyberseceval.py
python run_memory_integrity.py
python aggregate.py                         # regenerates results/results.json

Your numbers should match these when you use the same model; with a different model, expect them to shift with model strength.
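Step 3 above can also be driven from one script that stops at the first failing benchmark. This is a minimal sketch, assuming it is run from the kaizen-evals checkout with KAIZEN_API_KEY already exported; the script names and the KZ_EVAL_FEED=content setting come from the commands above, while `STEPS`, `build_commands`, and `run_all` are hypothetical helpers, not part of the harness.

```python
import os
import subprocess
import sys

# (extra environment variables, script) in the order listed in step 3.
STEPS = [
    ({}, "run_egress_bench.py"),
    ({}, "run_injecagent.py"),
    ({}, "run_agentdojo.py"),
    ({"KZ_EVAL_FEED": "content"}, "run_cyberseceval.py"),
    ({}, "run_memory_integrity.py"),
    ({}, "aggregate.py"),  # regenerates results/results.json
]


def build_commands(python=sys.executable):
    """Return (extra_env, argv) pairs for each step, without running anything."""
    return [(extra_env, [python, script]) for extra_env, script in STEPS]


def run_all(dry_run=True):
    """Run every step in order; with dry_run=True only print what would run."""
    if not dry_run and "KAIZEN_API_KEY" not in os.environ:
        raise SystemExit("KAIZEN_API_KEY is not set")
    for extra_env, argv in build_commands():
        if dry_run:
            prefix = " ".join(f"{key}={value}" for key, value in extra_env.items())
            print((prefix + " " if prefix else "") + " ".join(argv))
            continue
        # check=True raises CalledProcessError, so aggregation never runs
        # on top of a failed benchmark.
        subprocess.run(argv, env={**os.environ, **extra_env}, check=True)
```

Calling `run_all()` prints the plan; `run_all(dry_run=False)` executes it. Failing fast keeps results/results.json from being regenerated over a partial run.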