Skip to content

Kaizen and prompt guardrails

Prompt guardrails and Kaizen watch different things. Guardrails read text; Kaizen reads actions. They sit at different layers and work well together.

A prompt guardrail (Llama Guard, NeMo Guardrails, a prompt firewall, an output classifier) inspects the prompt going in and the model's text coming out. It catches unsafe content, jailbreak attempts, and policy violations in the conversation.

What a guardrail does not see: the tool the agent actually called, the host it connected to, the file it read. An agent can pass every text check and still issue a damaging action, because the damage is in the doing, not the saying. A clean response can accompany a credential dump.

Prompt guardrails Kaizen
What it inspects the prompt and the model's text the actions the agent takes
Catches unsafe content and jailbreaks yes no
Catches a bad tool call or connection no yes
Learns per-agent normal behaviour no yes
Verdict on what actually happened no yes

Keep your prompt guardrails for input and output safety. Add Kaizen for the action layer: it evaluates what the agent does, learns each agent's normal behaviour, and catches the action that falls outside it. The two are complementary, not a choice.