Kaizen and prompt guardrails
Prompt guardrails and Kaizen watch different things. Guardrails read text; Kaizen reads actions. They sit at different layers and work well together.
A prompt guardrail (Llama Guard, NeMo Guardrails, a prompt firewall, an output classifier) inspects the prompt going in and the model's text coming out. It catches unsafe content, jailbreak attempts, and policy violations in the conversation.
What a guardrail does not see: the tool the agent actually called, the host it connected to, the file it read. An agent can pass every text check and still issue a damaging action, because the damage is in the doing, not the saying. A clean response can accompany a credential dump.
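The gap can be made concrete with a small sketch. Everything in it is hypothetical and for illustration only: `AgentStep`, `ToolCall`, the phrase list, and the sensitive path prefixes are assumptions, not the API of Kaizen or of any guardrail product, and real guardrails use classifiers rather than phrase matching. It shows one agent step whose text passes a text-layer check while its recorded tool call fails an action-layer check.

```python
from dataclasses import dataclass, field

# Hypothetical phrases a toy text check might block (assumption for illustration).
BLOCKED_PHRASES = ("ignore previous instructions", "here are the credentials")

# Hypothetical path prefixes an action-layer policy might treat as sensitive.
SENSITIVE_PATH_PREFIXES = ("/etc/shadow", "/home/app/.aws/")


@dataclass
class ToolCall:
    tool: str    # e.g. "read_file" or "http_request"
    target: str  # the file path or host the call touched


@dataclass
class AgentStep:
    response_text: str
    tool_calls: list[ToolCall] = field(default_factory=list)


def text_check_passes(step: AgentStep) -> bool:
    """Text-layer check: looks only at what the model said."""
    lowered = step.response_text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def action_check_passes(step: AgentStep) -> bool:
    """Action-layer check: looks only at what the agent did."""
    return not any(
        call.tool == "read_file" and call.target.startswith(SENSITIVE_PATH_PREFIXES)
        for call in step.tool_calls
    )


# A harmless-sounding reply that accompanies a credential read.
step = AgentStep(
    response_text="Done. I tidied up the config directory.",
    tool_calls=[ToolCall("read_file", "/home/app/.aws/credentials")],
)
print(text_check_passes(step), action_check_passes(step))  # prints: True False
```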
| | Prompt guardrails | Kaizen |
|---|---|---|
| What it inspects | the prompt and the model's text | the actions the agent takes |
| Catches unsafe content and jailbreaks | yes | no |
| Catches a bad tool call or connection | no | yes |
| Learns per-agent normal behaviour | no | yes |
| Gives a verdict on what the agent actually did | no | yes |
Keep your prompt guardrails for input and output safety. Add Kaizen for the action layer: it evaluates what the agent does, learns each agent's normal behaviour, and catches the action that falls outside it. The two are complementary, not an either/or choice.
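To illustrate what "learns each agent's normal behaviour" could mean in the simplest case, here is a toy per-agent baseline. It is a sketch under stated assumptions, not a description of how Kaizen works: the `ActionBaseline` class, its method names, the agent IDs, and the hosts are all hypothetical, and a production system would need far more than exact-match set membership.

```python
class ActionBaseline:
    """Toy per-agent baseline (hypothetical): remembers the (tool, target)
    pairs each agent has been observed using."""

    def __init__(self) -> None:
        self._seen: dict[str, set[tuple[str, str]]] = {}

    def observe(self, agent_id: str, tool: str, target: str) -> None:
        """Record an action as part of this agent's normal behaviour."""
        self._seen.setdefault(agent_id, set()).add((tool, target))

    def is_outside_baseline(self, agent_id: str, tool: str, target: str) -> bool:
        """True if this agent has never been observed taking this action."""
        return (tool, target) not in self._seen.get(agent_id, set())


baseline = ActionBaseline()
baseline.observe("billing-agent", "http_request", "api.payments.internal")
baseline.observe("billing-agent", "read_file", "/srv/billing/invoices.csv")

# A familiar action for this agent.
print(baseline.is_outside_baseline("billing-agent", "http_request", "api.payments.internal"))  # False
# A connection this agent has never made before.
print(baseline.is_outside_baseline("billing-agent", "http_request", "203.0.113.7"))  # True
# The baseline is per agent: what is normal for one agent is new for another.
print(baseline.is_outside_baseline("support-agent", "http_request", "api.payments.internal"))  # True
```

The per-agent keying is the point of the sketch: the same action can be routine for one agent and out of pattern for another, which a text check has no way to express.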