Kaizen SDK overview
Compress prompts, shrink latency, and decode large-model responses with the Kaizen Token Optimized Format (KTOF), a lightweight layer that sits between your app and any LLM provider.

One-sentence promise · Kaizen detects structured data inside prompts, compresses it, and restores it losslessly, so you spend fewer tokens without changing the way you build.
Key capabilities
- Prompt compression – `compress`, `optimize`, and `prompts_encode` routes turn large JSON/chat payloads into compact KTOF strings with byte + token stats (see the round-trip sketch after this list).
- Response hydration – `decompress`, `prompts_decode`, and `optimize_response` rebuild the original structure, including metadata, for safer downstream handling.
- Provider adapters – Thin wrappers for OpenAI, Anthropic, and Gemini keep your existing SDK code while adding transparent encode/decode hooks.
- Observability hooks – Every response returns `stats`, optional `token_stats`, and echoes your metadata so you can track savings per request or workflow.
- Enterprise-ready deployment – Default SaaS endpoint (`https://api.getkaizen.io/`) plus dedicated, self-hosted, or air-gapped options on request.
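The round trip looks roughly like this. It is a minimal sketch, not a verbatim SDK sample: the `KaizenClient` class name, its constructor, the import path, and the `result`/`stats` attributes on response objects are assumptions inferred from the capability list above; only the `prompts_encode`/`prompts_decode` route names come from the docs.

```python
import asyncio
import json
import os

from kaizen_client import KaizenClient  # hypothetical import path


async def main() -> None:
    # Assumed constructor; the key comes from the KAIZEN_API_KEY environment variable.
    client = KaizenClient(api_key=os.environ["KAIZEN_API_KEY"])

    payload = {"orders": [{"id": i, "status": "shipped"} for i in range(50)]}

    # Compress the structured payload into a compact KTOF string.
    encoded = await client.prompts_encode(json.dumps(payload))
    print(encoded.stats)  # byte + token savings for this request (assumed attribute)

    # ... hand encoded.result to any LLM provider here ...

    # Losslessly restore the original structure from the KTOF string.
    decoded = await client.prompts_decode(encoded.result)
    assert json.loads(decoded.result) == payload


asyncio.run(main())
```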
Supported platforms
- Python (`kaizen-client` ≥ 0.1.0) – fully typed async client used throughout this guide.
- REST/OpenAPI – `openapi.json` ships with the repo for custom client generation (a raw-call sketch follows this list).
- Coming soon – JavaScript/TypeScript, Go, Java, and CLI tooling follow the same schema; join the preview at [email protected].
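If you generate a custom client, each operation is a plain HTTPS call against the SaaS endpoint. The sketch below uses `httpx` and is illustrative only: the `/v1/prompts/encode` path, the bearer-token auth scheme, and the request/response field names are assumptions that mirror the SDK method names; the authoritative routes live in `openapi.json`.

```python
import json
import os

import httpx

# Illustrative only: route, auth scheme, and field names are assumptions;
# generate the real calls from the shipped openapi.json.
resp = httpx.post(
    "https://api.getkaizen.io/v1/prompts/encode",  # assumed path under /v1/...
    headers={"Authorization": f"Bearer {os.environ['KAIZEN_API_KEY']}"},
    json={"prompt": json.dumps({"orders": [{"id": 1, "status": "shipped"}]})},
    timeout=30.0,
)
resp.raise_for_status()
body = resp.json()
print(body["result"], body["stats"])  # assumed response fields, per the capability list
```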
Versioning & support
- Current SDK release: v0.1.0 (`python/pyproject.toml`), targeting Python 3.10+.
- Kaizen API is versioned at `/v1/...` – backwards-compatible changes are additive; breaking changes trigger a new SDK minor version with migration notes (see the pinning example after this list).
- Report issues via GitHub or email [email protected]. Enterprise customers receive dedicated support channels.
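Because breaking changes arrive only with a new SDK minor version, a conservative install pins below the next minor release:

```bash
pip install "kaizen-client[all]>=0.1.0,<0.2.0"
```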
Typical workflow
- Install & configure – `pip install kaizen-client[all]`, export `KAIZEN_API_KEY`, and optionally configure extras for OpenAI/Anthropic/Gemini.
- Encode before provider calls – run `client.prompts_encode()` or `client.optimize_request()` to generate a compressed payload plus stats.
- Send to any LLM – pass the returned `result` string to your provider SDK (or wrapper) like a normal prompt.
- Decode responses – hand `completion.output_text` (or equivalent) to `client.prompts_decode()` / `client.optimize_response()` to recover structured JSON.
- Track savings – log the byte/token deltas from the `stats` block and forward them to observability tools for cost tracking (the end-to-end sketch after this list ties the steps together).
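Put together, the workflow looks roughly like the sketch below. As above, `KaizenClient`, its import path, and the `result`/`stats` attributes are assumptions; the OpenAI half uses the real Responses API, and the model name is just a placeholder.

```python
import asyncio
import json
import os

from openai import AsyncOpenAI
from kaizen_client import KaizenClient  # hypothetical import path


async def main() -> None:
    kaizen = KaizenClient(api_key=os.environ["KAIZEN_API_KEY"])
    openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

    prompt = json.dumps(
        {"task": "summarize", "records": [{"id": i, "qty": i * 2} for i in range(100)]}
    )

    # Step 2: encode before the provider call.
    encoded = await kaizen.prompts_encode(prompt)

    # Step 3: send the compact KTOF string to the LLM like a normal prompt.
    completion = await openai_client.responses.create(
        model="gpt-4o-mini",  # placeholder model
        input=encoded.result,
    )

    # Step 4: decode the model output back into structured JSON.
    decoded = await kaizen.prompts_decode(completion.output_text)
    print(decoded.result)

    # Step 5: track savings from the stats block.
    print("savings:", encoded.stats)


asyncio.run(main())
```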