Advanced Usage
Patterns teams commonly reach for once the basics are wired up.
Webhooks & background jobs
- Use with_kaizen_client() (python/kaizen_client/decorators.py) to inject a managed client into webhook handlers or async job runners.
- Store serialized KTOF payloads in your job queue; decode them with prompts_decode as soon as the worker begins processing to minimize memory churn.
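The queue pattern above can be sketched as follows. This is a minimal illustration, not the library's API: fake_decode stands in for a client's prompts_decode call (whose real signature is not shown here), the payloads are plain JSON rather than KTOF, and worker and main are hypothetical names.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, List

# Hypothetical stand-in type for a client's prompts_decode call;
# the real signature may differ.
DecodeFn = Callable[[str], Awaitable[Any]]


async def fake_decode(payload: str) -> Any:
    """Placeholder decoder: treats the stored payload as plain JSON."""
    return json.loads(payload)


async def worker(queue: "asyncio.Queue[str]", decode: DecodeFn) -> List[Any]:
    """Drain the queue, decoding each payload as soon as it is picked up."""
    results = []
    while not queue.empty():
        payload = queue.get_nowait()
        # Decode first, then hand the decoded object to the job logic.
        results.append(await decode(payload))
        queue.task_done()
    return results


async def main() -> List[Any]:
    queue: asyncio.Queue = asyncio.Queue()
    # The queue holds compact serialized strings, not decoded objects.
    for job in ({"event": "invoice.paid"}, {"event": "user.created"}):
        queue.put_nowait(json.dumps(job))
    return await worker(queue, fake_decode)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a real worker, the decode argument would be the injected client's method instead of fake_decode.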
Pagination
- When batching encode requests, include your own metadata={"page": n} inside each payload.
- On decode, Kaizen echoes the metadata so you can stitch responses back together without guessing which chunk they belong to.
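A rough sketch of the stitching step, assuming each decoded payload comes back as a dictionary with the metadata intact. The "data" key and the helper names paginate and stitch are hypothetical, and the encode/decode round trip is simulated by shuffling the payloads.

```python
import random


def paginate(items, page_size):
    """Split items into payloads, each tagged with its page number."""
    return [
        {"data": items[i:i + page_size], "metadata": {"page": n}}
        for n, i in enumerate(range(0, len(items), page_size))
    ]


def stitch(decoded):
    """Reassemble decoded payloads in page order using the echoed metadata."""
    ordered = sorted(decoded, key=lambda p: p["metadata"]["page"])
    return [item for p in ordered for item in p["data"]]


payloads = paginate(list(range(10)), page_size=4)

# Simulate responses arriving out of order; a real round trip would go
# through the encode and decode calls instead.
arrived = payloads[:]
random.shuffle(arrived)

restored = stitch(arrived)
```

Sorting on the page number means the order in which responses arrive does not matter.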
File uploads
- Convert large JSON files to Python dictionaries, stream them into compress, and persist the returned string alongside a checksum.
- When uploading to object storage, store both the original file and its KTOF twin; the compressed variant is often 60–80% smaller.
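One way to pair the compressed string with a checksum is sketched below. The stand_in_compress function is a placeholder built on zlib and base64 so the example runs on its own; it does not produce KTOF, and the record layout and helper names are assumptions.

```python
import base64
import hashlib
import json
import zlib


def stand_in_compress(data: dict) -> str:
    """Placeholder for the library's compress call: zlib + base64, NOT KTOF."""
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def compress_with_checksum(data: dict, compress=stand_in_compress) -> dict:
    """Compress the data and record a SHA-256 digest of the resulting string."""
    encoded = compress(data)
    digest = hashlib.sha256(encoded.encode("utf-8")).hexdigest()
    return {"payload": encoded, "sha256": digest}


def verify(record: dict) -> bool:
    """Recompute the digest and compare it with the stored one."""
    actual = hashlib.sha256(record["payload"].encode("utf-8")).hexdigest()
    return actual == record["sha256"]


record = compress_with_checksum(
    {"rows": [{"id": i, "name": "item"} for i in range(100)]}
)
```

Calling verify(record) after downloading the stored string catches truncated or corrupted uploads before any decode is attempted.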
Streaming & chunking
- Streaming endpoints are on the roadmap (docs/TODO.md). Until native streaming ships, split prompts into manageable sections and call prompts_encode per chunk.
- Use length_marker + delimiter options (see EncodeOptions) to mark chunk boundaries before sending them to a provider that expects streaming tokens.
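A simple way to split a prompt into sections is to break on paragraph boundaries and cap each chunk by character count. The split_prompt helper and its max_chars parameter are hypothetical; the per-chunk prompts_encode call and the EncodeOptions settings are left out because their signatures are not shown here.

```python
from typing import List


def split_prompt(text: str, max_chars: int = 2000) -> List[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


text = "\n\n".join(["a" * 10] * 5)
chunks = split_prompt(text, max_chars=25)
```

Because chunks are only cut at paragraph breaks, joining them with a blank line reproduces the original prompt.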
Batching
- Wrap multiple prompts in an array and pass them through optimize_request one at a time while reusing the same KaizenClient.
- Plan to add prompts_encode_batch() once the API exposes it; for now, fan out with asyncio.gather() to run encodes concurrently.
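The fan-out can be sketched with asyncio.gather() and a semaphore to cap concurrency. Here stand_in_encode is a placeholder for an async encode call on a shared client, and encode_all and max_concurrency are hypothetical names; the real method's arguments and return type may differ.

```python
import asyncio


async def stand_in_encode(prompt: str) -> dict:
    """Placeholder for an async encode call on a shared client."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"prompt": prompt, "size": len(prompt)}


async def encode_all(prompts, encode=stand_in_encode, max_concurrency=5):
    """Encode all prompts concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt):
        async with semaphore:
            return await encode(prompt)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in prompts))


results = asyncio.run(encode_all(["alpha", "beta", "gamma"]))
```

The semaphore keeps a large batch from opening too many requests at once, which matters if the service rate-limits.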
Custom headers & observability
- KaizenClientConfig(default_headers=...) lets you add X-Request-ID, X-Trace-ID, or tenant identifiers that flow to Kaizen logs.
- Pair these headers with your tracing system so you can correlate cost savings with upstream requests.
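A minimal sketch of building such a header set. The build_default_headers helper and the X-Tenant-ID header name are hypothetical, and the KaizenClientConfig call appears only as a comment because its import path and other arguments are not shown here.

```python
import uuid
from typing import Dict, Optional


def build_default_headers(
    tenant_id: str, trace_id: Optional[str] = None
) -> Dict[str, str]:
    """Build correlation headers, reusing an upstream trace ID when given."""
    return {
        "X-Request-ID": str(uuid.uuid4()),
        "X-Trace-ID": trace_id or uuid.uuid4().hex,
        "X-Tenant-ID": tenant_id,  # hypothetical header name
    }


headers = build_default_headers("acme", trace_id="trace-123")

# Assumed usage, based on the option named above:
# config = KaizenClientConfig(default_headers=headers)
```

Passing the trace ID from the incoming request, rather than generating a new one, is what lets the tracing system link the two sides.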
Middleware & interceptors
- If you need to apply custom logic before or after each call, wrap the client methods in your own helper, e.g., a decorator that logs stats, retries on 429, or pushes metrics to Prometheus.
- Keep middleware lightweight; heavy processing should remain in your application layer to avoid slowing Kaizen requests.
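Such a wrapper might look like the decorator below, which logs timing and retries with exponential backoff. RateLimitError is a hypothetical stand-in for whatever the client raises on HTTP 429, and with_retry and flaky_encode are illustrative names, not part of the library.

```python
import functools
import logging
import time

logger = logging.getLogger("kaizen.middleware")


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever the client raises on HTTP 429."""


def with_retry(max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry the wrapped call on RateLimitError with exponential backoff."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                start = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                except RateLimitError:
                    if attempt == max_attempts:
                        raise  # out of attempts, surface the error
                    delay = base_delay * 2 ** (attempt - 1)
                    logger.warning("429 on attempt %d, retrying in %.2fs", attempt, delay)
                    sleep(delay)
                else:
                    elapsed = time.perf_counter() - start
                    logger.info("%s took %.3fs", func.__name__, elapsed)
                    return result

        return wrapper

    return decorator


calls = {"n": 0}


@with_retry(max_attempts=3, base_delay=0.0)
def flaky_encode(prompt):
    """Fails twice with a simulated 429, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return {"prompt": prompt}


result = flaky_encode("hello")
```

The wrapper only logs, sleeps, and retries, which keeps it in line with the advice to keep middleware lightweight.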