Response Caching
Cache model responses to reduce API calls and costs.
Response caching stores model responses so that repeated identical requests are served from the cache instead of the API. This can significantly improve response times and reduce API costs during development and testing.
Note
For a detailed overview of response caching, see Response Caching.
Note
This is different from Anthropic's prompt caching feature. Response caching stores the entire model response and returns it for identical requests, while prompt caching caches prompt prefixes (such as the system prompt) to reduce processing time and cost.
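To make the idea of caching an entire response concrete, here is a minimal, self-contained sketch. It is not the library's implementation: `ResponseCache`, `make_cache_key`, and `fake_model` are hypothetical names introduced only for illustration, and the sketch assumes a cache key built from the model id and the prompt text.

```python
import hashlib
import json
from typing import Callable, Dict


def make_cache_key(model_id: str, prompt: str) -> str:
    """Derive a deterministic key from the parts that define a request (assumed here)."""
    payload = json.dumps({"model": model_id, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """Hypothetical in-memory response cache, for illustration only."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def run(self, model_id: str, prompt: str) -> str:
        key = make_cache_key(model_id, prompt)
        if key in self._store:
            # Identical request seen before: return the stored response.
            self.hits += 1
            return self._store[key]
        # First time this request is seen: call the model and store the result.
        self.misses += 1
        response = self._call_model(model_id, prompt)
        self._store[key] = response
        return response


def fake_model(model_id: str, prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs offline."""
    return f"[{model_id}] reply to: {prompt}"


cache = ResponseCache(fake_model)
first = cache.run("claude-sonnet-4-5", "Tell me a story.")
second = cache.run("claude-sonnet-4-5", "Tell me a story.")  # served from the cache
third = cache.run("claude-sonnet-4-5", "Tell me a different story.")  # new request
print(cache.hits, cache.misses)
```

In this sketch, only an exact repeat of the same model id and prompt is treated as a hit. Any change to the prompt produces a new key and therefore a fresh call.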
Basic Usage
Enable caching by setting cache_response=True when initializing the model. The first call will hit the API and cache the response, while subsequent identical calls will return the cached result.
    import time

    from kern.agent import Agent
    from kern.models.anthropic import Claude

    agent = Agent(model=Claude(id="claude-sonnet-4-5", cache_response=True))

    # Run the same query twice to demonstrate caching
    for i in range(1, 3):
        print(f"\n{'=' * 60}")
        print(
            f"Run {i}: {'Cache Miss (First Request)' if i == 1 else 'Cache Hit (Cached Response)'}"
        )
        print(f"{'=' * 60}\n")

        response = agent.run(
            "Write me a short story about a cat that can talk and solve problems."
        )
        print(response.content)
        print(f"\n Elapsed time: {response.metrics.duration:.3f}s")

        # Small delay between iterations for clarity
        if i == 1:
            time.sleep(0.5)

Usage
Set up your virtual environment
macOS/Linux:

    uv venv --python 3.12
    source .venv/bin/activate

Windows:

    uv venv --python 3.12
    .venv\Scripts\activate

Set your API key

    export ANTHROPIC_API_KEY=xxx

Install dependencies

    uv pip install -U anthropic kern-ai

Run Agent

    python cache_model_response.py