Response Caching

Cache model responses to reduce API calls and costs.

Note

For a conceptual overview of response caching, see Response Caching.

Response caching allows you to cache model responses, which can significantly improve response times and reduce API costs during development and testing.

Basic Usage

Enable caching by setting cache_response=True when initializing the model. The first call will hit the API and cache the response, while subsequent identical calls will return the cached result.

1import time
2
3from kern.agent import Agent
4from kern.models.openai import OpenAIResponses
5
6agent = Agent(model=OpenAIResponses(id="gpt-5.2", cache_response=True))
7
8# Run the same query twice to demonstrate caching
9for i in range(1, 3):
10 print(f"\n{'=' * 60}")
11 print(
12 f"Run {i}: {'Cache Miss (First Request)' if i == 1 else 'Cache Hit (Cached Response)'}"
13 )
14 print(f"{'=' * 60}\n")
15
16 response = agent.run(
17 "Write me a short story about a cat that can talk and solve problems."
18 )
19 print(response.content)
20 print(f"\n Elapsed time: {response.metrics.duration:.3f}s")
21
22 # Small delay between iterations for clarity
23 if i == 1:
24 time.sleep(0.5)

Usage

Set up your virtual environment

1uv venv --python 3.12
2source .venv/bin/activate
1uv venv --python 3.12
2.venv\Scripts\activate

Set your API key

1export OPENAI_API_KEY=xxx

Install dependencies

1uv pip install -U openai kern-ai

Run Agent

1python cache_model_response.py