Agent Metrics

Access RunMetrics, MessageMetrics, and SessionMetrics from agent runs.

The RunOutput from an agent run includes detailed metrics about token usage, cost, timing, and per-model breakdowns.

1from kern.agent import Agent
2from kern.models.openai import OpenAIResponses
3from kern.tools.hackernews import HackerNewsTools
4from kern.db.sqlite import SqliteDb
5from rich.pretty import pprint
6
7agent = Agent(
8 model=OpenAIResponses(id="gpt-5.2"),
9 tools=[HackerNewsTools()],
10 db=SqliteDb(db_file="tmp/agents.db"),
11 markdown=True,
12)
13
14run_response = agent.run("What are the top stories on HackerNews?")
15
16# Message metrics (MessageMetrics)
17for message in run_response.messages:
18 if message.role == "assistant":
19 pprint(message.metrics.to_dict())
20
21# Run metrics (RunMetrics)
22pprint(run_response.metrics.to_dict())
23
24# Per-model breakdown
25if run_response.metrics.details:
26 for model_type, model_metrics_list in run_response.metrics.details.items():
27 for m in model_metrics_list:
28 print(f"{model_type}: {m.provider}/{m.id} - {m.total_tokens} tokens")
29
30# Session metrics (SessionMetrics)
31pprint(agent.get_session_metrics().to_dict())

Metrics are available at multiple levels:

  • Per message: Each assistant message has MessageMetrics with per-API-call token counts and timing.
  • Per run: Each RunOutput has RunMetrics with aggregated totals and a details breakdown by model type.
  • Per session: agent.get_session_metrics() returns SessionMetrics aggregated across all runs.
LevelTypeAccess
Per messageMessageMetricsmessage.metrics
Per runRunMetricsrun_response.metrics
Per sessionSessionMetricsagent.get_session_metrics()

Run fields (RunMetrics)

FieldDescription
input_tokensTokens sent to the model.
output_tokensTokens generated by the model.
total_tokensSum of input_tokens and output_tokens.
audio_input_tokensAudio tokens in the input.
audio_output_tokensAudio tokens in the output.
audio_total_tokensTotal audio tokens.
cache_read_tokensTokens served from cache.
cache_write_tokensTokens written to cache.
reasoning_tokensTokens used for reasoning.
costCost of the run.
durationRun duration in seconds.
time_to_first_tokenTime from run start to first token (seconds).
detailsPer-model breakdown by model type. See Metrics reference.
additional_metricsExtra metrics (e.g., eval_duration).

Message fields (MessageMetrics)

FieldDescription
input_tokensTokens sent to the model.
output_tokensTokens generated by the model.
total_tokensSum of input_tokens and output_tokens.
audio_input_tokensAudio tokens in the input.
audio_output_tokensAudio tokens in the output.
audio_total_tokensTotal audio tokens.
cache_read_tokensTokens served from cache.
cache_write_tokensTokens written to cache.
reasoning_tokensTokens used for reasoning.
costCost of this API call.
durationDuration of this API call (seconds).
time_to_first_tokenTime to first token for this API call (seconds).
provider_metricsProvider-specific metrics (e.g., Ollama timing, Groq timing, Cerebras timing).

Developer Resources