Async Accuracy Evaluation

This example shows how to run an accuracy evaluation asynchronously using the arun method, so it can be awaited alongside other async work.

Create a Python file named accuracy_async.py

"""This example shows how to run an Accuracy evaluation asynchronously."""

import asyncio
from typing import Optional

from kern.agent import Agent
from kern.eval.accuracy import AccuracyEval, AccuracyResult
from kern.models.openai import OpenAIResponses
from kern.tools.calculator import CalculatorTools

evaluation = AccuracyEval(
    model=OpenAIResponses(id="gpt-5.2"),
    agent=Agent(
        model=OpenAIResponses(id="gpt-5.2"),
        tools=[CalculatorTools()],
    ),
    input="What is 10*5 then to the power of 2? do it step by step",
    expected_output="2500",
    additional_guidelines="Agent output should include the steps and the final answer.",
    num_iterations=3,
)

# Run the evaluation calling the arun method.
result: Optional[AccuracyResult] = asyncio.run(evaluation.arun(print_results=True))
assert result is not None and result.avg_score >= 8
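
The performance benefit of an async entry point usually comes from awaiting several slow calls concurrently. The sketch below illustrates that idea with the standard library only. `fake_arun`, `run_all`, and `timed_run` are hypothetical names for illustration and are not part of kern; `fake_arun` stands in for an awaitable such as `evaluation.arun(...)`, and the score it returns is a placeholder rather than real evaluation data.

```python
import asyncio
import time


async def fake_arun(name: str, delay: float = 0.1) -> dict:
    """Hypothetical stand-in for an awaitable evaluation call."""
    await asyncio.sleep(delay)  # simulates waiting on a model API response
    return {"name": name, "avg_score": 9.0}  # placeholder value, not real data


async def run_all(names: list[str]) -> list[dict]:
    # gather starts every coroutine at once and returns results in input order
    return await asyncio.gather(*(fake_arun(n) for n in names))


def timed_run(names: list[str]) -> tuple[list[dict], float]:
    # Measure wall-clock time for the whole batch
    start = time.perf_counter()
    results = asyncio.run(run_all(names))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run(["eval_a", "eval_b", "eval_c"])
    print(results, f"{elapsed:.2f}s")
```

Because the three simulated calls overlap, the batch takes roughly as long as one call rather than the sum of all three. Whether real evaluations can be gathered this way depends on the library and on API rate limits, so treat this as a pattern to verify, not a guarantee.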

Set up your virtual environment

macOS/Linux:
uv venv --python 3.12
source .venv/bin/activate

Windows:
uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U openai kern-ai

Export your OpenAI API key

macOS/Linux:
export OPENAI_API_KEY="your_openai_api_key_here"

Windows (PowerShell):
$Env:OPENAI_API_KEY="your_openai_api_key_here"
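
A missing key typically surfaces only when the first model call fails, so it can help to check for it up front. The following is a minimal sketch using the standard library; `require_env` is a hypothetical helper, not something kern or the OpenAI client provides.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the example.")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of the script would fail fast with a readable message instead of an authentication error partway through the run.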

Run the evaluation

python accuracy_async.py