Async Reliability Evaluation

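This example runs a reliability evaluation asynchronously: the agent answers a prompt, and the evaluation's arun method is then awaited to check that the expected tool call was made.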

Create a Python file named reliability_async.py

"""This example shows how to run a Reliability evaluation asynchronously."""

import asyncio
from typing import Optional

from kern.agent import Agent
from kern.eval.reliability import ReliabilityEval, ReliabilityResult
from kern.models.openai import OpenAIResponses
from kern.run.agent import RunOutput
from kern.tools.calculator import CalculatorTools


def factorial():
    agent = Agent(
        model=OpenAIResponses(id="gpt-5.2"),
        tools=[CalculatorTools()],
    )
    response: RunOutput = agent.run("What is 10!?")
    evaluation = ReliabilityEval(
        agent_response=response,
        expected_tool_calls=["factorial"],
    )

    # Run the evaluation calling the arun method.
    result: Optional[ReliabilityResult] = asyncio.run(
        evaluation.arun(print_results=True)
    )
    if result:
        result.assert_passed()


if __name__ == "__main__":
    factorial()
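
One reason to use an async entry point such as arun is that several evaluations can share one event loop. The sketch below illustrates that pattern with asyncio.gather. It does not use the library itself: StubEval and run_all are hypothetical stand-ins invented for illustration, and the sleep only simulates an awaited model call.

```python
import asyncio


class StubEval:
    """Hypothetical stand-in for an evaluation object exposing an async arun."""

    def __init__(self, name: str, passed: bool) -> None:
        self.name = name
        self.passed = passed

    async def arun(self) -> tuple[str, bool]:
        # Simulate awaiting I/O (for example a model call) without blocking the loop.
        await asyncio.sleep(0.01)
        return self.name, self.passed


async def run_all(evals: list[StubEval]) -> dict[str, bool]:
    # gather() schedules every coroutine on the same event loop and returns
    # the results in the order the awaitables were passed in.
    results = await asyncio.gather(*(e.arun() for e in evals))
    return dict(results)


if __name__ == "__main__":
    outcome = asyncio.run(
        run_all([StubEval("factorial", True), StubEval("sqrt", False)])
    )
    print(outcome)
```

With real evaluation objects, the same shape would apply only if each one exposes an awaitable arun, as the example above does.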

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U openai kern-ai

Export your OpenAI API key

macOS/Linux:

export OPENAI_API_KEY="your_openai_api_key_here"

Windows (PowerShell):

$Env:OPENAI_API_KEY="your_openai_api_key_here"
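
To confirm the variable is visible to Python before running the example, a small pre-flight check can help. This is a minimal sketch using only the standard library; the helper name missing_api_key is hypothetical and is not part of any library used here.

```python
import os
from typing import Mapping


def missing_api_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when OPENAI_API_KEY is absent or blank in the given environment."""
    return not env.get("OPENAI_API_KEY", "").strip()


if __name__ == "__main__":
    if missing_api_key():
        print("OPENAI_API_KEY is not set; export it before running the example.")
    else:
        print("OPENAI_API_KEY found.")
```

Passing the environment as a parameter keeps the check easy to test with a plain dictionary.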

Run the evaluation

python reliability_async.py