Async Reliability Evaluation

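This example runs a reliability evaluation asynchronously: it calls the evaluation's arun method through asyncio.run, then asserts that the agent made the expected factorial tool call.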

Create a Python file named reliability_async.py

"""This example shows how to run a Reliability evaluation asynchronously."""

import asyncio
from typing import Optional

from kern.agent import Agent
from kern.eval.reliability import ReliabilityEval, ReliabilityResult
from kern.models.openai import OpenAIResponses
from kern.run.agent import RunOutput
from kern.tools.calculator import CalculatorTools


def factorial():
    agent = Agent(
        model=OpenAIResponses(id="gpt-5.2"),
        tools=[CalculatorTools()],
    )
    response: RunOutput = agent.run("What is 10!?")
    evaluation = ReliabilityEval(
        agent_response=response,
        expected_tool_calls=["factorial"],
    )

    # Run the evaluation by calling the arun method.
    result: Optional[ReliabilityResult] = asyncio.run(
        evaluation.arun(print_results=True)
    )
    if result:
        result.assert_passed()


if __name__ == "__main__":
    factorial()
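
The script above drives a single arun coroutine with one asyncio.run call. When several evaluations are needed, the same pattern can, in principle, await them concurrently on one event loop. The sketch below illustrates this with hypothetical stand-ins (FakeEval and FakeResult are not kern classes, and the pass/fail rule is an assumption made for illustration), so it runs without an API key or the kern package.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for an evaluation result (not a kern class)."""

    name: str
    passed: bool


class FakeEval:
    """Hypothetical stand-in that exposes an async arun(), like the example above."""

    def __init__(self, name: str, actual_calls: list[str], expected_calls: list[str]):
        self.name = name
        self.actual_calls = actual_calls
        self.expected_calls = expected_calls

    async def arun(self) -> FakeResult:
        # Simulate I/O latency; a real evaluation would await network calls here.
        await asyncio.sleep(0.01)
        # Assumed rule: pass only if every expected tool call was actually made.
        passed = all(call in self.actual_calls for call in self.expected_calls)
        return FakeResult(self.name, passed)


async def run_all(evals: list[FakeEval]) -> list[FakeResult]:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(e.arun() for e in evals))


def main() -> list[FakeResult]:
    evals = [
        FakeEval("factorial", ["factorial"], ["factorial"]),
        FakeEval("multiply", ["add"], ["multiply"]),
    ]
    # A single asyncio.run() call drives all evaluations from synchronous code.
    return asyncio.run(run_all(evals))


if __name__ == "__main__":
    for result in main():
        print(result.name, "passed" if result.passed else "failed")
```

Note that asyncio.run cannot be called from code that is already running inside an event loop; in that situation the coroutine would be awaited directly instead.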

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate
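
To confirm that activation worked, a short check like the one below can be run with the environment's interpreter. It is a minimal sketch that uses only the standard library. The 3.12 default mirrors the version passed to uv venv above; it is not a documented minimum requirement of the library.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def version_at_least(minimum: tuple[int, int] = (3, 12)) -> bool:
    # Compare only (major, minor) against the requested minimum.
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python version:", ".".join(map(str, sys.version_info[:3])))
    print("at least 3.12:", version_at_least())
```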

Install dependencies

uv pip install -U openai kern-ai

Export your OpenAI API key

macOS/Linux:

export OPENAI_API_KEY="your_openai_api_key_here"

Windows (PowerShell):

$Env:OPENAI_API_KEY="your_openai_api_key_here"
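
A missing or placeholder key typically only shows up as an authentication error once the agent makes its first request. The sketch below shows one way to fail early instead; require_api_key is a hypothetical helper written for this page, not part of kern or the openai package.

```python
import os
from collections.abc import Mapping

PLACEHOLDER = "your_openai_api_key_here"


def require_api_key(
    env: Mapping[str, str] = os.environ, name: str = "OPENAI_API_KEY"
) -> str:
    """Return the key from env, or raise if it is missing or still the placeholder."""
    value = env.get(name, "").strip()
    if not value or value == PLACEHOLDER:
        raise RuntimeError(f"{name} is not set; export it before running the script.")
    return value


if __name__ == "__main__":
    try:
        require_api_key()
        print("OPENAI_API_KEY is set.")
    except RuntimeError as err:
        print(err)
```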

Run the evaluation

python reliability_async.py