Comparison Accuracy Evaluation
Example showing how to evaluate agent accuracy on comparison tasks.
Create a Python file
Save the code below as accuracy_comparison.py:

    from typing import Optional

    from kern.agent import Agent
    from kern.eval.accuracy import AccuracyEval, AccuracyResult
    from kern.models.openai import OpenAIResponses
    from kern.tools.calculator import CalculatorTools

    evaluation = AccuracyEval(
        name="Comparison Evaluation",
        model=OpenAIResponses(id="gpt-5.2"),
        # Agent under evaluation
        agent=Agent(
            model=OpenAIResponses(id="gpt-5.2"),
            tools=[CalculatorTools()],
            instructions="You must use the calculator tools for comparisons.",
        ),
        input="9.11 and 9.9 -- which is bigger?",
        expected_output="9.9",
        additional_guidelines="It's ok for the output to include additional text or information relevant to the comparison.",
    )

    # Run the evaluation and print the results
    result: Optional[AccuracyResult] = evaluation.run(print_results=True)
    # Fail if no result was returned or the average score is below 8
    assert result is not None and result.avg_score >= 8

Set up your virtual environment
Mac/Linux:

    uv venv --python 3.12
    source .venv/bin/activate

Windows:

    uv venv --python 3.12
    .venv\Scripts\activate

Install dependencies
    uv pip install -U openai kern-ai

Export your OpenAI API key
Mac/Linux:

    export OPENAI_API_KEY="your_openai_api_key_here"

Windows (PowerShell):

    $Env:OPENAI_API_KEY="your_openai_api_key_here"

Run Agent
    python accuracy_comparison.py
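
The final assertion in accuracy_comparison.py passes only when a result was returned and its average score is at least 8. The sketch below mirrors that check in plain Python so the pass/fail logic can be read without running the evaluation. It does not call kern: the helper `passes`, the constant `PASS_THRESHOLD`, and the sample score lists are hypothetical, and treating the average as a simple mean of per-run scores is an assumption, since the script above only shows the `avg_score` attribute and the threshold of 8.

```python
from statistics import mean
from typing import Optional, Sequence

# Same threshold as the assertion in the script above.
PASS_THRESHOLD = 8


def passes(scores: Optional[Sequence[float]], threshold: float = PASS_THRESHOLD) -> bool:
    """Hypothetical helper: True when scores exist and their mean meets the threshold.

    Assumption: the average is a plain arithmetic mean of per-run scores.
    """
    if not scores:
        # Mirrors the `result is not None` guard: no scores means no pass.
        return False
    return mean(scores) >= threshold


if __name__ == "__main__":
    print(passes([9, 8, 10]))  # mean is 9, so this prints True
    print(passes([9, 6, 7]))   # mean is about 7.33, so this prints False
    print(passes(None))        # no result at all, so this prints False
```

Checking for a missing result before comparing the score matters because an average cannot be read from a result that was never produced; the original assertion guards against this in the same order.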