Reliability with Database Logging

This example stores reliability evaluation results in a PostgreSQL database by passing a database instance to ReliabilityEval.

Create a Python file named reliability_db_logging.py

"""Example showing how to store evaluation results in the database."""

from typing import Optional

from kern.agent import Agent
from kern.db.postgres.postgres import PostgresDb
from kern.eval.reliability import ReliabilityEval, ReliabilityResult
from kern.models.openai import OpenAIResponses
from kern.run.agent import RunOutput
from kern.tools.calculator import CalculatorTools

# Setup the database
db_url = "postgresql+psycopg://ai:ai@localhost:5432/ai"
db = PostgresDb(db_url=db_url, eval_table="eval_runs")


agent = Agent(
    model=OpenAIResponses(id="gpt-5.2"),
    tools=[CalculatorTools()],
)
response: RunOutput = agent.run("What is 10!?")

evaluation = ReliabilityEval(
    db=db,  # Pass the database to the evaluation. Results will be stored in the database.
    name="Tool Call Reliability",
    agent_response=response,
    expected_tool_calls=["factorial"],
)
result: Optional[ReliabilityResult] = evaluation.run(print_results=True)
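
The db_url in the example follows the dialect+driver://user:password@host:port/database layout used by SQLAlchemy-style connection strings. As a minimal sketch, assuming you only want to sanity-check those parts before connecting, the standard library can split the URL. The helper name describe_db_url is hypothetical and is not part of kern; the password is deliberately left out of the result.

```python
from urllib.parse import urlsplit

# Same connection URL as in the example above.
db_url = "postgresql+psycopg://ai:ai@localhost:5432/ai"


def describe_db_url(url: str) -> dict:
    """Split a database URL into its parts (hypothetical helper, not a kern API)."""
    parts = urlsplit(url)
    # The scheme carries "dialect+driver"; the driver part is optional.
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = describe_db_url(db_url)
print(info)
```

If the host, port, or database name printed here does not match your running PostgreSQL instance, adjust db_url before running the evaluation.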

Set up your virtual environment

macOS/Linux:
uv venv --python 3.12
source .venv/bin/activate

Windows:
uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U openai kern-ai psycopg

Export your OpenAI API key

macOS/Linux:
export OPENAI_API_KEY="your_openai_api_key_here"

Windows (PowerShell):
$Env:OPENAI_API_KEY="your_openai_api_key_here"
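
A missing key usually surfaces only when the agent makes its first model call. As an optional sketch, assuming you prefer to fail early with a clear message, a small check like the one below could sit at the top of the script. The helper require_env is hypothetical and uses only the standard library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of kern or openai.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the example.")
    return value


try:
    require_env("OPENAI_API_KEY")
    print("OPENAI_API_KEY is set")
except RuntimeError as err:
    # Report the problem instead of crashing, so the check is safe to run anywhere.
    print(err)
```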

Run Agent

python reliability_db_logging.py