Batch Agent as Judge

Evaluate multiple input/output pairs in a single batch

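This example evaluates three customer-service exchanges in one run. Each input/output pair is judged against the same criteria and marked PASS or FAIL, and the overall pass rate is printed at the end.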

Add the following code to a Python file named agent_as_judge_batch.py

from kern.db.sqlite import SqliteDb
from kern.eval.agent_as_judge import AgentAsJudgeEval

# Setup database to persist eval results
db = SqliteDb(db_file="tmp/agent_as_judge_batch.db")

evaluation = AgentAsJudgeEval(
    name="Customer Service Quality",
    criteria="Response should be empathetic, professional, and helpful",
    scoring_strategy="binary",  # PASS/FAIL for each case
    db=db,
)

result = evaluation.run(
    cases=[
        {
            "input": "My order is delayed and I'm very upset!",
            "output": "I sincerely apologize for the delay. I understand how frustrating this must be. Let me check your order status right away and see how we can make this right for you.",
        },
        {
            "input": "Can you help me with a refund?",
            "output": "Of course! I'd be happy to help with your refund. Could you please provide your order number so I can process this quickly for you?",
        },
        {
            "input": "Your product is terrible!",
            "output": "I'm sorry to hear you're disappointed. Your feedback is valuable to us. Could you share more details about what went wrong so we can improve?",
        },
    ],
    print_results=True,
)

print(f"Pass rate: {result.pass_rate:.1f}%")
print(f"Passed: {sum(1 for r in result.results if r.passed)}/{len(result.results)}")
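The two summary lines at the end of the script can be tried out without calling a model. The sketch below is a minimal stand-in, not the kern API: `CaseResult` and `pass_rate` are hypothetical names introduced here, and it assumes that `result.pass_rate` is the percentage of passing cases on a 0 to 100 scale, which the `%` in the format string suggests but the source does not confirm.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical stand-in for one per-case judgment; not the kern class."""

    passed: bool


def pass_rate(results: list[CaseResult]) -> float:
    """Return the share of passing cases as a percentage (0.0 for no cases)."""
    if not results:
        return 0.0
    passed = sum(1 for r in results if r.passed)
    return 100.0 * passed / len(results)


# Three cases, two of which the (imaginary) judge accepted.
results = [CaseResult(True), CaseResult(True), CaseResult(False)]
print(f"Pass rate: {pass_rate(results):.1f}%")  # Pass rate: 66.7%
print(f"Passed: {sum(1 for r in results if r.passed)}/{len(results)}")  # Passed: 2/3
```

With binary scoring every case contributes either a pass or a fail, so the two printed lines describe the same outcome as a percentage and as a count.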

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U kern-ai openai

Export your OpenAI API key

macOS/Linux:

export OPENAI_API_KEY="your_openai_api_key_here"

Windows (PowerShell):

$Env:OPENAI_API_KEY="your_openai_api_key_here"
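Before running the example, it can help to confirm that the key is visible to Python in the current shell. The following is a small optional check using only the standard library; `has_openai_key` is a hypothetical helper name, not part of kern or the OpenAI client.

```python
import os
from collections.abc import Mapping


def has_openai_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when OPENAI_API_KEY is present and not blank."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; export it before running the example.")
```

The check only confirms that the variable is set in this process; it does not validate the key against the OpenAI service.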

Run the example

python agent_as_judge_batch.py