Batch Agent as Judge
Evaluate multiple input/output pairs in a single batch
This example demonstrates batch evaluation: several input/output pairs are passed to a single run, and each pair is judged against the same criteria.
Add the following code to a Python file named agent_as_judge_batch.py (the name used in the run step):
    from kern.db.sqlite import SqliteDb
    from kern.eval.agent_as_judge import AgentAsJudgeEval

    # Setup database to persist eval results
    db = SqliteDb(db_file="tmp/agent_as_judge_batch.db")

    evaluation = AgentAsJudgeEval(
        name="Customer Service Quality",
        criteria="Response should be empathetic, professional, and helpful",
        scoring_strategy="binary",  # PASS/FAIL for each case
        db=db,
    )

    result = evaluation.run(
        cases=[
            {
                "input": "My order is delayed and I'm very upset!",
                "output": "I sincerely apologize for the delay. I understand how frustrating this must be. Let me check your order status right away and see how we can make this right for you.",
            },
            {
                "input": "Can you help me with a refund?",
                "output": "Of course! I'd be happy to help with your refund. Could you please provide your order number so I can process this quickly for you?",
            },
            {
                "input": "Your product is terrible!",
                "output": "I'm sorry to hear you're disappointed. Your feedback is valuable to us. Could you share more details about what went wrong so we can improve?",
            },
        ],
        print_results=True,
    )

    print(f"Pass rate: {result.pass_rate:.1f}%")
    print(f"Passed: {sum(1 for r in result.results if r.passed)}/{len(result.results)}")

Set up your virtual environment
macOS/Linux:

    uv venv --python 3.12
    source .venv/bin/activate

Windows:

    uv venv --python 3.12
    .venv\Scripts\activate

Install dependencies
    uv pip install -U kern-ai openai

Export your OpenAI API key
macOS/Linux:

    export OPENAI_API_KEY="your_openai_api_key_here"

Windows (PowerShell):

    $Env:OPENAI_API_KEY="your_openai_api_key_here"

Run the example
    python agent_as_judge_batch.py
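
The two summary lines printed at the end of the script are derived from the per-case results. The sketch below shows that arithmetic in isolation, without calling the library or a model. `CaseResult` and `summarize` are hypothetical stand-ins introduced only for illustration and are not part of `kern`. It also assumes that the pass rate is a percentage between 0 and 100, which the `%` in the example's format string suggests but the source does not state.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical stand-in for one judged case; not the library's class."""

    passed: bool


def summarize(results: list[CaseResult]) -> tuple[int, int, float]:
    """Return (passed count, total count, pass rate as a percentage)."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    # Assumption: pass rate is expressed on a 0-100 scale; empty batches give 0.0.
    rate = 100.0 * passed / total if total else 0.0
    return passed, total, rate


# Pretend two of the three cases passed under binary (PASS/FAIL) scoring.
passed, total, rate = summarize(
    [CaseResult(True), CaseResult(True), CaseResult(False)]
)
print(f"Pass rate: {rate:.1f}%")      # Pass rate: 66.7%
print(f"Passed: {passed}/{total}")    # Passed: 2/3
```

With binary scoring, each case contributes either a pass or a fail, so the count and the rate always carry the same information. Printing both, as the example does, simply makes the batch size visible alongside the percentage.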