Binary Agent as Judge
Binary pass/fail evaluation without numeric scoring
This example runs an agent-as-judge evaluation in binary mode: the judge checks a response against a single criterion and returns PASS or FAIL instead of a numeric score.
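To make the shape of a binary verdict concrete, here is a minimal, self-contained sketch that does not use kern at all. In the real example a model acts as the judge. Here a deterministic keyword check stands in for it, purely for illustration. `BinaryVerdict`, `judge_professional_tone`, and the `INFORMAL_MARKERS` word list are hypothetical names invented for this sketch. The point is only that the output is a boolean plus a reason, with no score attached.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an LLM judge: a small, illustrative slang list.
INFORMAL_MARKERS = {"gonna", "wanna", "lol", "hey", "dude", "yeah"}


@dataclass
class BinaryVerdict:
    """A pass/fail judgment with no numeric score attached."""
    passed: bool
    reason: str


def judge_professional_tone(output: str) -> BinaryVerdict:
    """Return PASS/FAIL for the criterion 'no informal language or slang'."""
    # Normalize words: lowercase and strip surrounding punctuation.
    words = {w.strip(".,!?;:'\"").lower() for w in output.split()}
    found = sorted(words & INFORMAL_MARKERS)
    if found:
        return BinaryVerdict(passed=False, reason=f"Informal terms found: {', '.join(found)}")
    return BinaryVerdict(passed=True, reason="No informal terms found")


if __name__ == "__main__":
    for text in [
        "Thank you for reaching out. I can help you with your account.",
        "Hey dude, yeah I'm gonna sort that out lol",
    ]:
        verdict = judge_professional_tone(text)
        print("PASSED" if verdict.passed else "FAILED", "-", verdict.reason)
```

A real judge model can catch tone problems that no fixed word list would. The sketch only shows the contract: a criterion goes in, and a pass/fail decision with a reason comes out.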
Add the following code to a Python file named agent_as_judge_binary.py:
from kern.agent import Agent
from kern.db.sqlite import SqliteDb
from kern.eval.agent_as_judge import AgentAsJudgeEval
from kern.models.openai import OpenAIResponses

# Set up a database to persist eval results
db = SqliteDb(db_file="tmp/agent_as_judge_binary.db")

# The agent whose response will be judged
agent = Agent(
    model=OpenAIResponses(id="gpt-5.2"),
    instructions="You are a customer service agent. Respond professionally.",
    db=db,
)

# Generate the response to evaluate
response = agent.run("I need help with my account")

# Define the pass/fail criterion for the judge
evaluation = AgentAsJudgeEval(
    name="Professional Tone Check",
    criteria="Response must maintain professional tone without informal language or slang",
    db=db,
)

# Judge the agent's output against the criterion
result = evaluation.run(
    input="I need help with my account",
    output=str(response.content),
    print_results=True,
    print_summary=True,
)

# Print the binary verdict for the evaluated response
print(f"Result: {'PASSED' if result.results[0].passed else 'FAILED'}")

Set up your virtual environment
Mac / Linux:
uv venv --python 3.12
source .venv/bin/activate

Windows:
uv venv --python 3.12
.venv\Scripts\activate

Install dependencies
uv pip install -U kern-ai openai

Export your OpenAI API key
Mac / Linux:
export OPENAI_API_KEY="your_openai_api_key_here"

Windows (PowerShell):
$Env:OPENAI_API_KEY="your_openai_api_key_here"

Run the example
python agent_as_judge_binary.py
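The example reads only the first entry of `result.results`. If you evaluate several responses, you may want to tally the verdicts. The helper in the block that follows is a minimal sketch for that. `summarize_binary_results` is a hypothetical helper, not part of kern. It assumes only that each result entry exposes a boolean `passed` attribute, which matches how the example's last line reads it. The `SimpleNamespace` objects stand in for real result entries.

```python
from collections.abc import Iterable
from types import SimpleNamespace
from typing import Any


def summarize_binary_results(results: Iterable[Any]) -> dict:
    """Count PASS/FAIL over entries that expose a boolean `passed` attribute."""
    flags = [bool(r.passed) for r in results]
    passed = sum(flags)
    total = len(flags)
    return {
        "passed": passed,
        "failed": total - passed,
        # Guard against division by zero when there are no results.
        "pass_rate": passed / total if total else 0.0,
    }


if __name__ == "__main__":
    # Stand-ins for eval result entries; only `passed` is assumed to exist.
    fake_results = [
        SimpleNamespace(passed=True),
        SimpleNamespace(passed=False),
        SimpleNamespace(passed=True),
    ]
    print(summarize_binary_results(fake_results))
```

Because binary mode produces no scores, a pass rate over many runs is the closest aggregate. It answers "how often does the response meet the criterion" rather than "how well".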