Parameters
Parameter | Type | Default | Description |
---|---|---|---|
judge_agent | Agent | Required | The agent that will act as the judge for evaluation |
agent_under_test | Union[Agent, Graph, Team] | Required | The agent, graph, or team to be evaluated |
query | str | Required | The input query to test the agent with |
expected_output | str | Required | The expected or ground-truth output for comparison |
additional_guidelines | Optional[str] | None | Additional evaluation guidelines for the judge |
num_iterations | int | 1 | Number of evaluation iterations to run |
Functions
__init__
Initialize the AccuracyEvaluator.
Parameters:
judge_agent (Agent): The agent that will act as the judge for evaluation
agent_under_test (Union[Agent, Graph, Team]): The agent, graph, or team to be evaluated
query (str): The input query to test the agent with
expected_output (str): The expected or ground-truth output for comparison
additional_guidelines (Optional[str]): Additional evaluation guidelines for the judge
num_iterations (int): Number of evaluation iterations to run
Raises:
TypeError: If judge_agent is not an Agent instance
TypeError: If agent_under_test is not an Agent, Graph, or Team instance
ValueError: If num_iterations is not a positive integer
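The validation implied by the errors above can be sketched as plain `isinstance` guards. This is an illustrative sketch only: `Agent`, `Graph`, and `Team` are stubbed here, and the real constructor may validate differently.

```python
class Agent: ...
class Graph: ...
class Team: ...

def validate_init_args(judge_agent, agent_under_test, num_iterations):
    """Sketch of the __init__ checks implied by the Raises section above."""
    if not isinstance(judge_agent, Agent):
        raise TypeError("judge_agent must be an Agent instance")
    if not isinstance(agent_under_test, (Agent, Graph, Team)):
        raise TypeError("agent_under_test must be an Agent, Graph, or Team instance")
    if not isinstance(num_iterations, int) or num_iterations < 1:
        raise ValueError("num_iterations must be a positive integer")
```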
run
Run the accuracy evaluation.
Parameters:
print_results (bool): Whether to print formatted results to console
Returns:
AccuracyEvaluationResult: The evaluation results
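A minimal sketch of the loop run implies, given num_iterations: query the agent under test once per iteration and score each output with the judge. The callables `generate` and `judge_score` are hypothetical stand-ins for the two agents, not the library's API, and the real method also aggregates and optionally prints results.

```python
def run_accuracy_eval(generate, judge_score, query, num_iterations=1):
    # `generate` stands in for the agent under test, `judge_score` for the
    # judge agent; both are hypothetical callables for illustration.
    scores, final_output = [], None
    for _ in range(num_iterations):
        final_output = generate(query)            # ask the agent under test
        scores.append(judge_score(final_output))  # score with the judge
    return final_output, scores
```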
run_with_output
Run evaluation on a pre-existing output.
Parameters:
output (str): The pre-existing output to evaluate
print_results (bool): Whether to print formatted results to console
Returns:
AccuracyEvaluationResult: The evaluation results
_get_judge_score
Get the judge’s score for a generated output.
Parameters:
generated_output (str): The output to be scored
Returns:
EvaluationScore: The judge's evaluation score
Raises:
TypeError: If the judge agent fails to return a valid EvaluationScore object
_aggregate_and_present_results
Aggregate results and present them in a formatted way.
Parameters:
final_generated_output (str): The final generated output
print_results (bool): Whether to print results
Returns:
AccuracyEvaluationResult: The aggregated evaluation results
Raises:
RuntimeError: If evaluation finished without producing any results
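Aggregation across iterations might look like the following. This is a sketch under the assumption that per-iteration judge scores are averaged; the actual fields belong to `AccuracyEvaluationResult` and are not reproduced here.

```python
from statistics import mean

def aggregate_scores(scores):
    # Mirrors the RuntimeError above: no scores means nothing to aggregate.
    if not scores:
        raise RuntimeError("Evaluation finished without producing any results")
    return {"mean_score": mean(scores), "iterations": len(scores)}
```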
_construct_judge_prompt
Construct the prompt for the judge agent.
Parameters:
generated_output (str): The generated output to evaluate
Returns:
str: The constructed judge prompt
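The prompt assembly might be sketched as below. The wording and field order are assumptions, not the library's actual template; the real method presumably also pulls in the evaluator's stored `query`, `expected_output`, and `additional_guidelines`.

```python
def construct_judge_prompt(query, expected_output, generated_output,
                           additional_guidelines=None):
    # Hypothetical prompt layout; only the input names come from the docs above.
    parts = [
        f"Input query:\n{query}",
        f"Expected output:\n{expected_output}",
        f"Generated output:\n{generated_output}",
    ]
    if additional_guidelines:
        parts.append(f"Additional guidelines:\n{additional_guidelines}")
    parts.append("Judge how accurately the generated output matches the expected output.")
    return "\n\n".join(parts)
```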
_print_formatted_results
Print formatted evaluation results to console.
Parameters:
result (AccuracyEvaluationResult): The evaluation results to print