The AccuracyEvaluator uses an LLM judge to compare an agent’s generated output against an expected answer. Each evaluation produces a score from 1 to 10 along with detailed reasoning and constructive critique.

How It Works

  1. The agent under test receives a query and produces output.
  2. A separate judge agent evaluates the output against the expected answer and guidelines.
  3. The judge returns a structured EvaluationScore containing a numeric score, reasoning, pass/fail flag, and critique.
  4. If num_iterations > 1, the process repeats and scores are averaged.
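The four steps above can be sketched as a small loop. This is only an illustration under stated assumptions, not the library's implementation: `stub_agent` and `stub_judge` are hypothetical stand-ins for the agent under test and the LLM judge, the local `EvaluationScore` dataclass merely mirrors the fields described on this page, and re-running the agent on every iteration is an assumption about what "the process repeats" means.

```python
import asyncio
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvaluationScore:
    # Local stand-in; field names follow the description on this page.
    score: float
    reasoning: str
    is_met: bool
    critique: str


async def stub_agent(query: str) -> str:
    # Hypothetical agent under test: always answers "Paris".
    return "Paris"


async def stub_judge(output: str, expected: str) -> EvaluationScore:
    # Hypothetical judge: a case-insensitive exact match scores 10, anything else 1.
    matched = output.strip().lower() == expected.strip().lower()
    return EvaluationScore(
        score=10 if matched else 1,
        reasoning="exact match" if matched else "output differs from the expected answer",
        is_met=matched,
        critique="" if matched else "Return the expected answer.",
    )


async def evaluate(query: str, expected: str, num_iterations: int = 1):
    scores = []
    for _ in range(num_iterations):
        output = await stub_agent(query)                   # step 1
        scores.append(await stub_judge(output, expected))  # steps 2 and 3
    return mean(s.score for s in scores), scores           # step 4


average, scores = asyncio.run(
    evaluate("What is the capital of France?", "Paris", num_iterations=3)
)
```

A real judge is an LLM, so its scores can vary between rounds; averaging over several iterations smooths that variance.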

Parameters

Parameter              Type                  Required  Description
judge_agent            Agent                 Yes       Agent used to evaluate outputs
agent_under_test       Agent | Graph | Team  Yes       Entity to evaluate
query                  str                   Yes       Input query sent to the entity
expected_output        str                   Yes       Ground-truth answer for comparison
additional_guidelines  str                   No        Extra criteria for the judge
num_iterations         int                   No        Number of evaluation rounds (default: 1)

Result Structure

AccuracyEvaluationResult contains:
  • average_score — Mean score across all iterations (1–10)
  • evaluation_scores — List of EvaluationScore objects, one per iteration
  • generated_output — The output produced by the entity
  • user_query / expected_output — The original inputs

Each EvaluationScore includes:
  • score — Numeric score (1–10)
  • reasoning — Step-by-step explanation from the judge
  • is_met — Boolean indicating whether core requirements are met
  • critique — Actionable feedback on how to improve
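As a rough illustration of how these fields might be consumed, the sketch below collects the critiques from iterations whose core requirements were not met. The two dataclasses are local stand-ins that mirror the field names listed above; they are not imported from the library, and `failed_critiques` and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvaluationScore:
    # Local stand-in mirroring the per-iteration fields listed above.
    score: float
    reasoning: str
    is_met: bool
    critique: str


@dataclass
class AccuracyEvaluationResult:
    # Local stand-in mirroring the result fields listed above.
    average_score: float
    evaluation_scores: List[EvaluationScore]
    generated_output: str
    user_query: str
    expected_output: str


def failed_critiques(result: AccuracyEvaluationResult) -> List[str]:
    # Keep only the critiques from iterations where is_met is False.
    return [s.critique for s in result.evaluation_scores if not s.is_met]


# Made-up sample data for illustration only.
result = AccuracyEvaluationResult(
    average_score=6.5,
    evaluation_scores=[
        EvaluationScore(9, "Names the correct city.", True, ""),
        EvaluationScore(4, "Answer is hedged and vague.", False, "State the city name explicitly."),
    ],
    generated_output="It might be Paris.",
    user_query="What is the capital of France?",
    expected_output="Paris",
)

critiques = failed_critiques(result)
```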

Methods

Method                                             Description
await run(print_results=True)                      Execute the entity, then evaluate output
await run_with_output(output, print_results=True)  Evaluate a pre-existing output string
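One way to picture the difference between the two methods is a class in which `run` produces the output and then delegates to `run_with_output`. The sketch below is a hypothetical stand-in: `SketchEvaluator`, its constructor, and the `entity_calls` counter are invented for illustration, and the delegation itself is an assumption rather than something this page states about the library.

```python
import asyncio


class SketchEvaluator:
    """Hypothetical stand-in, not the library class."""

    def __init__(self, entity, judge, query, expected_output):
        self.entity = entity
        self.judge = judge
        self.query = query
        self.expected_output = expected_output
        self.entity_calls = 0  # counts how often the entity is executed

    async def run(self, print_results=True):
        # Execute the entity first, then evaluate what it produced.
        self.entity_calls += 1
        output = await self.entity(self.query)
        return await self.run_with_output(output, print_results=print_results)

    async def run_with_output(self, output, print_results=True):
        # Evaluate an existing output string without executing the entity.
        score = await self.judge(output, self.expected_output)
        if print_results:
            print(f"score={score}")
        return score


async def entity(query):
    # Hypothetical entity under test.
    return "4"


async def judge(output, expected):
    # Hypothetical judge: 10 on an exact match, otherwise 1.
    return 10 if output == expected else 1


evaluator = SketchEvaluator(entity, judge, query="What is 2 + 2?", expected_output="4")
live_score = asyncio.run(evaluator.run(print_results=False))
cached_score = asyncio.run(evaluator.run_with_output("4", print_results=False))
```

Scoring a stored output is useful when the output was generated earlier, for example in a logged production run, and re-executing the entity would be slow or costly.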

Usage Examples

Agent

Evaluate a single agent

Team

Evaluate a multi-agent team

Graph

Evaluate a graph workflow
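
A minimal sketch of how the parameters and methods documented above might fit together. To avoid guessing import paths, the evaluator class and both agents are passed in as arguments; `run_accuracy_check` and its argument names are hypothetical, while the keyword arguments and the `run(print_results=True)` call are taken from the Parameters and Methods tables on this page.

```python
import asyncio


async def run_accuracy_check(evaluator_cls, judge_agent, agent_under_test):
    # evaluator_cls is expected to be AccuracyEvaluator; it is injected here
    # so this sketch does not assume where the class is imported from.
    evaluator = evaluator_cls(
        judge_agent=judge_agent,
        agent_under_test=agent_under_test,
        query="What is the capital of France?",
        expected_output="Paris",
        additional_guidelines="Accept any answer that clearly names the city.",
        num_iterations=2,
    )
    # run() executes the entity, then evaluates its output.
    return await evaluator.run(print_results=True)
```

With the library installed, a caller would pass the real evaluator class plus a judge agent and the entity to test, then drive the coroutine with `asyncio.run(...)`. According to the Parameters table, `agent_under_test` may be an Agent, a Graph, or a Team, so the same call shape should cover all three cases listed above.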