Upsonic provides a built-in evaluation framework to systematically test and benchmark your AI agents, teams, and graphs. Evaluations help you ensure that your AI workflows meet quality, performance, and reliability standards before deploying to production.

Evaluation Types

Quick Start

Install the required dependencies and run your first evaluation in minutes.
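If you have not installed the framework yet, the package is typically available from PyPI as upsonic (pip install upsonic); check the installation guide for any extras the evaluation module may need.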
import asyncio
from upsonic import Agent
from upsonic.eval import AccuracyEvaluator

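# The agent whose output will be evaluated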
agent = Agent(
    model="anthropic/claude-sonnet-4-5",
    name="Assistant",
)

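# A second agent acts as the judge that scores the response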
judge = Agent(
    model="anthropic/claude-sonnet-4-5",
    name="Judge",
)

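# Compare the agent's answer against the expected output, scored by the judge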
evaluator = AccuracyEvaluator(
    judge_agent=judge,
    agent_under_test=agent,
    query="What is the capital of France?",
    expected_output="Paris is the capital of France.",
    additional_guidelines="Check if the answer correctly identifies Paris.",
    num_iterations=1,
)

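# run() is a coroutine, so drive it with asyncio.run()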
result = asyncio.run(evaluator.run())

print(f"Score: {result.average_score}/10")
print(f"Passed: {result.evaluation_scores[0].is_met}")

Supported Entities

Every evaluator works with all three core entities:
Entity | Description
Agent  | Single agent executing a task
Team   | Multi-agent team in sequential, coordinate, or route mode
Graph  | DAG-based workflow with chained task nodes
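
As a rough sketch of how the quick-start pattern extends to a team, you can substitute a Team for the single agent. The Team constructor arguments and the reuse of agent_under_test for a team are assumptions here; check the Team and evaluator reference for the exact signatures.

import asyncio
from upsonic import Agent, Team
from upsonic.eval import AccuracyEvaluator

researcher = Agent(model="anthropic/claude-sonnet-4-5", name="Researcher")
writer = Agent(model="anthropic/claude-sonnet-4-5", name="Writer")

# Assumed Team constructor: a list of member agents plus an execution mode
team = Team(agents=[researcher, writer], mode="sequential")

judge = Agent(model="anthropic/claude-sonnet-4-5", name="Judge")

evaluator = AccuracyEvaluator(
    judge_agent=judge,
    agent_under_test=team,  # assumes the same parameter accepts a Team or Graph
    query="Name the capitals of France and Germany.",
    expected_output="Paris is the capital of France and Berlin is the capital of Germany.",
    num_iterations=1,
)

result = asyncio.run(evaluator.run())
print(f"Score: {result.average_score}/10")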