# Evals

> Measure accuracy, performance, and reliability of your AI agents, teams, and graphs

Upsonic provides a built-in evaluation framework for systematically testing and benchmarking your AI agents, teams, and graphs. Evaluations help you confirm that your AI workflows meet quality, performance, and reliability standards before you deploy them to production.

## Evaluation Types

* **Accuracy:** LLM-as-a-judge evaluation that scores agent output quality against expected answers on a 1–10 scale.
* **Performance:** Latency and memory profiling with statistical analysis across multiple iterations.
* **Reliability:** Tool-call verification that asserts expected tools were invoked during execution.

## Quick Start

Install the required dependencies and run your first evaluation in minutes.

```python
import asyncio

from upsonic import Agent
from upsonic.eval import AccuracyEvaluator

# The agent whose answers are being evaluated.
agent = Agent(
    model="anthropic/claude-sonnet-4-5",
    name="Assistant",
)

# A separate agent that acts as the LLM judge.
judge = Agent(
    model="anthropic/claude-sonnet-4-5",
    name="Judge",
)

evaluator = AccuracyEvaluator(
    judge_agent=judge,
    agent_under_test=agent,
    query="What is the capital of France?",
    expected_output="Paris is the capital of France.",
    additional_guidelines="Check if the answer correctly identifies Paris.",
    num_iterations=1,
)

# evaluator.run() is a coroutine, so drive it with asyncio.run().
result = asyncio.run(evaluator.run())

print(f"Score: {result.average_score}/10")
print(f"Passed: {result.evaluation_scores[0].is_met}")
```

## Supported Entities

Every evaluator works with all three core entities:

| Entity    | Description                                               |
| --------- | --------------------------------------------------------- |
| **Agent** | Single agent executing a task                             |
| **Team**  | Multi-agent team in sequential, coordinate, or route mode |
| **Graph** | DAG-based workflow with chained task nodes                |
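
To illustrate the idea behind the latency and memory profiling described under Evaluation Types, the following is a minimal sketch that uses only the Python standard library. It is not Upsonic's evaluator API: `profile_async` and `fake_agent_call` are hypothetical names introduced here, and the stand-in coroutine simply sleeps instead of calling a model. The sketch runs a coroutine several times, records wall-clock latency and peak traced memory per iteration, and summarizes the results.

```python
import asyncio
import statistics
import time
import tracemalloc


async def profile_async(make_call, num_iterations=5):
    """Run an async callable repeatedly; summarize latency and peak memory."""
    latencies = []
    peaks = []
    for _ in range(num_iterations):
        tracemalloc.start()                      # track Python allocations
        start = time.perf_counter()
        await make_call()
        latencies.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks.append(peak)
    return {
        "iterations": num_iterations,
        "latency_mean_s": statistics.mean(latencies),
        "latency_median_s": statistics.median(latencies),
        # stdev needs at least two samples
        "latency_stdev_s": statistics.stdev(latencies) if num_iterations > 1 else 0.0,
        "latency_max_s": max(latencies),
        "peak_memory_max_bytes": max(peaks),
    }


async def fake_agent_call():
    # Hypothetical stand-in for an agent, team, or graph run.
    await asyncio.sleep(0.01)
    return "Paris is the capital of France."


summary = asyncio.run(profile_async(fake_agent_call, num_iterations=3))
print(summary)
```

Running several iterations matters because a single measurement of an LLM-backed call can be dominated by network or cold-start noise; the median and standard deviation give a more stable picture than one timing.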