## Evaluation Types
### Accuracy

LLM-as-a-judge evaluation that scores agent output quality against expected answers on a 1–10 scale.
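The docs don't show the evaluator's API, so here is a minimal sketch of the LLM-as-a-judge pattern. The prompt text, `AccuracyResult` type, `evaluate_accuracy` function, and threshold are all illustrative assumptions, and the judge model is stubbed out.

```python
from dataclasses import dataclass

# Hypothetical judge prompt; a real evaluator would use its own template.
JUDGE_PROMPT = """\
You are an impartial judge. Score the agent's answer against the
expected answer on a scale of 1-10. Reply with the number only.

Expected answer: {expected}
Agent answer: {actual}
"""

@dataclass
class AccuracyResult:
    score: int    # 1-10, as parsed from the judge model's reply
    passed: bool  # True when the score meets the threshold

def evaluate_accuracy(actual: str, expected: str, llm, threshold: int = 8) -> AccuracyResult:
    """Ask a judge model to grade `actual` against `expected`."""
    raw = llm(JUDGE_PROMPT.format(expected=expected, actual=actual))
    score = max(1, min(10, int(raw.strip())))  # clamp to the 1-10 scale
    return AccuracyResult(score=score, passed=score >= threshold)

# Stub standing in for a real LLM call, so the sketch runs offline.
def fake_judge(prompt: str) -> str:
    return "9"

result = evaluate_accuracy("Paris", "Paris, France", llm=fake_judge)
```

In practice `llm` would wrap a real model call; the clamp guards against a judge replying outside the 1–10 range.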
### Performance

Latency and memory profiling with statistical analysis across multiple iterations.
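As a rough illustration of multi-iteration latency and memory profiling, the sketch below uses only the Python standard library (`time.perf_counter`, `tracemalloc`, `statistics`); the `profile` function and its result keys are assumptions, not the evaluator's actual API.

```python
import statistics
import time
import tracemalloc

def profile(fn, iterations: int = 5) -> dict:
    """Run `fn` repeatedly, recording wall-clock latency and peak memory."""
    latencies, peaks = [], []
    for _ in range(iterations):
        tracemalloc.start()
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks.append(peak)
    return {
        "latency_mean_s": statistics.mean(latencies),
        "latency_stdev_s": statistics.stdev(latencies),
        "peak_mem_mean_bytes": statistics.mean(peaks),
    }

# Profile a stand-in workload; a real run would profile an agent invocation.
stats = profile(lambda: sum(range(100_000)))
```

Reporting mean and standard deviation across iterations smooths out one-off spikes from warm-up or garbage collection.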
### Reliability

Tool-call verification that asserts the expected tools were invoked during execution.
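The core of tool-call verification can be sketched as a set check against the run's recorded tool calls. The helper name and the tool names below are hypothetical, standing in for whatever the framework records.

```python
def assert_tools_called(tool_calls: list[str], expected: list[str]) -> None:
    """Fail if any expected tool was never invoked during the run."""
    missing = [tool for tool in expected if tool not in tool_calls]
    if missing:
        raise AssertionError(f"Expected tools not called: {missing}")

# Tool-call log as it might be captured from an agent run (hypothetical names).
calls = ["search_web", "summarize"]
assert_tools_called(calls, expected=["search_web"])  # passes silently
```

Checking membership rather than exact order keeps the assertion robust when the agent is free to sequence its tool calls differently between runs.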
## Quick Start

Install the required dependencies and run your first evaluation in minutes.

## Supported Entities
Every evaluator works with all three core entities:

| Entity | Description |
|---|---|
| Agent | Single agent executing a task |
| Team | Multi-agent team in sequential, coordinate, or route mode |
| Graph | DAG-based workflow with chained task nodes |

