The ReliabilityEvaluator is a post-execution assertion engine that verifies an agent’s tool-calling behavior. It checks whether the expected tools were invoked, in the correct order if required, and flags any unexpected tool calls.

How It Works

  1. Run your agent, team, or graph to completion.
  2. Pass the completed result (a Task, List[Task], or Graph) to the evaluator.
  3. The evaluator extracts tool call history and compares it against the expected list.
  4. Returns a structured result with pass/fail status, per-tool checks, and missing/unexpected lists.
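
The four steps above can be sketched end to end. Everything below is an illustrative stand-in, not the library's source: the toy `Task` class and the evaluator body are assumptions, and only the documented names (`expected_tool_calls`, `order_matters`, `exact_match`, `run`, `passed`, `missing_tool_calls`, `unexpected_tool_calls`, `actual_tool_calls`) come from this page.

```python
class Task:
    """Toy stand-in: a completed task carrying its recorded tool-call history."""
    def __init__(self, tool_calls):
        self.tool_calls = tool_calls  # ordered tool names, e.g. ["search_web", "send_email"]

class ReliabilityEvaluator:
    """Sketch of the post-execution assertion logic described above (assumed, not actual source)."""
    def __init__(self, expected_tool_calls, order_matters=False, exact_match=False):
        self.expected_tool_calls = expected_tool_calls
        self.order_matters = order_matters
        self.exact_match = exact_match

    def run(self, task):
        actual = list(task.tool_calls)
        # Expected tools that never appeared in the history
        missing = [t for t in self.expected_tool_calls if t not in actual]
        # Extra tools only count against the result when exact_match=True
        unexpected = ([t for t in actual if t not in self.expected_tool_calls]
                      if self.exact_match else [])
        in_order = True
        if self.order_matters:
            # Check the expected list appears as a subsequence of the actual calls
            it = iter(actual)
            in_order = all(t in it for t in self.expected_tool_calls)
        passed = not missing and not unexpected and in_order
        return {"passed": passed, "actual_tool_calls": actual,
                "missing_tool_calls": missing, "unexpected_tool_calls": unexpected}

# Steps 1-2: run the agent (simulated here), then hand the completed task to the evaluator.
task = Task(tool_calls=["search_web", "send_email"])
result = ReliabilityEvaluator(["search_web", "send_email"], order_matters=True).run(task)
print(result["passed"])  # True: both tools called, in the expected order
```

Note the subsequence check: with `order_matters=True`, extra calls interleaved between expected ones still pass as long as the expected tools appear in the required relative order.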

Parameters

Parameter           | Type      | Required | Description
--------------------|-----------|----------|------------------------------------------------------------
expected_tool_calls | List[str] | Yes      | Tool names that should have been called
order_matters       | bool      | No       | Whether call order must match (default: False)
exact_match         | bool      | No       | Whether only expected tools may be called (default: False)

Input Types

The run() method accepts:
Input      | Source
-----------|--------------------------------------------------------
Task       | Result of Agent.do() / Agent.do_async()
List[Task] | Result of Team.multi_agent_async()
Graph      | A Graph instance after graph.run() / graph.run_async()
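
Accepting three input shapes implies a normalization step before comparison. The sketch below shows one way the evaluator could flatten them into a single ordered history; the `Task`/`Graph` stand-ins, the `.tool_calls` and `.tasks` attributes, and the `isinstance` dispatch are assumptions for illustration, not the library's actual API.

```python
class Task:
    def __init__(self, tool_calls):
        self.tool_calls = tool_calls  # ordered tool names recorded during the run

class Graph:
    def __init__(self, tasks):
        self.tasks = tasks  # completed Tasks, in execution order

def extract_tool_calls(result):
    """Normalize Task, List[Task], or Graph into one ordered list of tool names."""
    if isinstance(result, Task):
        return list(result.tool_calls)
    if isinstance(result, Graph):
        return [name for task in result.tasks for name in task.tool_calls]
    if isinstance(result, list):  # List[Task], e.g. from a multi-agent team run
        return [name for task in result for name in task.tool_calls]
    raise TypeError(f"Unsupported input type: {type(result).__name__}")

# All three shapes reduce to the same kind of flat history:
history = extract_tool_calls(Graph([Task(["search"]), Task(["summarize"])]))
```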

Result Structure

ReliabilityEvaluationResult contains:
  • passed — Overall pass/fail boolean
  • summary — Human-readable explanation
  • expected_tool_calls — The original expected list
  • actual_tool_calls — Ordered list of tools actually called
  • checks — List of ToolCallCheck objects (one per expected tool)
  • missing_tool_calls — Expected tools that were not invoked
  • unexpected_tool_calls — Tools called but not expected (only when exact_match=True)

Each ToolCallCheck includes:
  • tool_name — Name of the tool
  • was_called — Whether the tool was found in history
  • times_called — How many times it was invoked
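
The documented result shape can be mirrored as dataclasses. This is a sketch of the fields listed above, not the library's source, and the construction of per-tool checks at the bottom is an assumed illustration of the "one per expected tool" rule.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ToolCallCheck:
    tool_name: str     # name of the expected tool
    was_called: bool   # found anywhere in the call history?
    times_called: int  # number of invocations

@dataclass
class ReliabilityEvaluationResult:
    passed: bool                      # overall pass/fail
    summary: str                      # human-readable explanation
    expected_tool_calls: List[str]    # the original expected list
    actual_tool_calls: List[str]      # ordered, as actually invoked
    checks: List[ToolCallCheck]       # one per expected tool
    missing_tool_calls: List[str]     # expected but never invoked
    unexpected_tool_calls: List[str]  # populated only when exact_match=True

# Building per-tool checks from a call history, one check per expected tool:
actual = ["search_web", "search_web", "send_email"]
checks = [ToolCallCheck(name, name in actual, actual.count(name))
          for name in ["search_web", "send_email", "save_file"]]
# checks[2] reports save_file as not called, with a count of zero
```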

Usage Examples