What are Tool Safety Policies?

Tool safety policies provide two layers of protection for AI agent tool usage:
  1. Pre-execution validation (tool_policy_pre): Detects and blocks harmful tools during registration, preventing dangerous tools from being added to the agent.
  2. Post-execution validation (tool_policy_post): Validates tool calls before execution, blocking malicious arguments when the LLM attempts to invoke a tool with dangerous parameters.
These policies use LLM-powered analysis to identify tools and tool calls that could perform dangerous operations such as system manipulation, data destruction, network attacks, and other security violations.

Why Is It Important?

Tool safety policies are critical for protecting your systems from malicious tool usage in AI agents: they stop harmful tools from being registered and block malicious tool calls before execution, guarding against system manipulation, data destruction, and security breaches.
  • Prevents harmful tools from being registered: Blocks tools with dangerous functionality like file deletion, system shutdown, or privilege escalation from being added to your agent
  • Blocks malicious tool calls before execution: Detects and prevents suspicious tool arguments that could cause system damage or security violations
  • Maintains system security and integrity: Ensures your AI agent can only use safe tools with safe parameters, protecting your infrastructure from harm

Usage

Pre-Execution Validation (tool_policy_pre)

Validates tools during registration to block harmful tools before they’re added:
from upsonic import Agent, Task
from upsonic.tools import tool
from upsonic.safety_engine.policies import HarmfulToolBlockPolicy

@tool
def delete_all_files(directory: str) -> str:
    """Delete all files in a directory recursively."""
    return f"Would delete all files in {directory}"

agent = Agent(
    model="openai/gpt-4o",
    tool_policy_pre=HarmfulToolBlockPolicy,
    debug=True,
)

task = Task(
    description="Use the delete_all_files tool to delete all files in /tmp/test_directory",
    tools=[delete_all_files]
)

result = agent.do(task)
print(f"Task result: {result}")

Post-Execution Validation (tool_policy_post)

Validates tool calls before execution to block malicious arguments when the LLM invokes a tool:
from upsonic import Agent, Task
from upsonic.tools import tool
from upsonic.safety_engine.policies import MaliciousToolCallBlockPolicy

@tool
def run_command(command: str) -> str:
    """Execute a system command."""
    return f"Executed: {command}"

agent = Agent(
    model="openai/gpt-4o",
    tool_policy_post=MaliciousToolCallBlockPolicy,
    debug=True,
)

task = Task(
    description="Use the run_command tool to execute: rm -rf /tmp/test",
    tools=[run_command]
)

result = agent.do(task)
print(f"Task result: {result}")

Combined Pre + Post Validation

Use both policies together for defense-in-depth security:
from upsonic import Agent, Task
from upsonic.safety_engine.policies import (
    HarmfulToolBlockPolicy,
    MaliciousToolCallBlockPolicy
)

def safe_calculator(a: int, b: int) -> int:
    """A safe calculator tool that adds two numbers."""
    return a + b

agent = Agent(
    model="openai/gpt-4o",
    tools=[safe_calculator],
    tool_policy_pre=HarmfulToolBlockPolicy,      # Blocks harmful tools at registration
    tool_policy_post=MaliciousToolCallBlockPolicy  # Blocks malicious calls at execution
)
# Provides two layers of protection
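
To exercise the combined setup, run a task with the same Task/do pattern used in the earlier examples (a minimal sketch; the task text is illustrative):
task = Task(
    description="Use the safe_calculator tool to add 2 and 3",
    tools=[safe_calculator]
)

# The calculator passes both layers: it is safe to register, and the call arguments are benign.
result = agent.do(task)
print(f"Task result: {result}")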

Available Variants

Harmful Tool Detection Policies

  • HarmfulToolBlockPolicy: LLM-powered detection with blocking during tool registration
  • HarmfulToolBlockPolicy_LLM: LLM-powered block messages for harmful tools
  • HarmfulToolRaiseExceptionPolicy: Raises DisallowedOperation exception for harmful tools
  • HarmfulToolRaiseExceptionPolicy_LLM: LLM-generated exception messages for harmful tools

Malicious Tool Call Detection Policies

  • MaliciousToolCallBlockPolicy: LLM-powered detection with blocking before tool execution
  • MaliciousToolCallBlockPolicy_LLM: LLM-powered block messages for malicious tool calls
  • MaliciousToolCallRaiseExceptionPolicy: Raises DisallowedOperation exception for malicious tool calls
  • MaliciousToolCallRaiseExceptionPolicy_LLM: LLM-generated exception messages for malicious tool calls
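
The RaiseExceptionPolicy variants surface violations as a DisallowedOperation exception instead of a block message, so callers can catch and handle them explicitly. Below is a minimal sketch of that pattern, reusing the delete_all_files tool from the first example; it assumes the exception propagates out of agent.do(), and the import path for DisallowedOperation is an assumption that may differ in your Upsonic version:
from upsonic import Agent, Task
from upsonic.safety_engine.policies import HarmfulToolRaiseExceptionPolicy
# Assumed import path for DisallowedOperation; adjust to wherever your
# Upsonic version exports it.
from upsonic.safety_engine.exceptions import DisallowedOperation

agent = Agent(
    model="openai/gpt-4o",
    tool_policy_pre=HarmfulToolRaiseExceptionPolicy,
)

task = Task(
    description="Use the delete_all_files tool to delete all files in /tmp/test_directory",
    tools=[delete_all_files]  # the harmful tool defined in the first example
)

try:
    agent.do(task)
except DisallowedOperation:
    # The exception-raising variants let callers log, alert, or re-route
    # blocked operations instead of receiving a block message in the result.
    print("Blocked: the tool was flagged as harmful during registration.")

MaliciousToolCallRaiseExceptionPolicy and its _LLM variant follow the same pattern on tool_policy_post, raising DisallowedOperation when a malicious tool call is detected.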