Overview

OpenRouter provides unified access to models from OpenAI, Anthropic, Google, Meta, and many others through a single API, simplifying multi-model applications with consistent pricing and routing.

Model Class: OpenAIChatModel (OpenRouter exposes an OpenAI-compatible API)

Authentication

Environment Variables

export OPENROUTER_API_KEY="sk-or-..."
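
If you prefer to configure the key from Python (for example in a notebook or a test script), a minimal sketch follows; the value shown is a placeholder.

import os

# Set the key for the current process only; prefer real environment
# configuration or a secrets manager in production.
os.environ["OPENROUTER_API_KEY"] = "sk-or-..."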

Using infer_model

from upsonic import infer_model

# Access any supported model
model = infer_model("openrouter/anthropic/claude-3-5-sonnet")

Manual Configuration

from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

settings = OpenAIChatModelSettings(
    max_tokens=2048,
    temperature=0.7
)

model = OpenAIChatModel(
    model_name="anthropic/claude-3-5-sonnet",
    provider="openrouter",
    settings=settings
)

Examples

Basic Usage

from upsonic import Agent, Task, infer_model

model = infer_model("openrouter/anthropic/claude-3-5-sonnet")
agent = Agent(model=model)

task = Task("Explain neural networks")
result = agent.do(task)

Access Different Providers

from upsonic import infer_model

# Anthropic
model_claude = infer_model("openrouter/anthropic/claude-3-5-sonnet")

# OpenAI
model_gpt = infer_model("openrouter/openai/gpt-4o")

# Google
model_gemini = infer_model("openrouter/google/gemini-2.5-flash")

# Meta
model_llama = infer_model("openrouter/meta-llama/llama-3.1-70b-instruct")

With Streaming

import asyncio

from upsonic import Agent, Task, infer_model

model = infer_model("openrouter/anthropic/claude-3-5-sonnet")
agent = Agent(model=model)

task = Task("Write a detailed article about AI")

async def main():
    # Stream the response chunk by chunk as it is generated
    async for chunk in agent.do_stream(task):
        print(chunk, end="", flush=True)

asyncio.run(main())

With Tools

from upsonic import Agent, Task, infer_model

def get_weather(location: str) -> str:
    """Get weather for a location."""
    return f"Weather in {location}: Sunny"

model = infer_model("openrouter/openai/gpt-4o")
agent = Agent(model=model, tools=[get_weather])

task = Task("What's the weather in Tokyo?")
result = agent.do(task)

Free Models

from upsonic import Agent, Task, infer_model

# Some models are free on OpenRouter
model = infer_model("openrouter/google/gemini-2.5-flash-lite-free")
agent = Agent(model=model)

task = Task("Tell me about machine learning")
result = agent.do(task)

Prompt Caching

OpenRouter does not support native prompt caching; each request is processed independently. As a best practice, use memory to carry conversation context across requests:

from upsonic import Agent, Task, infer_model
from upsonic.storage.memory import Memory
from upsonic.storage.providers.in_memory import InMemoryStorage

storage = InMemoryStorage()
memory = Memory(storage=storage, session_id="session-123")

model = infer_model("openrouter/anthropic/claude-3-5-sonnet")
agent = Agent(model=model, memory=memory)
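
A short usage sketch, assuming the memory-backed session carries context between calls; the project name is illustrative.

# First turn: establish some context in the session.
agent.do(Task("My project is called Atlas. Keep that in mind."))

# Second turn: the session memory lets the agent refer back to it.
result = agent.do(Task("What is my project called?"))
print(result)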

Model Parameters

Base Settings

Parameter         | Type      | Description                 | Default
max_tokens        | int       | Maximum tokens to generate  | Model default
temperature       | float     | Sampling temperature        | 1.0
top_p             | float     | Nucleus sampling            | 1.0
seed              | int       | Random seed (if supported)  | None
stop_sequences    | list[str] | Stop sequences              | None
presence_penalty  | float     | Token presence penalty      | 0.0
frequency_penalty | float     | Token frequency penalty     | 0.0

Example Configuration

from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

settings = OpenAIChatModelSettings(
    max_tokens=2048,
    temperature=0.7,
    top_p=0.9,
    presence_penalty=0.1,
    frequency_penalty=0.1
)

model = OpenAIChatModel(
    model_name="anthropic/claude-3-5-sonnet",
    provider="openrouter",
    settings=settings
)

Available Models

OpenRouter provides access to 100+ models from various providers:

Top Models

OpenAI

  • openai/gpt-4o
  • openai/gpt-4o-mini
  • openai/o1-preview

Anthropic

  • anthropic/claude-3-5-sonnet
  • anthropic/claude-3-5-haiku
  • anthropic/claude-opus-4

Google

  • google/gemini-2.5-pro
  • google/gemini-2.5-flash
  • google/gemini-2.5-flash-lite-free (Free!)

Meta

  • meta-llama/llama-3.1-405b-instruct
  • meta-llama/llama-3.1-70b-instruct
  • meta-llama/llama-3.1-8b-instruct

Other Providers

  • mistralai/mistral-large
  • cohere/command-r-plus
  • deepseek/deepseek-chat

Free Models

OpenRouter offers some free models:
  • google/gemini-2.5-flash-lite-free
  • meta-llama/llama-3.1-8b-instruct:free
  • Various community-hosted models

Model Selection Guide

Use Case             | Recommended Model                            | Why
Complex tasks        | anthropic/claude-opus-4                      | Best reasoning
Balanced performance | openai/gpt-4o                                | Reliable all-rounder
Cost-effective       | google/gemini-2.5-flash                      | Good price/performance
Free tier            | google/gemini-2.5-flash-lite-free            | No cost
Code generation      | openai/gpt-4o or anthropic/claude-3-5-sonnet | Strong code understanding
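
One way to apply this guide in code is a small lookup helper. The sketch below is illustrative; MODEL_BY_USE_CASE and select_model are not part of Upsonic.

from upsonic import infer_model

# Hypothetical mapping from use case to OpenRouter model ID,
# mirroring the selection guide above.
MODEL_BY_USE_CASE = {
    "complex": "openrouter/anthropic/claude-opus-4",
    "balanced": "openrouter/openai/gpt-4o",
    "cost_effective": "openrouter/google/gemini-2.5-flash",
    "free": "openrouter/google/gemini-2.5-flash-lite-free",
}

def select_model(use_case: str):
    """Return a model instance for the given use case."""
    return infer_model(MODEL_BY_USE_CASE[use_case])

model = select_model("balanced")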

Pricing

OpenRouter uses a unified pricing model:
  • Pay-as-you-go: Only pay for what you use
  • No subscriptions: No monthly fees
  • Credits system: Add credits to your account
  • Transparent: See per-token costs for each model
  • Competitive: Often cheaper than going direct

Cost Optimization

from upsonic import infer_model

# Use cost-effective models for simple tasks
simple_model = infer_model("openrouter/google/gemini-2.5-flash")

# Reserve expensive models for complex tasks
complex_model = infer_model("openrouter/anthropic/claude-opus-4")

# Use free models for testing
test_model = infer_model("openrouter/google/gemini-2.5-flash-lite-free")

Best Practices

  1. Model Selection: Choose the right model for each task
  2. Monitor Costs: Track usage in OpenRouter dashboard
  3. Use Free Models: For development and testing
  4. Implement Fallbacks: Handle rate limits and errors (see the fallback sketch after this list)
  5. Set Budgets: Configure spending limits
  6. Test Before Production: Verify model quality
  7. Rate Limiting: Implement backoff for retries
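
A minimal fallback sketch for practice 4, assuming failed requests raise ModelHTTPError as in the error-handling example below; the helper do_with_fallback and the model choices are illustrative.

from upsonic import Agent, Task, infer_model
from upsonic.utils.package.exception import ModelHTTPError

PRIMARY = "openrouter/anthropic/claude-3-5-sonnet"
FALLBACK = "openrouter/openai/gpt-4o"

def do_with_fallback(task: Task):
    """Try the primary model first, then fall back to the secondary."""
    last_error = None
    for model_id in (PRIMARY, FALLBACK):
        try:
            agent = Agent(model=infer_model(model_id))
            return agent.do(task)
        except ModelHTTPError as e:
            last_error = e
    raise last_error

result = do_with_fallback(Task("Explain neural networks"))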

Features

Unified API

  • Single API for all models
  • Consistent request format
  • Simplified integration

Automatic Routing

  • Automatic fallback if model is down
  • Load balancing across providers
  • Best availability

Model Comparison

  • Test multiple models easily (see the sketch below)
  • Compare performance
  • Optimize costs
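
A quick comparison sketch that runs the same prompt against two models; the prompt and model choices are illustrative.

from upsonic import Agent, Task, infer_model

prompt = "Summarize the trade-offs of microservices in three bullet points."

for model_id in (
    "openrouter/openai/gpt-4o",
    "openrouter/anthropic/claude-3-5-sonnet",
):
    agent = Agent(model=infer_model(model_id))
    result = agent.do(Task(prompt))
    print(f"--- {model_id} ---")
    print(result)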

Rate Limit Handling

  • Automatic retry with backoff
  • Queue management
  • Better reliability

Error Handling

from upsonic import Agent, Task, infer_model
from upsonic.utils.package.exception import ModelHTTPError
import asyncio

async def request_with_retry(agent, task, max_retries=3):
    for attempt in range(max_retries):
        try:
            return agent.do(task)
        except ModelHTTPError as e:
            if e.status_code == 429:  # Rate limit
                wait_time = 2 ** attempt
                print(f"Rate limited, waiting {wait_time}s...")
                await asyncio.sleep(wait_time)
            elif e.status_code >= 500:  # Server error
                wait_time = 2 ** attempt
                print(f"Server error, retrying in {wait_time}s...")
                await asyncio.sleep(wait_time)
            else:
                raise
    raise Exception("Max retries exceeded")
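
Example call, reusing the agent and task setup from the earlier examples:

model = infer_model("openrouter/anthropic/claude-3-5-sonnet")
agent = Agent(model=model)
task = Task("Explain neural networks")

result = asyncio.run(request_with_retry(agent, task))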

Advantages

  1. Unified Access: One API for many providers
  2. Cost Effective: Competitive pricing
  3. Reliability: Automatic fallback
  4. Simplicity: No need for multiple API keys
  5. Flexibility: Switch models easily
  6. Free Options: Available for testing

Limitations

  1. No Native Caching: Each request is independent
  2. Additional Latency: Routing overhead
  3. Rate Limits: Shared across all users
  4. Feature Gaps: May not support all provider-specific features