Overview
OpenRouter provides unified access to models from OpenAI, Anthropic, Google, Meta, and many others through a single API. It simplifies multi-model applications with consistent pricing and routing.
Model Class: OpenAIChatModel (OpenAI-compatible API)
Authentication
Environment Variables
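OpenRouter issues one API key that covers every provider. A minimal sketch of reading it in Python, assuming the key is exported as OPENROUTER_API_KEY:

```python
import os

# Assumed convention: the OpenRouter key is exported as OPENROUTER_API_KEY.
api_key = os.environ["OPENROUTER_API_KEY"]
```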
Using infer_model
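A hedged sketch, assuming infer_model resolves a provider-prefixed model string; the import path and the openrouter: prefix are assumptions and may differ in your installed version:

```python
# Sketch only: the import path and the "openrouter:" prefix are assumptions.
from framework.models import infer_model  # hypothetical import path

model = infer_model("openrouter:anthropic/claude-3-5-sonnet")
```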
Manual Configuration
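A hedged sketch of configuring OpenAIChatModel by hand; the import path and keyword names are assumptions, while the base URL is OpenRouter's OpenAI-compatible endpoint:

```python
import os

from framework.models import OpenAIChatModel  # hypothetical import path

model = OpenAIChatModel(
    model="anthropic/claude-3-5-sonnet",       # any OpenRouter model id
    base_url="https://openrouter.ai/api/v1",   # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)
```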
Examples
Basic Usage
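Because OpenRouter exposes an OpenAI-compatible API, the sketches below use the official OpenAI Python SDK pointed at OpenRouter's endpoint as an equivalent illustration; substitute OpenAIChatModel if you are going through the framework:

```python
import os
from openai import OpenAI

# Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize what OpenRouter does in one sentence."}],
)
print(response.choices[0].message.content)
```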
Access Different Providers
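Switching providers only changes the model string; the client and request shape stay the same:

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

# Only the model id changes between providers.
for model_id in (
    "openai/gpt-4o",
    "anthropic/claude-3-5-sonnet",
    "meta-llama/llama-3.1-70b-instruct",
):
    response = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": "Reply with one word: hello."}],
    )
    print(model_id, "->", response.choices[0].message.content)
```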
With Streaming
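A streaming sketch with the same client; stream=True yields content deltas as they arrive:

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about routing."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content (e.g. the final chunk); skip them.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```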
With Tools
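A tool-calling sketch using OpenAI-style tool definitions; get_weather is a hypothetical tool, and tool support varies by model (see Limitations):

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# The model may answer directly or request a tool call.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print("Tool requested:", call.function.name, call.function.arguments)
else:
    print(message.content)
```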
Free Models
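Free-tier models use the same request shape; the id below comes from the free-model list later on this page, and availability can change:

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct:free",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```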
Prompt Caching
OpenRouter does not support native prompt caching; each request is processed independently. Best Practice: use memory to carry conversation context across requests, as in the sketch below.
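A minimal client-side memory sketch: the full message history is kept locally and resent on every turn, since no prior turns are cached server-side:

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

history = []  # the entire conversation is resent with each request

def ask(user_text, model_id="openai/gpt-4o-mini"):
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(model=model_id, messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

ask("My name is Ada.")
print(ask("What is my name?"))  # answered from the resent history
```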
Model Parameters
Base Settings
| Parameter | Type | Description | Default |
|---|---|---|---|
| max_tokens | int | Maximum tokens to generate | Model default |
| temperature | float | Sampling temperature | 1.0 |
| top_p | float | Nucleus sampling | 1.0 |
| seed | int | Random seed (if supported) | None |
| stop_sequences | list[str] | Stop sequences | None |
| presence_penalty | float | Token presence penalty | 0.0 |
| frequency_penalty | float | Token frequency penalty | 0.0 |
Example Configuration
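A hedged sketch combining the base settings above; the import path is hypothetical, and whether these are constructor arguments or per-request settings may depend on the framework version:

```python
import os

from framework.models import OpenAIChatModel  # hypothetical import path

model = OpenAIChatModel(
    model="openai/gpt-4o",
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    max_tokens=1024,
    temperature=0.7,
    top_p=0.95,
    stop_sequences=["\n\nUser:"],
)
```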
Available Models
OpenRouter provides access to 100+ models from various providers.
Top Models
OpenAI
- openai/gpt-4o
- openai/gpt-4o-mini
- openai/o1-preview
Anthropic
- anthropic/claude-3-5-sonnet
- anthropic/claude-3-5-haiku
- anthropic/claude-opus-4
Google
- google/gemini-2.5-pro
- google/gemini-2.5-flash
- google/gemini-2.5-flash-lite-free (Free!)
Meta
- meta-llama/llama-3.1-405b-instruct
- meta-llama/llama-3.1-70b-instruct
- meta-llama/llama-3.1-8b-instruct
Other Popular
- mistralai/mistral-large
- cohere/command-r-plus
- deepseek/deepseek-chat
Free Models
OpenRouter offers some free models:
- google/gemini-2.5-flash-lite-free
- meta-llama/llama-3.1-8b-instruct:free
- Various community-hosted models
Model Selection Guide
| Use Case | Recommended Model | Why |
|---|---|---|
| Complex tasks | anthropic/claude-opus-4 | Best reasoning |
| Balanced performance | openai/gpt-4o | Reliable all-rounder |
| Cost-effective | google/gemini-2.5-flash | Good price/performance |
| Free tier | google/gemini-2.5-flash-lite-free | No cost |
| Code generation | openai/gpt-4o or anthropic/claude-3-5-sonnet | Strong code understanding |
Pricing
OpenRouter uses a unified pricing model:
- Pay-as-you-go: Only pay for what you use
- No subscriptions: No monthly fees
- Credits system: Add credits to your account
- Transparent: See per-token costs for each model
- Competitive: Often cheaper than going direct
Cost Optimization
Best Practices
- Model Selection: Choose the right model for each task
- Monitor Costs: Track usage in OpenRouter dashboard
- Use Free Models: For development and testing
- Implement Fallbacks: Handle rate limits and errors
- Set Budgets: Configure spending limits
- Test Before Production: Verify model quality
- Rate Limiting: Implement backoff for retries (see the sketch after this list)
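A minimal backoff sketch using the OpenAI SDK's RateLimitError against OpenRouter's endpoint; the helper name, attempt count, and delay schedule are illustrative choices:

```python
import os
import random
import time

import openai
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

def complete_with_backoff(model_id, messages, max_attempts=5):
    """Retry rate-limited requests with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model=model_id, messages=messages)
        except openai.RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt + random.random())
```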
Features
Unified API
- Single API for all models
- Consistent request format
- Simplified integration
Automatic Routing
- Automatic fallback if model is down
- Load balancing across providers
- Best availability
Model Comparison
- Test multiple models easily
- Compare performance
- Optimize costs
Rate Limit Handling
- Automatic retry with backoff
- Queue management
- Better reliability
Error Handling
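A sketch of catching the OpenAI SDK's common exception types when calling OpenRouter; the handling shown (printing a message) is illustrative only:

```python
import os

import openai
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

try:
    response = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
    )
except openai.RateLimitError:
    print("Rate limited: back off or switch to a fallback model.")
except openai.APIStatusError as exc:
    print(f"OpenRouter returned an error status: {exc.status_code}")
except openai.APIConnectionError:
    print("Network problem reaching OpenRouter.")
else:
    print(response.choices[0].message.content)
```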
Advantages
- Unified Access: One API for many providers
- Cost Effective: Competitive pricing
- Reliability: Automatic fallback
- Simplicity: No need for multiple API keys
- Flexibility: Switch models easily
- Free Options: Available for testing
Limitations
- No Native Caching: Each request is independent
- Additional Latency: Routing overhead
- Rate Limits: Shared across all users
- Feature Gaps: May not support all provider-specific features

