Overview
LiteLLM provides a unified, OpenAI-compatible interface to 100+ LLM providers, including OpenAI, Anthropic, Azure, Google, AWS Bedrock, and more. Run it as a proxy server for centralized model management.
Model Class: OpenAIChatModel (OpenAI-compatible API)
Authentication
Setup LiteLLM Proxy
First, set up the LiteLLM proxy server: install it with `pip install 'litellm[proxy]'`, write a `config.yaml` (examples below), and start it with `litellm --config config.yaml`. By default the proxy listens on http://localhost:4000.
Environment Variables
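The client side only needs two values: the proxy address and the proxy's master key. The real provider keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, and so on) live in the proxy's own environment and are referenced from `config.yaml`. A minimal sketch, assuming an OpenAI-SDK-based client that reads OPENAI_BASE_URL and OPENAI_API_KEY; the address and key are placeholders:

```python
import os

# Point any OpenAI-compatible client at the LiteLLM proxy instead of api.openai.com.
os.environ["OPENAI_BASE_URL"] = "http://localhost:4000"  # proxy address (placeholder)
os.environ["OPENAI_API_KEY"] = "sk-1234"                  # proxy master key, not a provider key
```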
Using infer_model
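A minimal sketch of resolving a model by name. The import path below follows pydantic-ai's layout and is an assumption; adjust it to wherever `infer_model` lives in your installation. The `openai:` prefix selects OpenAIChatModel, which reaches the proxy through the environment variables set above.

```python
from pydantic_ai.models import infer_model  # assumed import path

# "gpt-4o" must match a model_name entry in the proxy's config.yaml;
# the proxy decides which real deployment serves the request.
model = infer_model("openai:gpt-4o")
```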
Manual Configuration
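To configure the connection explicitly instead of via environment variables, construct the model directly. The import paths again follow pydantic-ai conventions and are assumptions; the URL and key are placeholders for your proxy address and master key.

```python
from pydantic_ai.models.openai import OpenAIChatModel  # assumed import path
from pydantic_ai.providers.openai import OpenAIProvider  # assumed import path

model = OpenAIChatModel(
    "gpt-4o",  # model_name as exposed by the proxy
    provider=OpenAIProvider(
        base_url="http://localhost:4000",  # LiteLLM proxy endpoint
        api_key="sk-1234",                 # proxy master key
    ),
)
```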
Examples
Basic Usage
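Because the proxy exposes the standard OpenAI chat-completions API, the quickest end-to-end check is the official openai package pointed at the proxy; this is the same request shape that OpenAIChatModel produces. The model name and credentials are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="gpt-4o",  # any model_name exposed by the proxy
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```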
Multi-Model Setup
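A sketch of a `config.yaml` that exposes two providers behind the one proxy endpoint, written as a Python string so every example here stays in one language; the model identifiers and environment-variable names are illustrative.

```python
from pathlib import Path

CONFIG = """\
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
"""

Path("config.yaml").write_text(CONFIG)
# Start the proxy with: litellm --config config.yaml
```

Clients then switch providers simply by requesting a different model_name; no client-side code changes are needed.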
With Load Balancing
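Listing several deployments under the same model_name turns the proxy into a load balancer. The Azure deployment name and environment variables below are placeholders; simple-shuffle is LiteLLM's default routing strategy.

```python
# Fragment of config.yaml, shown as a Python string for consistency with the other sketches.
LOAD_BALANCED_CONFIG = """\
model_list:
  - model_name: gpt-4o                 # same public name for every deployment
    litellm_params:
      model: azure/gpt-4o-eastus       # Azure deployment name (placeholder)
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  routing_strategy: simple-shuffle     # spread requests across the deployments above
"""
```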
A LiteLLM config with multiple deployments under one model_name, as sketched above, is all the router needs: it distributes requests across the pool automatically.
With Fallbacks
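Fallbacks tell the router which model to try when a primary model's deployments keep failing. The pairing below (gpt-4o falling back to claude-3-5-sonnet) is illustrative, and both names must exist in model_list.

```python
# Fragment of config.yaml, shown as a Python string for consistency with the other sketches.
FALLBACK_CONFIG = """\
router_settings:
  num_retries: 2                                    # retry the failing deployment first
  fallbacks: [{"gpt-4o": ["claude-3-5-sonnet"]}]    # then resend the request to the fallback
"""
```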
Prompt Caching
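The sketch below assumes the official openai client pointed at the proxy and an Anthropic-backed model_name; cache_control is Anthropic's prompt-caching directive, and the names and key are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="claude-3-5-sonnet",  # an Anthropic-backed model_name in the proxy config
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "A long, stable system prompt that is worth caching...",
                    "cache_control": {"type": "ephemeral"},  # forwarded to Anthropic
                }
            ],
        },
        {"role": "user", "content": "Answer using the instructions above."},
    ],
)
print(response.choices[0].message.content)
```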
LiteLLM passes caching support through from the underlying providers: directives such as the cache_control block above reach the provider unchanged, and they take effect only where the provider actually implements prompt caching.
Model Parameters
Base Settings
| Parameter | Type | Description | Default |
|---|---|---|---|
| max_tokens | int | Maximum tokens to generate | Model default |
| temperature | float | Sampling temperature | 1.0 |
| top_p | float | Nucleus sampling | 1.0 |
| seed | int | Random seed | None |
| stop_sequences | list[str] | Stop sequences | None |
| presence_penalty | float | Token presence penalty | 0.0 |
| frequency_penalty | float | Token frequency penalty | 0.0 |
Example Configuration
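A sketch applying the parameters from the table above, assuming pydantic-ai-style ModelSettings (whose field names match the table); the values themselves are arbitrary examples.

```python
from pydantic_ai import Agent            # assumed import paths throughout
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai.settings import ModelSettings

model = OpenAIChatModel(
    "gpt-4o",
    provider=OpenAIProvider(base_url="http://localhost:4000", api_key="sk-1234"),
)

agent = Agent(
    model,
    model_settings=ModelSettings(
        max_tokens=1024,
        temperature=0.2,
        top_p=0.9,
        seed=42,
        stop_sequences=["###"],
        presence_penalty=0.0,
        frequency_penalty=0.3,
    ),
)
```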
Advanced LiteLLM Configuration
With Budget Limits
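A `config.yaml` fragment for a proxy-wide spend cap. Treat the key names as assumptions about LiteLLM's budget settings, and the amounts as placeholders.

```python
# Fragment of config.yaml, shown as a Python string for consistency with the other sketches.
BUDGET_CONFIG = """\
litellm_settings:
  max_budget: 100        # total spend cap in USD (assumed key name)
  budget_duration: 30d   # window after which spend resets (assumed key name)
"""
```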
With Rate Limiting
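Per-deployment rate limits are declared with rpm and tpm inside litellm_params, and the router stops sending traffic to a deployment that has hit either limit. The numbers are placeholders.

```python
# Fragment of config.yaml, shown as a Python string for consistency with the other sketches.
RATE_LIMIT_CONFIG = """\
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      rpm: 60          # requests per minute for this deployment
      tpm: 100000      # tokens per minute for this deployment
"""
```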
With Logging
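Request logging is enabled by registering callbacks in litellm_settings; langfuse is shown as one example of a supported logger and needs its own credentials in the proxy's environment.

```python
# Fragment of config.yaml, shown as a Python string for consistency with the other sketches.
LOGGING_CONFIG = """\
litellm_settings:
  success_callback: ["langfuse"]   # log successful calls (requires LANGFUSE_* env vars)
  failure_callback: ["langfuse"]   # log failures as well
"""
```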
With Caching (Redis)
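Response caching with Redis is switched on in litellm_settings; with no explicit connection settings, the proxy reads REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD from its environment. The TTL is a placeholder.

```python
# Fragment of config.yaml, shown as a Python string for consistency with the other sketches.
REDIS_CACHE_CONFIG = """\
litellm_settings:
  cache: true
  cache_params:
    type: redis
    ttl: 600           # seconds to keep a cached response
"""
```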
Supported Providers
LiteLLM supports 100+ providers, including:
Major Providers
- OpenAI
- Anthropic
- Azure OpenAI
- Google (Vertex AI, Gemini)
- AWS Bedrock
- Cohere
- Mistral
- Groq
Cloud Providers
- AWS Bedrock
- Azure OpenAI
- Google Vertex AI
- IBM watsonx.ai
Open Source Platforms
- Ollama
- vLLM
- Hugging Face
- Together AI
Full list available at: [LiteLLM Providers](https://docs.litellm.ai/docs/providers)
Best Practices
- Centralized Configuration: Manage all models in one config
- Use Load Balancing: Distribute load across deployments
- Set Up Fallbacks: Ensure high availability
- Enable Caching: Reduce costs and latency
- Monitor Usage: Track per-model metrics
- Set Budget Limits: Prevent overspending
- Secure Proxy: Use master key in production
- Health Checks: Monitor proxy status
Features
Unified Interface
- OpenAI-compatible API
- Single integration for all providers
- Consistent request/response format
Load Balancing
- Round-robin across deployments
- Weighted routing
- Automatic failover
Cost Management
- Budget tracking per model
- Usage analytics
- Cost optimization
Reliability
- Automatic retries
- Fallback routing
- Health monitoring
Observability
- Request logging
- Performance metrics
- Error tracking
Monitoring
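A minimal probe of the proxy's health endpoints, assuming LiteLLM's /health routes and the placeholder master key used earlier; wire the same calls into whatever monitoring stack you already run.

```python
import requests

PROXY = "http://localhost:4000"
HEADERS = {"Authorization": "Bearer sk-1234"}  # proxy master key

# Liveness: is the proxy process responding at all?
print(requests.get(f"{PROXY}/health/liveliness", timeout=5).text)

# Health: can the proxy reach its configured model deployments?
print(requests.get(f"{PROXY}/health", headers=HEADERS, timeout=5).json())
```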
Advantages
- Unified Interface: One API for all providers
- Load Balancing: Built-in distribution
- Cost Control: Budget and rate limits
- Observability: Comprehensive logging
- Flexibility: Easy to add/remove models
- Reliability: Automatic fallbacks
- Caching: Built-in Redis caching
Limitations
- Extra Infrastructure: Requires proxy server
- Single Point of Failure: Unless deployed in a high-availability setup
- Latency: Additional network hop
- Complexity: More moving parts

