Overview

LiteLLM provides a unified, OpenAI-compatible interface for accessing 100+ LLM providers, including OpenAI, Anthropic, Azure, Google, and AWS Bedrock. Run it as a proxy server for centralized model management.

Model Class: OpenAIChatModel (OpenAI-compatible API)
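
Because the proxy speaks the OpenAI API, any OpenAI-compatible client can talk to it. As a quick sanity check (a minimal sketch, assuming the proxy from the setup below is running on localhost:4000 without a master key), the standard openai package can be pointed straight at it:

from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy
# (assumes a local proxy with no master key configured).
client = OpenAI(base_url="http://localhost:4000", api_key="anything")

response = client.chat.completions.create(
    model="gpt-4o",  # model_name from the proxy config
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

The same compatibility is what allows Upsonic to reuse OpenAIChatModel for models served through LiteLLM.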

Authentication

Setup LiteLLM Proxy

First, set up LiteLLM proxy server:
# Install LiteLLM
pip install 'litellm[proxy]'

# Create config file
cat > config.yaml << EOF
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
  
  - model_name: gemini-flash
    litellm_params:
      model: gemini/gemini-2.5-flash
      api_key: os.environ/GOOGLE_API_KEY
EOF

# Start proxy
litellm --config config.yaml
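
Before wiring the proxy into Upsonic, it is worth confirming it is reachable (assuming the default port 4000; if you configure a master key, pass it as a Bearer token):

# Verify the proxy is up (add -H "Authorization: Bearer <master_key>" if one is set)
curl http://localhost:4000/health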

Environment Variables

export LITELLM_BASE_URL="http://localhost:4000"  # LiteLLM proxy URL
# No API key needed if proxy handles auth

Using infer_model

from upsonic import infer_model

# Use model name from config
model = infer_model("litellm/gpt-4o")

Manual Configuration

from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

settings = OpenAIChatModelSettings(
    max_tokens=2048,
    temperature=0.7
)

model = OpenAIChatModel(
    model_name="gpt-4o",  # From your config
    provider="litellm",
    settings=settings
)

Examples

Basic Usage

from upsonic import Agent, Task, infer_model

model = infer_model("litellm/gpt-4o")
agent = Agent(model=model)

task = Task("Explain machine learning")
result = agent.do(task)

Multi-Model Setup

from upsonic import infer_model

# Different models through same proxy
gpt_model = infer_model("litellm/gpt-4o")
claude_model = infer_model("litellm/claude-sonnet")
gemini_model = infer_model("litellm/gemini-flash")

# Use based on task requirements
def get_model_for_task(task_type: str):
    if task_type == "code":
        return claude_model
    elif task_type == "analysis":
        return gpt_model
    else:
        return gemini_model
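
Using the selector is then the same as the basic example, with the model chosen per task (a short sketch reusing the Agent and Task classes shown above):

from upsonic import Agent, Task

# Route a coding question to the model selected for "code" tasks
agent = Agent(model=get_model_for_task("code"))

task = Task("Write a Python function that reverses a string")
result = agent.do(task)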

With Load Balancing

LiteLLM config with multiple deployments:
model_list:
  # Load balance across multiple OpenAI deployments
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_KEY_1
  
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_base: os.environ/AZURE_ENDPOINT
      api_key: os.environ/AZURE_KEY
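
Because both entries share the model_name gpt-4o, LiteLLM treats them as one model group and spreads requests across the two deployments. The routing behavior can optionally be tuned under router_settings; the block below is a sketch based on LiteLLM's router options rather than a required part of the setup:

router_settings:
  routing_strategy: simple-shuffle  # default; alternatives include least-busy and usage-based-routing
  num_retries: 2  # retry a failed request on another deployment

From Upsonic's side nothing changes: infer_model("litellm/gpt-4o") continues to target the shared model name.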

With Fallbacks

model_list:
  # Primary model
  - model_name: main-model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  # Fallback model
  - model_name: backup-model
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  # If main-model fails, retry the request on backup-model
  fallbacks: [{"main-model": ["backup-model"]}]

Prompt Caching

LiteLLM passes through caching support from underlying providers:
from upsonic import Agent, Task, infer_model

# If using Claude through LiteLLM, caching works
model = infer_model("litellm/claude-sonnet")
agent = Agent(
    model=model,
    system_prompt="Long context that will be cached..."
)

# Subsequent requests benefit from Claude's caching
task1 = Task("Question 1")
result1 = agent.do(task1)

task2 = Task("Question 2") 
result2 = agent.do(task2)  # Cached

Model Parameters

Base Settings

Parameter           Type        Description                  Default
max_tokens          int         Maximum tokens to generate   Model default
temperature         float       Sampling temperature         1.0
top_p               float       Nucleus sampling             1.0
seed                int         Random seed                  None
stop_sequences      list[str]   Stop sequences               None
presence_penalty    float       Token presence penalty       0.0
frequency_penalty   float       Token frequency penalty      0.0

Example Configuration

from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

settings = OpenAIChatModelSettings(
    max_tokens=2048,
    temperature=0.7,
    top_p=0.9,
    presence_penalty=0.1,
    frequency_penalty=0.1
)

model = OpenAIChatModel(
    model_name="gpt-4o",
    provider="litellm",
    settings=settings
)

Advanced LiteLLM Configuration

With Budget Limits

general_settings:
  master_key: sk-1234  # Secure your proxy
  
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      max_budget: 100  # $100 budget
      budget_duration: 30d  # Monthly

With Rate Limiting

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      rpm: 60  # 60 requests per minute
      tpm: 100000  # 100k tokens per minute

With Logging

litellm_settings:
  success_callback: ["langfuse"]  # Log successful calls
  failure_callback: ["slack"]  # Alert on failures

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
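
Callback integrations read their credentials from the environment; for the Langfuse callback that means exporting the Langfuse keys before starting the proxy (standard Langfuse environment variable names, shown with placeholder values):

export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted instance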

With Caching (Redis)

litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600  # Cache responses for 1 hour

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

Supported Providers

LiteLLM supports 100+ providers including:

Major Providers

  • OpenAI
  • Anthropic
  • Azure OpenAI
  • Google (Vertex AI, Gemini)
  • AWS Bedrock
  • Cohere
  • Mistral
  • Groq

Cloud Providers

  • AWS Bedrock
  • Azure OpenAI
  • Google Vertex AI
  • IBM watsonx.ai

Open Source Platforms

  • Ollama
  • vLLM
  • Hugging Face
  • Together AI

Full list available at: LiteLLM Providers

Best Practices

  1. Centralized Configuration: Manage all models in one config
  2. Use Load Balancing: Distribute load across deployments
  3. Set Up Fallbacks: Ensure high availability
  4. Enable Caching: Reduce costs and latency
  5. Monitor Usage: Track per-model metrics
  6. Set Budget Limits: Prevent overspending
  7. Secure Proxy: Use master key in production
  8. Health Checks: Monitor proxy status

Features

Unified Interface

  • OpenAI-compatible API
  • Single integration for all providers
  • Consistent request/response format

Load Balancing

  • Round-robin across deployments
  • Weighted routing
  • Automatic failover

Cost Management

  • Budget tracking per model
  • Usage analytics
  • Cost optimization

Reliability

  • Automatic retries
  • Fallback routing
  • Health monitoring

Observability

  • Request logging
  • Performance metrics
  • Error tracking

Monitoring

# LiteLLM proxy exposes health and usage endpoints
import requests

# If the proxy is secured with a master key, send it as a Bearer token
headers = {"Authorization": "Bearer sk-1234"}

# Check proxy health
health = requests.get("http://localhost:4000/health", headers=headers)
print(health.json())

# List available models
models = requests.get("http://localhost:4000/models", headers=headers)
print(models.json())

# View spend per key
usage = requests.get("http://localhost:4000/spend/keys", headers=headers)
print(usage.json())

Advantages

  1. Unified Interface: One API for all providers
  2. Load Balancing: Built-in distribution
  3. Cost Control: Budget and rate limits
  4. Observability: Comprehensive logging
  5. Flexibility: Easy to add/remove models
  6. Reliability: Automatic fallbacks
  7. Caching: Built-in Redis caching

Limitations

  1. Extra Infrastructure: Requires proxy server
  2. Single Point of Failure: Unless deployed HA
  3. Latency: Additional network hop
  4. Complexity: More moving parts

Deployment Options

Docker

FROM python:3.11
RUN pip install litellm[proxy]
COPY config.yaml /app/config.yaml
CMD ["litellm", "--config", "/app/config.yaml"]

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-proxy
spec:
  replicas: 3  # High availability
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
      - name: litellm
        image: your-litellm-image
        ports:
        - containerPort: 4000
        envFrom:
        - secretRef:
            name: litellm-secrets
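
A Deployment alone is not reachable by other workloads; a matching Service exposes the proxy inside the cluster (a minimal sketch whose names mirror the Deployment above):

apiVersion: v1
kind: Service
metadata:
  name: litellm-proxy
spec:
  selector:
    app: litellm
  ports:
  - port: 4000
    targetPort: 4000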