
What are OpenAI-Compatible Models?

OpenAI-compatible models are LLM providers and services that implement the OpenAI API specification: they expose the same REST endpoints and request/response formats as OpenAI and work with the official OpenAI SDKs, so you can use them as drop-in replacements for OpenAI models with minimal code changes.
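In practice, compatibility means the official OpenAI SDK works against any of these providers once base_url is changed. A minimal sketch using the plain OpenAI Python SDK (the DeepSeek URL and model name come from the provider table below; the API key is a placeholder):

from openai import OpenAI

# The same client works against any OpenAI-compatible endpoint;
# only base_url, api_key, and the model name change.
client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="your-deepseek-api-key",  # placeholder
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)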

Key Characteristics

API Compatibility:
  • Same endpoint structure (/v1/chat/completions, /v1/completions)
  • Identical request/response JSON format
  • Compatible with OpenAI Python/JavaScript SDKs
  • Standard authentication via API keys
Model Classes:
  • All use OpenAIChatModel in Upsonic
  • Import from upsonic.models.openai
  • Configure via provider parameter or base_url
Benefits:
  • Easy Migration: Switch providers without rewriting code
  • Vendor Independence: No lock-in to a single provider
  • Cost Optimization: Choose the best price/performance ratio
  • Redundancy: Fall back to another provider when one fails
  • Feature Access: Use the latest open-source models

Supported Providers in Upsonic

Provider        Base URL                                Best For
DeepSeek        https://api.deepseek.com                Cost-effective reasoning
Cerebras        https://api.cerebras.ai/v1              Ultra-fast inference
Fireworks       https://api.fireworks.ai/inference/v1   Open model access
GitHub Models   https://models.inference.ai.azure.com   Developer testing
Together AI     https://api.together.xyz                Collaborative serving
Azure OpenAI    https://{resource}.openai.azure.com     Enterprise deployment
Ollama          http://localhost:11434/v1               Local inference
Grok            https://api.x.ai/v1                     Real-time information
Vercel AI       Various                                 Edge-optimized
Heroku          Various                                 Cloud platform
Note: Some providers, such as Groq, have dedicated model classes in Upsonic but also support the OpenAI-compatible API.

Usage

Basic Usage with infer_model

The simplest way to use OpenAI-compatible models is with infer_model:
from upsonic import Agent, Task, infer_model

# Using provider prefix (recommended)
model = infer_model("deepseek/deepseek-chat")
agent = Agent(model=model)

task = Task("Explain quantum computing")
result = agent.do(task)

Switching Between Providers (All OpenAI-Compatible)

from upsonic import Agent, Task, infer_model

# OpenAI
model_openai = infer_model("openai/gpt-4o")

# DeepSeek - same interface
model_deepseek = infer_model("deepseek/deepseek-chat")

# Cerebras - same interface
model_cerebras = infer_model("cerebras/llama-3.3-70b")

# All work identically
agent = Agent(model=model_deepseek)
result = agent.do(Task("Hello"))

Manual Configuration

For more control, instantiate models directly:
from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

settings = OpenAIChatModelSettings(
    max_tokens=2048,
    temperature=0.7
)

# Using provider name
model = OpenAIChatModel(
    model_name="deepseek-chat",
    provider="deepseek",
    settings=settings
)

Using Custom Base URL

For self-hosted or custom endpoints:
from upsonic.models.openai import OpenAIChatModel
from upsonic.providers.openai import OpenAIProvider
from openai import AsyncOpenAI

# Custom OpenAI-compatible endpoint
client = AsyncOpenAI(
    base_url="https://your-custom-endpoint.com/v1",
    api_key="your-api-key"
)

provider = OpenAIProvider(client=client)

model = OpenAIChatModel(
    model_name="your-model-name",
    provider=provider
)
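The same pattern reaches local servers. A sketch pointing the client at Ollama's OpenAI-compatible endpoint (assumes Ollama is running locally and the model has been pulled; the model name is illustrative):

from openai import AsyncOpenAI
from upsonic.models.openai import OpenAIChatModel
from upsonic.providers.openai import OpenAIProvider

# Ollama serves an OpenAI-compatible API at localhost:11434/v1
client = AsyncOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # ignored by Ollama; any non-empty string works
)

model = OpenAIChatModel(
    model_name="llama3.2",  # illustrative: use any model you have pulled
    provider=OpenAIProvider(client=client),
)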

Environment-Based Configuration

import os
from upsonic import infer_model

# Set provider via environment
provider = os.getenv("LLM_PROVIDER", "openai")
model_name = os.getenv("LLM_MODEL", "gpt-4o")

# Switch providers without code changes
model = infer_model(f"{provider}/{model_name}")
Example .env file:
# Switch to DeepSeek
LLM_PROVIDER=deepseek
LLM_MODEL=deepseek-chat

# Or Cerebras
LLM_PROVIDER=cerebras
LLM_MODEL=llama-3.3-70b
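To have these variables loaded from the file automatically, read the .env before querying the environment. A sketch assuming the python-dotenv package is installed:

import os
from dotenv import load_dotenv  # pip install python-dotenv
from upsonic import Agent, Task, infer_model

load_dotenv()  # copies .env entries into os.environ

provider = os.getenv("LLM_PROVIDER", "openai")
model_name = os.getenv("LLM_MODEL", "gpt-4o")

agent = Agent(model=infer_model(f"{provider}/{model_name}"))
result = agent.do(Task("Hello"))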

With Streaming

import asyncio
from upsonic import Agent, Task, infer_model

model = infer_model("deepseek/deepseek-chat")
agent = Agent(model=model)

task = Task("Write a long story")

# Streaming works identically across providers
async def main():
    async with agent.stream(task) as result:
        async for text in result.stream_output():
            print(text, end='', flush=True)

asyncio.run(main())

With Tools

from upsonic import Agent, Task, infer_model

def calculate(expression: str) -> float:
    """Evaluate a mathematical expression."""
    # Demo only: eval executes arbitrary code; never use it on untrusted input.
    return eval(expression)

# Tools work with OpenAI-compatible models
model = infer_model("cerebras/llama-3.3-70b")
agent = Agent(model=model, tools=[calculate])

task = Task("What is 456 * 789?")
result = agent.do(task)

With Structured Output

import asyncio
from pydantic import BaseModel
from upsonic import infer_model

class Analysis(BaseModel):
    sentiment: str
    confidence: float
    summary: str

model = infer_model("deepseek/deepseek-chat")
model = model.with_structured_output(Analysis)

async def main():
    result = await model.ainvoke("Analyze this: Great product!")
    print(result.sentiment, result.confidence)

asyncio.run(main())

Fallback Pattern

from upsonic import Agent, Task, infer_model
from upsonic.utils.package.exception import ModelHTTPError

def request_with_fallback(prompt: str):
    """Try multiple providers in order, falling back on HTTP errors."""
    providers = [
        ("openai", "gpt-4o"),
        ("deepseek", "deepseek-chat"),
        ("cerebras", "llama-3.3-70b"),
    ]

    for provider, model_name in providers:
        try:
            model = infer_model(f"{provider}/{model_name}")
            agent = Agent(model=model)
            return agent.do(Task(prompt))
        except ModelHTTPError:
            continue

    raise RuntimeError("All providers failed")
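Usage is then a plain synchronous call:

result = request_with_fallback("Explain quantum computing")
print(result)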

Params

Base Model Parameters

All OpenAI-compatible models support these standard parameters:
Parameter            Type            Description                             Default   Support
max_tokens           int             Maximum tokens to generate              Varies    All
temperature          float           Sampling temperature (0.0-2.0)          1.0       All
top_p                float           Nucleus sampling threshold              1.0       All
seed                 int             Random seed for reproducibility         None      Most
stop_sequences       list[str]       Sequences that stop generation          None      All
presence_penalty     float           Penalize token presence (-2.0 to 2.0)   0.0       Most
frequency_penalty    float           Penalize token frequency (-2.0 to 2.0)  0.0       Most
logit_bias           dict[str, int]  Modify token likelihoods                None      Some
parallel_tool_calls  bool            Allow parallel tool execution           True      Most
timeout              float           Request timeout in seconds              600       All

Provider-Specific Parameters

Some providers extend the OpenAI spec with additional parameters:
Provider      Additional Parameters             Notes
DeepSeek      None                              Standard OpenAI params only
Cerebras      None                              Standard OpenAI params only
Fireworks     context_length_exceeded_behavior  How to handle context overflow
Together AI   repetition_penalty                Additional control over repetition
Azure         azure_deployment                  Deployment name in Azure
Ollama        num_predict, num_ctx              Ollama-specific controls
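OpenAIChatModelSettings covers the standard parameters above; when you need a provider-specific extension, one option is to call the provider directly and pass the extra field through the OpenAI SDK's extra_body, which merges arbitrary keys into the request JSON. A sketch using Together AI's repetition_penalty (the model name is illustrative):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # OpenAI-compatible path
    api_key="your-together-api-key",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"repetition_penalty": 1.1},  # Together-specific parameter
)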

Example: Full Configuration

from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

settings = OpenAIChatModelSettings(
    # Token control
    max_tokens=4096,
    
    # Sampling parameters
    temperature=0.7,
    top_p=0.9,
    seed=42,
    
    # Penalties
    presence_penalty=0.1,
    frequency_penalty=0.1,
    
    # Stop sequences
    stop_sequences=["END", "STOP"],
    
    # Tool settings
    parallel_tool_calls=True,
    
    # Request settings
    timeout=120.0,
    
    # Custom headers
    extra_headers={"X-Custom": "value"}
)

model = OpenAIChatModel(
    model_name="deepseek-chat",
    provider="deepseek",
    settings=settings
)

Parameter Comparison Table

Support across major OpenAI-compatible providers:
[Table: per-provider support matrix covering max_tokens, temperature, top_p, seed, stop_sequences, presence_penalty, frequency_penalty, logit_bias, parallel_tool_calls, and stream across OpenAI, DeepSeek, Cerebras, Fireworks, Together, Azure, and Ollama. See the Support column in the Base Model Parameters table above for aggregate availability.]

Temperature Guidelines

Temperature   Behavior       Best For
0.0 - 0.3     Very focused   Code, math, factual
0.4 - 0.7     Balanced       General purpose
0.8 - 1.0     Creative       Stories, brainstorming
1.1 - 2.0     Very random    Experimental
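As a quick sketch, these guidelines map directly onto the settings object shown earlier; only the temperature value changes per task:

from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

# Low temperature for deterministic work (code, math, factual answers)
code_model = OpenAIChatModel(
    model_name="deepseek-chat",
    provider="deepseek",
    settings=OpenAIChatModelSettings(temperature=0.2),
)

# Higher temperature for creative writing
story_model = OpenAIChatModel(
    model_name="deepseek-chat",
    provider="deepseek",
    settings=OpenAIChatModelSettings(temperature=0.9),
)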