
What are OpenAI-Compatible Models?

OpenAI-compatible models are LLM providers and services that implement the OpenAI API specification: they expose the same REST endpoints and request/response formats as OpenAI and work with the official OpenAI SDKs, so you can use them as drop-in replacements for OpenAI models with minimal code changes.
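In practice, compatibility means the official OpenAI SDK works against any of these providers once base_url is changed. A minimal sketch using the plain OpenAI Python SDK (the DeepSeek URL and model name come from the provider table below; the API key is a placeholder):

from openai import OpenAI

# The same client works against any OpenAI-compatible endpoint;
# only base_url, api_key, and the model name change.
client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="your-deepseek-api-key",  # placeholder
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)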

Key Characteristics

API Compatibility:
  • Same endpoint structure (/v1/chat/completions, /v1/completions)
  • Identical request/response JSON format
  • Compatible with OpenAI Python/JavaScript SDKs
  • Standard authentication via API keys
Model Classes:
  • All use OpenAIChatModel in Upsonic
  • Import from upsonic.models.openai
  • Configure via provider parameter or base_url
Benefits:
  • Easy Migration: Switch providers without rewriting code
  • Vendor Independence: No lock-in to a single provider
  • Cost Optimization: Choose the best price/performance ratio
  • Redundancy: Fall back to another provider when one fails
  • Feature Access: Use the latest open-source models

Supported Providers in Upsonic

Provider        Base URL                                Best For
DeepSeek        https://api.deepseek.com                Cost-effective reasoning
Cerebras        https://api.cerebras.ai/v1              Ultra-fast inference
Fireworks       https://api.fireworks.ai/inference/v1   Open model access
GitHub Models   https://models.inference.ai.azure.com   Developer testing
Together AI     https://api.together.xyz                Collaborative serving
Azure OpenAI    https://{resource}.openai.azure.com     Enterprise deployment
Ollama          http://localhost:11434/v1               Local inference
Grok            https://api.x.ai/v1                     Real-time information
Vercel AI       Various                                 Edge-optimized
Heroku          Various                                 Cloud platform
Note: Some providers, such as Groq, have dedicated model classes in Upsonic but also support the OpenAI-compatible API.

Usage

Basic Usage with infer_model

The simplest way to use OpenAI-compatible models is with infer_model:
from upsonic import Agent, Task, infer_model

# Using provider prefix (recommended)
model = infer_model("deepseek/deepseek-chat")
agent = Agent(model=model)

task = Task("Explain quantum computing")
result = agent.do(task)

Switching Between Providers (All OpenAI-Compatible)

from upsonic import Agent, Task, infer_model

# OpenAI
model_openai = infer_model("openai/gpt-4o")

# DeepSeek - same interface
model_deepseek = infer_model("deepseek/deepseek-chat")

# Cerebras - same interface
model_cerebras = infer_model("cerebras/llama-3.3-70b")

# All work identically
agent = Agent(model=model_deepseek)
result = agent.do(Task("Hello"))

Manual Configuration

For more control, instantiate models directly:
from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

settings = OpenAIChatModelSettings(
    max_tokens=2048,
    temperature=0.7
)

# Using provider name
model = OpenAIChatModel(
    model_name="deepseek-chat",
    provider="deepseek",
    settings=settings
)

Using Custom Base URL

For self-hosted or custom endpoints:
from upsonic.models.openai import OpenAIChatModel
from upsonic.providers.openai import OpenAIProvider
from openai import AsyncOpenAI

# Custom OpenAI-compatible endpoint
client = AsyncOpenAI(
    base_url="https://your-custom-endpoint.com/v1",
    api_key="your-api-key"
)

provider = OpenAIProvider(client=client)

model = OpenAIChatModel(
    model_name="your-model-name",
    provider=provider
)
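The same pattern reaches local servers. A sketch pointing the client at Ollama's OpenAI-compatible endpoint (assumes Ollama is running locally and the model has been pulled; the model name is illustrative):

from openai import AsyncOpenAI
from upsonic.models.openai import OpenAIChatModel
from upsonic.providers.openai import OpenAIProvider

# Ollama serves an OpenAI-compatible API at localhost:11434/v1
client = AsyncOpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # ignored by Ollama; any non-empty string works
)

model = OpenAIChatModel(
    model_name="llama3.2",  # illustrative: use any model you have pulled
    provider=OpenAIProvider(client=client),
)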

Environment-Based Configuration

import os
from upsonic import infer_model

# Set provider via environment
provider = os.getenv("LLM_PROVIDER", "openai")
model_name = os.getenv("LLM_MODEL", "gpt-4o")

# Switch providers without code changes
model = infer_model(f"{provider}/{model_name}")
Example .env file:
# Switch to DeepSeek
LLM_PROVIDER=deepseek
LLM_MODEL=deepseek-chat

# Or Cerebras
LLM_PROVIDER=cerebras
LLM_MODEL=llama-3.3-70b
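To have these variables loaded from the file automatically, read the .env before querying the environment. A sketch assuming the python-dotenv package is installed:

import os
from dotenv import load_dotenv  # pip install python-dotenv
from upsonic import Agent, Task, infer_model

load_dotenv()  # copies .env entries into os.environ

provider = os.getenv("LLM_PROVIDER", "openai")
model_name = os.getenv("LLM_MODEL", "gpt-4o")

agent = Agent(model=infer_model(f"{provider}/{model_name}"))
result = agent.do(Task("Hello"))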

With Streaming

import asyncio
from upsonic import Agent, Task, infer_model

model = infer_model("deepseek/deepseek-chat")
agent = Agent(model=model)

task = Task("Write a long story")

# Streaming works identically across providers
async def main():
    async with agent.stream(task) as result:
        async for text in result.stream_output():
            print(text, end='', flush=True)

asyncio.run(main())

With Tools

from upsonic import Agent, Task, infer_model

def calculate(expression: str) -> float:
    """Evaluate a mathematical expression."""
    # Demo only: eval executes arbitrary code; never use it on untrusted input.
    return eval(expression)

# Tools work with OpenAI-compatible models
model = infer_model("cerebras/llama-3.3-70b")
agent = Agent(model=model, tools=[calculate])

task = Task("What is 456 * 789?")
result = agent.do(task)

With Structured Output

import asyncio
from pydantic import BaseModel
from upsonic import infer_model

class Analysis(BaseModel):
    sentiment: str
    confidence: float
    summary: str

model = infer_model("deepseek/deepseek-chat")
model = model.with_structured_output(Analysis)

async def main():
    result = await model.ainvoke("Analyze this: Great product!")
    print(result.sentiment, result.confidence)

asyncio.run(main())

Fallback Pattern

from upsonic import Agent, Task, infer_model
from upsonic.utils.package.exception import ModelHTTPError

def request_with_fallback(prompt: str):
    """Try multiple providers in order, falling back on HTTP errors."""
    providers = [
        ("openai", "gpt-4o"),
        ("deepseek", "deepseek-chat"),
        ("cerebras", "llama-3.3-70b"),
    ]

    for provider, model_name in providers:
        try:
            model = infer_model(f"{provider}/{model_name}")
            agent = Agent(model=model)
            return agent.do(Task(prompt))
        except ModelHTTPError:
            continue

    raise RuntimeError("All providers failed")
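Usage is then a plain synchronous call:

result = request_with_fallback("Explain quantum computing")
print(result)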

Params

Base Model Parameters

All OpenAI-compatible models support these standard parameters:
Parameter            Type            Description                             Default   Support
max_tokens           int             Maximum tokens to generate              Varies    All
temperature          float           Sampling temperature (0.0-2.0)          1.0       All
top_p                float           Nucleus sampling threshold              1.0       All
seed                 int             Random seed for reproducibility         None      Most
stop_sequences       list[str]       Sequences that stop generation          None      All
presence_penalty     float           Penalize token presence (-2.0 to 2.0)   0.0       Most
frequency_penalty    float           Penalize token frequency (-2.0 to 2.0)  0.0       Most
logit_bias           dict[str, int]  Modify token likelihoods                None      Some
parallel_tool_calls  bool            Allow parallel tool execution           True      Most
timeout              float           Request timeout in seconds              600       All

Provider-Specific Parameters

Some providers extend the OpenAI spec with additional parameters:
Provider      Additional Parameters             Notes
DeepSeek      None                              Standard OpenAI params only
Cerebras      None                              Standard OpenAI params only
Fireworks     context_length_exceeded_behavior  How to handle context overflow
Together AI   repetition_penalty                Additional control over repetition
Azure         azure_deployment                  Deployment name in Azure
Ollama        num_predict, num_ctx              Ollama-specific controls
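OpenAIChatModelSettings covers the standard parameters above; when you need a provider-specific extension, one option is to call the provider directly and pass the extra field through the OpenAI SDK's extra_body, which merges arbitrary keys into the request JSON. A sketch using Together AI's repetition_penalty (the model name is illustrative):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # OpenAI-compatible path
    api_key="your-together-api-key",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"repetition_penalty": 1.1},  # Together-specific parameter
)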

Example: Full Configuration

from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

settings = OpenAIChatModelSettings(
    # Token control
    max_tokens=4096,
    
    # Sampling parameters
    temperature=0.7,
    top_p=0.9,
    seed=42,
    
    # Penalties
    presence_penalty=0.1,
    frequency_penalty=0.1,
    
    # Stop sequences
    stop_sequences=["END", "STOP"],
    
    # Tool settings
    parallel_tool_calls=True,
    
    # Request settings
    timeout=120.0,
    
    # Custom headers
    extra_headers={"X-Custom": "value"}
)

model = OpenAIChatModel(
    model_name="deepseek-chat",
    provider="deepseek",
    settings=settings
)

Parameter Comparison Table

Support across major OpenAI-compatible providers:
[Table: per-provider support matrix covering max_tokens, temperature, top_p, seed, stop_sequences, presence_penalty, frequency_penalty, logit_bias, parallel_tool_calls, and stream across OpenAI, DeepSeek, Cerebras, Fireworks, Together, Azure, and Ollama. See the Support column in the Base Model Parameters table above for aggregate availability.]

Temperature Guidelines

Temperature   Behavior       Best For
0.0 - 0.3     Very focused   Code, math, factual
0.4 - 0.7     Balanced       General purpose
0.8 - 1.0     Creative       Stories, brainstorming
1.1 - 2.0     Very random    Experimental
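As a quick sketch, these guidelines map directly onto the settings object shown earlier; only the temperature value changes per task:

from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

# Low temperature for deterministic work (code, math, factual answers)
code_model = OpenAIChatModel(
    model_name="deepseek-chat",
    provider="deepseek",
    settings=OpenAIChatModelSettings(temperature=0.2),
)

# Higher temperature for creative writing
story_model = OpenAIChatModel(
    model_name="deepseek-chat",
    provider="deepseek",
    settings=OpenAIChatModelSettings(temperature=0.9),
)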