What are OpenAI-Compatible Models?
OpenAI-compatible models are LLM providers and services that implement the OpenAI API specification. They expose the same REST API endpoints and request/response formats as OpenAI and work with the official OpenAI SDKs, so they can serve as drop-in replacements for OpenAI models with minimal code changes.

Key Characteristics
API Compatibility:
- Same endpoint structure (`/v1/chat/completions`, `/v1/completions`)
- Identical request/response JSON format
- Compatible with the OpenAI Python/JavaScript SDKs
- Standard authentication via API keys

Upsonic Integration:
- All use `OpenAIChatModel` in Upsonic
- Import from `upsonic.models.openai`
- Configure via the `provider` parameter or `base_url`
Benefits:
- Easy Migration: Switch providers without rewriting code
- Vendor Independence: No lock-in to a single provider
- Cost Optimization: Choose the best price/performance ratio
- Redundancy: Fall back between providers
- Feature Access: Use the latest open-source models
Supported Providers in Upsonic
| Provider | Base URL | Best For |
|---|---|---|
| DeepSeek | https://api.deepseek.com | Cost-effective reasoning |
| Cerebras | https://api.cerebras.ai/v1 | Ultra-fast inference |
| Fireworks | https://api.fireworks.ai/inference/v1 | Open model access |
| GitHub Models | https://models.inference.ai.azure.com | Developer testing |
| Together AI | https://api.together.xyz | Open-source model hosting |
| Azure OpenAI | https://{resource}.openai.azure.com | Enterprise deployment |
| Ollama | http://localhost:11434/v1 | Local inference |
| Grok | https://api.x.ai/v1 | Real-time information |
| Vercel AI | Various | Edge-optimized |
| Heroku | Various | Cloud platform |
Usage
Basic Usage with infer_model
The simplest way to use OpenAI-compatible models is with `infer_model`:
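A minimal sketch is below; it assumes `infer_model` is importable from `upsonic.models` and accepts a provider-prefixed model identifier (the exact prefix format can differ between Upsonic versions):

```python
# Minimal sketch -- the import path and the "provider/model" identifier
# format are assumptions based on the names used on this page.
from upsonic.models import infer_model

# Resolve a DeepSeek model through the shared OpenAI-compatible model class.
model = infer_model("deepseek/deepseek-chat")

# The same call works for any provider in the table above,
# e.g. "cerebras/llama-3.3-70b" or "ollama/llama3.2".
print(type(model).__name__)  # expected: OpenAIChatModel
```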
Switching Between Providers (all are OpenAI-compatible)
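Under the same assumptions, switching providers is only a change of identifier string; nothing else in your code needs to move:

```python
from upsonic.models import infer_model

# Identifiers are illustrative -- check each provider's model list.
candidates = [
    "openai/gpt-4o-mini",      # OpenAI itself
    "deepseek/deepseek-chat",  # cost-effective reasoning
    "cerebras/llama-3.3-70b",  # ultra-fast inference
    "ollama/llama3.2",         # local inference via Ollama
]

# Every identifier resolves to the same OpenAIChatModel class, so agents,
# tasks, and tools built on top of it are unaffected by the swap.
models = {name: infer_model(name) for name in candidates}
```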
Manual Configuration
For more control, instantiate models directly:
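A sketch of direct instantiation follows. It assumes `OpenAIChatModel` takes the model name plus a `provider` selector, as suggested above; the exact constructor signature may differ in your Upsonic version:

```python
# Sketch -- the `provider` keyword is an assumption based on this page.
from upsonic.models.openai import OpenAIChatModel

# Select a built-in OpenAI-compatible provider by name...
deepseek_model = OpenAIChatModel("deepseek-chat", provider="deepseek")

# ...or point the same class at a different provider.
cerebras_model = OpenAIChatModel("llama-3.3-70b", provider="cerebras")
```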
Using Custom Base URL

For self-hosted or custom endpoints:
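The sketch below routes requests to a custom endpoint through environment variables. `OPENAI_BASE_URL` and `OPENAI_API_KEY` are honored by the official OpenAI SDK that typically backs these model classes; whether your Upsonic version reads them the same way is an assumption, and the endpoint shown is hypothetical:

```python
import os

from upsonic.models.openai import OpenAIChatModel

# Hypothetical self-hosted endpoint (e.g. a vLLM or LiteLLM server that
# speaks the OpenAI API). Set these before the model/client is created.
os.environ["OPENAI_BASE_URL"] = "http://my-inference-host:8000/v1"
os.environ["OPENAI_API_KEY"] = "not-needed-for-local"  # many local servers ignore the key

# The model now talks to the custom endpoint instead of api.openai.com.
model = OpenAIChatModel("my-finetuned-llama")
```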
Environment-Based Configuration

Configure API keys and base URLs through environment variables loaded from a `.env` file:
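An illustrative setup is below; the variable names are conventional rather than mandated by Upsonic, so treat them as assumptions and match them to the providers you actually use:

```
# .env -- illustrative variable names
OPENAI_API_KEY=sk-...
DEEPSEEK_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
```

```python
# Load the .env before creating any model so the provider can pick up the keys.
from dotenv import load_dotenv  # pip install python-dotenv
from upsonic.models import infer_model

load_dotenv()

# Assumption: the DeepSeek provider reads DEEPSEEK_API_KEY from the environment.
model = infer_model("deepseek/deepseek-chat")
```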
With Streaming
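Because every provider on this page implements the OpenAI chat-completions API, streaming is the standard chunked stream. For illustration, the sketch below streams directly with the official `openai` SDK pointed at DeepSeek's base URL rather than through Upsonic's own streaming interface, whose surface may differ; the API key is a placeholder:

```python
# Wire-level illustration of streaming against an OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # any base URL from the provider table
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize the OpenAI API in one sentence."}],
    stream=True,
)

# Print tokens as they arrive.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```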
With Tools
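Tool calls travel over the standard OpenAI tool-calling protocol, so they work with any provider that supports it (see the comparison table below). The sketch assumes a plain Python function can be passed through a `tools` argument on the task; the `Agent`/`Task` keywords here are assumptions about the Upsonic surface:

```python
# Sketch -- the Agent/Task surface and the `tools` keyword are assumptions.
from upsonic import Agent, Task

def get_exchange_rate(base: str, quote: str) -> float:
    """Illustrative tool: return a hard-coded FX rate."""
    return 1.08 if (base, quote) == ("EUR", "USD") else 1.0

agent = Agent(model="deepseek/deepseek-chat")  # provider-prefixed id (assumption)
task = Task(
    "What is 250 EUR in USD? Use the exchange-rate tool.",
    tools=[get_exchange_rate],
)

agent.print_do(task)  # the model decides when to call the tool
```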
With Structured Output
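Structured output relies on the provider's JSON/function-calling support. The sketch assumes a Pydantic model can be passed via `response_format` and that the parsed result is returned by the run call; both are assumptions about the Upsonic API:

```python
# Sketch -- `response_format` and the return value are assumptions.
from pydantic import BaseModel
from upsonic import Agent, Task

class CityInfo(BaseModel):
    name: str
    country: str
    population: int

agent = Agent(model="deepseek/deepseek-chat")
task = Task("Give me basic facts about Tokyo.", response_format=CityInfo)

result = agent.do(task)   # expected to be a CityInfo instance
print(result.name, result.population)
```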
Fallback Pattern
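A simple application-level fallback is sketched below: run the same prompt against a chain of providers and move on when one fails. The model identifiers and the broad `except` are illustrative; in real code, catch the specific provider/HTTP errors your version of Upsonic raises:

```python
# Fallback sketch: try providers in order until one succeeds.
from upsonic import Agent, Task

FALLBACK_MODELS = [
    "deepseek/deepseek-chat",   # primary: cheapest
    "cerebras/llama-3.3-70b",   # secondary: fastest
    "openai/gpt-4o-mini",       # last resort
]

def run_with_fallback(prompt: str):
    last_error = None
    for model_id in FALLBACK_MODELS:
        try:
            agent = Agent(model=model_id)   # identifier format is an assumption
            return agent.do(Task(prompt))
        except Exception as exc:            # narrow this in production code
            last_error = exc
    raise RuntimeError("All providers failed") from last_error

print(run_with_fallback("Explain vector databases in two sentences."))
```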
Params
Base Model Parameters
All OpenAI-compatible models support these standard parameters:

| Parameter | Type | Description | Default | Support |
|---|---|---|---|---|
| max_tokens | int | Maximum tokens to generate | Varies | All |
| temperature | float | Sampling temperature (0.0-2.0) | 1.0 | All |
| top_p | float | Nucleus sampling threshold | 1.0 | All |
| seed | int | Random seed for reproducibility | None | Most |
| stop_sequences | list[str] | Sequences that stop generation | None | All |
| presence_penalty | float | Penalize token presence (-2.0 to 2.0) | 0.0 | Most |
| frequency_penalty | float | Penalize token frequency (-2.0 to 2.0) | 0.0 | Most |
| logit_bias | dict[str, int] | Modify token likelihoods | None | Some |
| parallel_tool_calls | bool | Allow parallel tool execution | True | Most |
| timeout | float | Request timeout in seconds | 600 | All |
Provider-Specific Parameters
Some providers extend the OpenAI spec with additional parameters:

| Provider | Additional Parameters | Notes |
|---|---|---|
| DeepSeek | None | Standard OpenAI params only |
| Cerebras | None | Standard OpenAI params only |
| Fireworks | context_length_exceeded_behavior | How to handle context overflow |
| Together AI | repetition_penalty | Additional control over repetition |
| Azure | azure_deployment | Deployment name in Azure |
| Ollama | num_predict, num_ctx | Ollama-specific controls |
Example: Full Configuration
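A sketch of a fully parameterized model is below. How the settings are attached (a `settings` dict at construction time here) is an assumption; depending on your Upsonic version they may instead be passed per agent or per task:

```python
# Sketch -- the `settings` keyword is an assumption; the parameter names
# match the table above.
from upsonic.models.openai import OpenAIChatModel

settings = {
    "max_tokens": 1024,
    "temperature": 0.3,               # focused output for factual work
    "top_p": 0.9,
    "seed": 42,                       # reproducibility where supported
    "stop_sequences": ["\n\n###"],
    "presence_penalty": 0.1,
    "frequency_penalty": 0.2,
    "parallel_tool_calls": True,
    "timeout": 120.0,                 # seconds
}

model = OpenAIChatModel("deepseek-chat", provider="deepseek", settings=settings)
```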
Parameter Comparison Table
Support across major OpenAI-compatible providers:

| Parameter | OpenAI | DeepSeek | Cerebras | Fireworks | Together | Azure | Ollama |
|---|---|---|---|---|---|---|---|
| max_tokens | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| temperature | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| top_p | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| seed | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| stop_sequences | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| presence_penalty | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| frequency_penalty | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| logit_bias | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
| parallel_tool_calls | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| stream | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Temperature Guidelines
| Temperature | Behavior | Best For |
|---|---|---|
| 0.0 - 0.3 | Very focused | Code, math, factual |
| 0.4 - 0.7 | Balanced | General purpose |
| 0.8 - 1.0 | Creative | Stories, brainstorming |
| 1.1 - 2.0 | Very random | Experimental |
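For example, a small hypothetical helper that maps a task type to a temperature band from the table:

```python
# Illustrative helper only -- the bands mirror the guideline table above.
TEMPERATURE_BY_TASK = {
    "code": 0.2,           # very focused
    "general": 0.6,        # balanced
    "brainstorming": 0.9,  # creative
}

def temperature_for(task_type: str) -> float:
    return TEMPERATURE_BY_TASK.get(task_type, 0.6)

print(temperature_for("code"))  # 0.2
```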

