
What are LLM Models?

Large Language Models (LLMs) are the foundation of the Upsonic AI Agent Framework. The framework provides a unified interface to interact with various LLM providers, allowing you to build AI agents that can leverage different models without changing your code structure.

Model Architecture

In Upsonic, all model classes inherit from the base Model class, which provides:
  • Unified Interface: Consistent API across all providers (see the sketch after this list)
  • LCEL Integration: Models implement the Runnable interface for chain composition
  • Streaming Support: Real-time response streaming for better UX
  • Tool Calling: Native function calling capabilities
  • Structured Output: Type-safe responses using Pydantic models
  • Memory Management: Built-in conversation history support
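
The listing below is a minimal sketch of the unified interface in action: the same agent code runs against different providers by changing only the model string. It uses infer_model, Agent, and Task exactly as they appear in the usage patterns later on this page; the model identifiers are just examples:
from upsonic import Agent, Task, infer_model

# The same agent code works with any provider; only the model identifier changes.
for model_id in ["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"]:
    model = infer_model(model_id)
    agent = Agent(model=model)
    result = agent.do(Task("Summarize the benefits of a unified model interface"))
    print(model_id, "->", result)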

Key Components

1. Model Settings

Model settings control the behavior of LLM requests:
from upsonic.models.settings import ModelSettings

settings = ModelSettings(
    max_tokens=2048,
    temperature=0.7,
    top_p=0.9,
    seed=42
)
All settings are optional, and provider-specific settings are prefixed with the provider name (e.g., openai_, anthropic_, google_).
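
For example, common settings can be combined with a provider-prefixed one. This sketch reuses the OpenAIChatModelSettings class and the openai_reasoning_effort field from the custom-settings example further down this page; treat the specific values as placeholders:
from upsonic.models.openai import OpenAIChatModelSettings

# Common settings plus an OpenAI-specific setting (note the "openai_" prefix).
settings = OpenAIChatModelSettings(
    max_tokens=512,
    temperature=0.2,
    openai_reasoning_effort="low"
)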

2. Model Profiles

Profiles define model capabilities and behaviors:
from upsonic.profiles import ModelProfile

profile = ModelProfile(
    supports_tools=True,
    supports_json_schema_output=True,
    default_structured_output_mode='native'
)
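
The profile can then be inspected before deciding how to call a model. This sketch only reads the fields set above; how Upsonic consumes a profile internally is not shown here:
# Branch on the capabilities declared in the profile above.
if profile.supports_tools:
    print("This model can call tools")
if profile.supports_json_schema_output:
    print(f"Preferred structured output mode: {profile.default_structured_output_mode}")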

3. Model Inference

Use infer_model() to automatically select the appropriate model class:
from upsonic import infer_model

# Automatic provider detection
model = infer_model("openai/gpt-4o")
model = infer_model("anthropic/claude-3-5-sonnet-20241022")
model = infer_model("google-gla/gemini-2.5-flash")

Usage Patterns

Basic Usage

from upsonic import Agent, Task, infer_model

model = infer_model("openai/gpt-4o")
agent = Agent(model=model)

task = Task("Explain quantum computing")
result = agent.do(task)

With Custom Settings

from upsonic.models.openai import OpenAIChatModel, OpenAIChatModelSettings

settings = OpenAIChatModelSettings(
    max_tokens=1024,
    temperature=0.5,
    openai_reasoning_effort="high"
)

model = OpenAIChatModel(
    model_name="gpt-4o",
    settings=settings
)

agent = Agent(model=model)

LCEL Chains

from upsonic.lcel import ChatPromptTemplate
from upsonic import infer_model

prompt = ChatPromptTemplate.from_template("Tell me about {topic}")
model = infer_model("openai/gpt-4o")

# Chain composition with pipe operator
chain = prompt | model
result = await chain.ainvoke({"topic": "AI"})

Error Handling

The framework provides comprehensive error handling for LLM operations:

Common Exceptions

ModelHTTPError

Raised when an HTTP error occurs during model requests:
from upsonic.utils.package.exception import ModelHTTPError

try:
    result = agent.do(task)
except ModelHTTPError as e:
    print(f"Status Code: {e.status_code}")
    print(f"Model: {e.model_name}")
    print(f"Body: {e.body}")

UserError

Raised for user-facing configuration or usage errors:
from upsonic.utils.package.exception import UserError

try:
    model = infer_model("unknown/model")
except UserError as e:
    print(f"Error: {e}")

UnexpectedModelBehavior

Raised when a model responds in an unexpected way:
from upsonic.utils.package.exception import UnexpectedModelBehavior

try:
    response = await model.request_stream(messages, settings, params)
except UnexpectedModelBehavior as e:
    print(f"Unexpected behavior: {e}")

Handling Rate Limits

import asyncio
from upsonic.utils.package.exception import ModelHTTPError

async def request_with_retry(agent, task, max_retries=3):
    for attempt in range(max_retries):
        try:
            return agent.do(task)
        except ModelHTTPError as e:
            if e.status_code == 429:  # Rate limit
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Rate limited. Waiting {wait_time}s...")
                await asyncio.sleep(wait_time)
            else:
                raise
    raise Exception("Max retries exceeded")
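
Calling the helper then looks like this; agent and task are assumed to come from the earlier examples:
import asyncio

async def main():
    result = await request_with_retry(agent, task)
    print(result)

asyncio.run(main())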

Handling Token Limits

from upsonic.models.settings import ModelSettings

settings = ModelSettings(
    max_tokens=4096,
    # Will raise error if input + output exceeds context window
)

try:
    model = infer_model("openai/gpt-4o")
    agent = Agent(model=model, settings=settings)
    result = agent.do(very_long_task)
except ModelHTTPError as e:
    if "context_length_exceeded" in str(e.body):
        print("Input too long, consider truncating")

Handling Invalid Responses

from pydantic import BaseModel, ValidationError

class ResponseFormat(BaseModel):
    answer: str
    confidence: float

try:
    model = infer_model("openai/gpt-4o")
    model = model.with_structured_output(ResponseFormat)
    result = await model.ainvoke("What is AI?")
except ValidationError as e:
    print(f"Invalid response format: {e}")

Global Error Handling

Disable model requests globally for testing:
from upsonic.models import override_allow_model_requests

# Disable requests (useful for testing)
with override_allow_model_requests(False):
    try:
        result = agent.do(task)
    except RuntimeError as e:
        print("Model requests are disabled")

Best Practices

  1. Always Use Environment Variables: Store API keys in environment variables, never hardcode them (see the sketch after this list)
  2. Implement Retry Logic: Network errors and rate limits are common; implement exponential backoff (see Handling Rate Limits above)
  3. Monitor Token Usage: Track usage to avoid unexpected costs
  4. Handle Timeouts: Set appropriate timeouts based on your use case
  5. Validate Outputs: Use structured output with Pydantic models for type safety
  6. Log Errors: Implement comprehensive logging for debugging
  7. Use Streaming: For better UX, use streaming responses when available
  8. Test Error Paths: Write tests that cover error scenarios
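
As an illustration of the first point, read API keys from the environment and fail fast if they are missing. OPENAI_API_KEY is the conventional variable name for the OpenAI provider; whether infer_model picks it up automatically depends on the provider integration, so check the provider documentation:
import os
from upsonic import infer_model

# Never hardcode keys in source; require them to be present in the environment.
if "OPENAI_API_KEY" not in os.environ:
    raise RuntimeError("Set OPENAI_API_KEY before creating the model")

model = infer_model("openai/gpt-4o")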

Next Steps