
Overview

Outlines provides structured generation and control for models that run locally or behind an SGLang server. There is no provider API key or base URL to configure; instead, you build an OutlinesModel from a concrete backend (Transformers, LlamaCpp, MLXLM, SGLang, or vLLM offline). Tool calls are not supported; JSON schema and JSON object output are supported. Model Class: OutlinesModel
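JSON-schema output means decoding is constrained to text that matches a schema. As an illustration (plain Pydantic, independent of Upsonic), this is the kind of schema that would drive structured generation for a simple response type:

```python
from pydantic import BaseModel

class CityInfo(BaseModel):
    """Example response type; structured generation constrains output to its JSON schema."""
    name: str
    population: int

# The JSON schema that constrained decoding would enforce.
schema = CityInfo.model_json_schema()
print(sorted(schema["properties"]))  # → ['name', 'population']
```

Any type that can be expressed as a JSON schema can serve as the target shape.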

Authentication

No API key or environment variables are required. For SGLang, configure base_url and an optional api_key in OutlinesModel.from_sglang().
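For a secured SGLang endpoint, the key is passed when building the model. A configuration sketch (the api_key keyword follows the description above; the key value is a placeholder):

```python
from upsonic.models.outlines import OutlinesModel

# Point at a remote SGLang server; api_key is only needed if the
# server was launched with authentication enabled.
model = OutlinesModel.from_sglang(
    "http://localhost:30000",
    model_name="meta-llama/Llama-3.2-1B-Instruct",
    api_key="sk-...",  # placeholder; use your server's key
)
```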

Examples

From Transformers (local):
from upsonic import Agent, Task
from upsonic.models.outlines import OutlinesModel
from transformers import AutoModelForCausalLM, AutoTokenizer

hf_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = OutlinesModel.from_transformers(hf_model, tokenizer)
agent = Agent(model=model)

task = Task("Hello, how are you?")
result = agent.do(task)
print(result)

From SGLang (remote server):
from upsonic import Agent, Task
from upsonic.models.outlines import OutlinesModel

model = OutlinesModel.from_sglang("http://localhost:30000", model_name="meta-llama/Llama-3.2-1B-Instruct")
agent = Agent(model=model)

task = Task("Hello, how are you?")
result = agent.do(task)
print(result)
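The other local backends follow the same pattern. A LlamaCpp sketch (the from_llamacpp constructor name is inferred by analogy with from_transformers and from_sglang and the backend list above; llama_cpp.Llama is the standard llama-cpp-python entry point, and the model path is a placeholder):

```python
from llama_cpp import Llama

from upsonic import Agent, Task
from upsonic.models.outlines import OutlinesModel

# Load a local GGUF model with llama-cpp-python.
llm = Llama(model_path="path/to/model.gguf")

# from_llamacpp is assumed here by analogy with the other constructors.
model = OutlinesModel.from_llamacpp(llm)
agent = Agent(model=model)

result = agent.do(Task("Hello, how are you?"))
print(result)
```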

Model Settings

You can set model parameters on the model or on the Agent. Supported parameters depend on the backend (Transformers, LlamaCpp, SGLang, vLLMOffline). On the model:
from upsonic import Agent, Task
from upsonic.models.outlines import OutlinesModel
from upsonic.models.settings import ModelSettings

model = OutlinesModel.from_sglang(
    "http://localhost:30000",
    settings=ModelSettings(max_tokens=1024, temperature=0.7)
)
agent = Agent(model=model)

On the Agent:
from upsonic import Agent, Task
from upsonic.models.settings import ModelSettings

agent = Agent(
    model=model,  # OutlinesModel instance required; no provider/model string
    settings=ModelSettings(max_tokens=1024, temperature=0.7)
)

Parameters

Supported settings vary by backend. Base options:
| Parameter | Type | Description | Default | Backends |
| --- | --- | --- | --- | --- |
| max_tokens | int | Maximum tokens to generate | Model default | Transformers, LlamaCpp, SGLang, vLLMOffline |
| temperature | float | Sampling temperature | 1.0 | Transformers, LlamaCpp, SGLang, vLLMOffline |
| top_p | float | Nucleus sampling | 1.0 | Transformers, LlamaCpp, SGLang, vLLMOffline |
| seed | int | Random seed | None | LlamaCpp, vLLMOffline |
| presence_penalty | float | Token presence penalty | 0.0 | LlamaCpp, SGLang, vLLMOffline |
| frequency_penalty | float | Token frequency penalty | 0.0 | LlamaCpp, SGLang, vLLMOffline |
| logit_bias | dict[str, int] | Logit bias per token | None | Transformers, LlamaCpp, vLLMOffline |
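For quick validation in client code, the backend support above can be expressed as a small lookup. This is illustrative only, mirroring the table; it is not part of the Upsonic API:

```python
# Which settings each Outlines backend accepts, per the table above.
SUPPORTED_SETTINGS: dict[str, set[str]] = {
    "Transformers": {"max_tokens", "temperature", "top_p", "logit_bias"},
    "LlamaCpp": {"max_tokens", "temperature", "top_p", "seed",
                 "presence_penalty", "frequency_penalty", "logit_bias"},
    "SGLang": {"max_tokens", "temperature", "top_p",
               "presence_penalty", "frequency_penalty"},
    "vLLMOffline": {"max_tokens", "temperature", "top_p", "seed",
                    "presence_penalty", "frequency_penalty", "logit_bias"},
}

def unsupported(backend: str, settings: dict) -> set[str]:
    """Return the setting names the given backend would not honor."""
    return set(settings) - SUPPORTED_SETTINGS[backend]

print(unsupported("SGLang", {"max_tokens": 1024, "seed": 42}))  # → {'seed'}
```

Checking settings up front avoids silently passing a parameter, such as seed on SGLang, that the backend ignores.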