Overview

In the Upsonic framework, Embeddings are the foundation for converting text into numerical vector representations that capture semantic meaning. The framework provides a unified interface across multiple embedding providers, enabling seamless switching between local and cloud-based models while maintaining consistent behavior and advanced features like caching, batching, and automatic retry mechanisms.
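
As a quick illustration of that unified interface, the sketch below swaps a cloud provider for a local one without changing the calling code (a minimal sketch; it assumes an OpenAI API key is available via the OPENAI_API_KEY environment variable and that the FastEmbed model can be downloaded locally):

from upsonic.embeddings import OpenAIEmbedding, FastEmbedProvider

# Both providers expose the same embed_query / embed_documents interface
cloud = OpenAIEmbedding(model_name="text-embedding-3-small")
local = FastEmbedProvider(model_name="BAAI/bge-small-en-v1.5")

# Swapping providers requires no changes to the calling code
for provider in (cloud, local):
    vector = await provider.embed_query("What is semantic search?")
    print(len(vector))  # dimensionality differs by model (1536 vs 384)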

Embedding Providers

Upsonic supports multiple embedding providers with a consistent API:
| Provider | Type | Best For | Pricing |
|---|---|---|---|
| OpenAI | Cloud API | Production deployments, high quality | $0.02-$0.13 per 1M tokens |
| Azure OpenAI | Cloud API | Enterprise, compliance requirements | Variable by region |
| AWS Bedrock | Cloud API | AWS infrastructure, multi-model | $0.0001-$0.0007 per 1K tokens |
| Google Gemini | Cloud API | Multilingual, code embeddings | $0.15 per 1M tokens |
| HuggingFace | Local/API | Custom models, flexibility | Free (local) or variable (API) |
| FastEmbed | Local | Fast inference, no API costs | Free |
| Ollama | Local | Privacy, offline operation | Free |

Base Configuration

All embedding providers share common configuration options:
| Attribute | Type | Description | Default |
|---|---|---|---|
| model_name | str | Model identifier | Provider-specific |
| batch_size | int | Batch size for processing | 100 |
| max_retries | int | Maximum retry attempts | 3 |
| retry_delay | float | Initial retry delay (seconds) | 1.0 |
| timeout | float | Request timeout (seconds) | 30.0 |
| normalize_embeddings | bool | Normalize to unit length | True |
| show_progress | bool | Display progress during batch ops | True |
| cache_embeddings | bool | Enable embedding caching | False |
| enable_retry_with_backoff | bool | Exponential backoff on retries | True |
| enable_adaptive_batching | bool | Dynamic batch size adjustment | True |
| enable_compression | bool | Enable dimensionality reduction | False |
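
Because these options are shared, any provider-specific config accepts them alongside its own settings. A minimal sketch using OpenAIEmbeddingConfig with only the base attributes:

from upsonic.embeddings import OpenAIEmbeddingConfig, OpenAIEmbedding

# Only the shared base attributes are shown here;
# provider-specific options layer on top of these
config = OpenAIEmbeddingConfig(
    model_name="text-embedding-3-small",
    batch_size=100,
    max_retries=3,
    retry_delay=1.0,
    timeout=30.0,
    normalize_embeddings=True,
    cache_embeddings=True  # off by default; enable for repeated texts
)

embedding = OpenAIEmbedding(config=config)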

OpenAI Embeddings

Basic Usage

from upsonic.embeddings import OpenAIEmbedding

# Initialize provider
embedding = OpenAIEmbedding(
    model_name="text-embedding-3-small",
    api_key="your-api-key"  # or set OPENAI_API_KEY env var
)

# Embed documents
from upsonic.schemas.data_models import Chunk

chunks = [
    Chunk(text_content="Artificial intelligence is transforming technology."),
    Chunk(text_content="Machine learning enables computers to learn from data.")
]

embeddings = await embedding.embed_documents(chunks)

# Embed a query
query_embedding = await embedding.embed_query("What is AI?")
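
Since embeddings are plain numeric vectors, you can compare them directly. A minimal sketch, assuming embed_documents returns one vector (a list of floats) per chunk; with normalize_embeddings=True (the default), the cosine similarity below reduces to a plain dot product:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # For unit-length vectors the denominator is 1, so this is a dot product
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Rank the document embeddings against the query embedding
scores = [cosine_similarity(query_embedding, vec) for vec in embeddings]
best = max(range(len(scores)), key=scores.__getitem__)
print(f"Best match: chunk {best} (score {scores[best]:.3f})")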

Advanced Configuration

from upsonic.embeddings import OpenAIEmbeddingConfig, OpenAIEmbedding

config = OpenAIEmbeddingConfig(
    model_name="text-embedding-3-large",
    batch_size=50,
    enable_rate_limiting=True,
    requests_per_minute=3000,
    tokens_per_minute=1000000,
    parallel_requests=5,
    cache_embeddings=True,
    normalize_embeddings=True
)

embedding = OpenAIEmbedding(config=config)

Model Options

# Small model (fastest, most cost-effective)
embedding = OpenAIEmbedding(model_name="text-embedding-3-small")  # 1536 dims

# Large model (highest quality)
embedding = OpenAIEmbedding(model_name="text-embedding-3-large")  # 3072 dims

# Legacy model
embedding = OpenAIEmbedding(model_name="text-embedding-ada-002")  # 1536 dims

Azure OpenAI Embeddings

Basic Usage

from upsonic.embeddings import AzureOpenAIEmbedding

embedding = AzureOpenAIEmbedding(
    azure_endpoint="https://your-resource.openai.azure.com/",
    deployment_name="your-deployment-name",
    api_key="your-api-key",
    api_version="2024-02-01"
)

Managed Identity Authentication

from upsonic.embeddings import create_azure_embedding_with_managed_identity

# Use Azure Managed Identity (no API key needed)
embedding = create_azure_embedding_with_managed_identity(
    azure_endpoint="https://your-resource.openai.azure.com/",
    deployment_name="your-deployment-name",
    client_id="your-client-id"  # Optional
)

Enterprise Features

from upsonic.embeddings import AzureOpenAIEmbedding, AzureOpenAIEmbeddingConfig

config = AzureOpenAIEmbeddingConfig(
    azure_endpoint="https://your-resource.openai.azure.com/",
    deployment_name="embedding-deployment",
    enable_content_filtering=True,
    data_residency_region="eastus",
    use_managed_identity=True,
    tenant_id="your-tenant-id"
)

embedding = AzureOpenAIEmbedding(config=config)

# Get compliance information
compliance = embedding.get_compliance_info()

AWS Bedrock Embeddings

Basic Usage

from upsonic.embeddings import BedrockEmbedding

# Titan embeddings
embedding = BedrockEmbedding(
    model_name="amazon.titan-embed-text-v2",
    region_name="us-east-1"
)

# Cohere embeddings
embedding = BedrockEmbedding(
    model_name="cohere.embed-multilingual-v3",
    region_name="us-east-1"
)

AWS Credentials

# Method 1: Explicit credentials
embedding = BedrockEmbedding(
    model_name="amazon.titan-embed-text-v1",
    aws_access_key_id="your-access-key",
    aws_secret_access_key="your-secret-key",
    region_name="us-east-1"
)

# Method 2: AWS profile
embedding = BedrockEmbedding(
    model_name="amazon.titan-embed-text-v1",
    profile_name="your-profile",
    region_name="us-east-1"
)

# Method 3: IAM roles (recommended)
embedding = BedrockEmbedding(
    model_name="amazon.titan-embed-text-v1",
    region_name="us-east-1"
)

Model Options

from upsonic.embeddings import create_titan_embedding, create_cohere_embedding

# Titan v1 (1536 dims)
embedding = create_titan_embedding(region_name="us-east-1", model_version="v1")

# Titan v2 (1024 dims, optimized)
embedding = create_titan_embedding(region_name="us-east-1", model_version="v2")

# Cohere English (1024 dims)
embedding = create_cohere_embedding(language="english", region_name="us-east-1")

# Cohere Multilingual (1024 dims)
embedding = create_cohere_embedding(language="multilingual", region_name="us-east-1")

Google Gemini Embeddings

Basic Usage

from upsonic.embeddings import GeminiEmbedding

# Using API key
embedding = GeminiEmbedding(
    model_name="gemini-embedding-001",
    api_key="your-api-key",
    task_type="RETRIEVAL_DOCUMENT"
)

Task-Specific Embeddings

from upsonic.embeddings import (
    create_gemini_document_embedding,
    create_gemini_query_embedding,
    create_gemini_semantic_embedding
)

# Document embeddings
doc_embedding = create_gemini_document_embedding(api_key="your-api-key")

# Query embeddings
query_embedding = create_gemini_query_embedding(api_key="your-api-key")

# Semantic similarity
semantic_embedding = create_gemini_semantic_embedding(api_key="your-api-key")

Vertex AI Integration

from upsonic.embeddings import create_gemini_vertex_embedding

# Using Vertex AI
embedding = create_gemini_vertex_embedding(
    project_id="your-gcp-project",
    location="us-central1",
    model_name="gemini-embedding-001"
)

Advanced Configuration

from upsonic.embeddings import GeminiEmbedding, GeminiEmbeddingConfig

config = GeminiEmbeddingConfig(
    model_name="gemini-embedding-001",
    api_key="your-api-key",
    task_type="RETRIEVAL_DOCUMENT",
    output_dimensionality=768,  # 128-3072 (Matryoshka)
    enable_batch_processing=True,
    enable_caching=True,
    cache_ttl_seconds=3600,
    requests_per_minute=60
)

embedding = GeminiEmbedding(config=config)
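
The output_dimensionality option relies on Matryoshka representation learning: the model is trained so that truncating an embedding to its leading dimensions preserves most of its semantic signal, letting you trade storage for a small quality loss.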

HuggingFace Embeddings

Local Model Execution

from upsonic.embeddings import HuggingFaceEmbedding

# Sentence Transformers
embedding = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    device="cuda",  # or "cpu", "mps"
    torch_dtype="float32"
)

# MPNet (high quality)
from upsonic.embeddings import create_mpnet_embedding
embedding = create_mpnet_embedding(device="cuda")

# MiniLM (fast and efficient)
from upsonic.embeddings import create_minilm_embedding
embedding = create_minilm_embedding(device="cpu")

Quantization for Efficiency

from upsonic.embeddings import HuggingFaceEmbedding, HuggingFaceEmbeddingConfig

config = HuggingFaceEmbeddingConfig(
    model_name="sentence-transformers/all-mpnet-base-v2",
    enable_quantization=True,
    quantization_bits=8,  # 4, 8, or 16
    enable_gradient_checkpointing=True,
    pooling_strategy="mean",
    normalize_embeddings=True
)

embedding = HuggingFaceEmbedding(config=config)
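
Lower bit widths reduce memory use and can speed up inference at some cost in accuracy; 8-bit is a common middle ground, while 4-bit trades more quality for a smaller footprint.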

HuggingFace API

from upsonic.embeddings import create_huggingface_api_embedding

# Use HuggingFace Inference API
embedding = create_huggingface_api_embedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    hf_token="your-hf-token"
)

FastEmbed (Qdrant)

Basic Usage

from upsonic.embeddings import FastEmbedProvider

# Default: BGE-small (fast and efficient)
embedding = FastEmbedProvider(
    model_name="BAAI/bge-small-en-v1.5"
)

# BGE-large (high quality)
from upsonic.embeddings import create_bge_large_embedding
embedding = create_bge_large_embedding()

# E5 (multilingual)
from upsonic.embeddings import create_e5_embedding
embedding = create_e5_embedding()

GPU Acceleration

from upsonic.embeddings import create_gpu_accelerated_embedding

embedding = create_gpu_accelerated_embedding(
    model_name="BAAI/bge-large-en-v1.5",
    enable_gpu=True,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

Sparse Embeddings

from upsonic.embeddings import create_sparse_embedding

# Sparse embeddings for efficiency
embedding = create_sparse_embedding(
    model_name="prithivida/Splade_PP_en_v1",
    enable_sparse_embeddings=True
)
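
Sparse models such as SPLADE produce high-dimensional vectors that are mostly zeros, acting like learned keyword weights; they pair naturally with dense embeddings in hybrid retrieval setups.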

Advanced Configuration

from upsonic.embeddings import FastEmbedProvider, FastEmbedConfig

config = FastEmbedConfig(
    model_name="BAAI/bge-small-en-v1.5",
    cache_dir="/path/to/cache",
    threads=4,
    enable_gpu=True,
    enable_parallel_processing=True,
    doc_embed_type="passage",
    model_warmup=True
)

embedding = FastEmbedProvider(config=config)

Ollama Embeddings

Basic Usage

from upsonic.embeddings import OllamaEmbedding

# Nomic Embed Text (default)
embedding = OllamaEmbedding(
    model_name="nomic-embed-text",
    base_url="http://localhost:11434"
)

# Automatic model pulling
embedding = OllamaEmbedding(
    model_name="mxbai-embed-large",
    auto_pull_model=True,
    enable_model_preload=True
)

Model Options

from upsonic.embeddings import (
    create_nomic_embedding,
    create_mxbai_embedding,
    create_arctic_embedding
)

# Nomic Embed Text (768 dims)
embedding = create_nomic_embedding(base_url="http://localhost:11434")

# MXBAI Large (1024 dims)
embedding = create_mxbai_embedding(base_url="http://localhost:11434")

# Snowflake Arctic (1024 dims)
embedding = create_arctic_embedding(base_url="http://localhost:11434")

Custom Server Configuration

from upsonic.embeddings import OllamaEmbedding, OllamaEmbeddingConfig

config = OllamaEmbeddingConfig(
    model_name="nomic-embed-text",
    base_url="http://your-server:11434",
    auto_pull_model=True,
    keep_alive="5m",
    request_timeout=120.0,
    connection_timeout=10.0,
    num_ctx=2048  # Context window size
)

embedding = OllamaEmbedding(config=config)

Embedding Modes

Providers support task-specific embedding modes that optimize vectors for how they will be used:
from upsonic.embeddings import EmbeddingMode

# Document mode (for indexing)
embeddings = await embedding.embed_texts(
    texts=["Document text"],
    mode=EmbeddingMode.DOCUMENT
)

# Query mode (for search)
embeddings = await embedding.embed_texts(
    texts=["Search query"],
    mode=EmbeddingMode.QUERY
)

# Symmetric mode (general purpose)
embeddings = await embedding.embed_texts(
    texts=["Any text"],
    mode=EmbeddingMode.SYMMETRIC
)

# Clustering mode
embeddings = await embedding.embed_texts(
    texts=["Text for clustering"],
    mode=EmbeddingMode.CLUSTERING
)

Advanced Features

Caching

from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig

# Enable caching for repeated embeddings
config = OpenAIEmbeddingConfig(
    model_name="text-embedding-3-small",
    cache_embeddings=True
)

embedding = OpenAIEmbedding(config=config)

# Check cache statistics
cache_info = embedding.get_cache_info()
print(f"Cache size: {cache_info['size']} embeddings")

# Clear cache when needed
embedding.clear_cache()

Progress Tracking

# Batch process with progress display
texts = ["Text 1", "Text 2", "Text 3", ...]  # Large list

embeddings = await embedding.embed_texts(
    texts=texts,
    show_progress=True
)

Error Handling and Retries

from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig

# Configure retry behavior
config = OpenAIEmbeddingConfig(
    model_name="text-embedding-3-small",
    max_retries=5,
    retry_delay=2.0,
    enable_retry_with_backoff=True,
    enable_adaptive_batching=True,
    timeout=60.0
)

embedding = OpenAIEmbedding(config=config)

Validation and Testing

# Validate provider connection
is_valid = await embedding.validate_connection()
if is_valid:
    print("Embedding provider is ready")

# Get model information
model_info = embedding.get_model_info()
print(f"Model: {model_info['model_name']}")
print(f"Dimensions: {model_info['dimensions']}")

# Get metrics
metrics = embedding.get_metrics()
print(f"Total chunks processed: {metrics.total_chunks}")
print(f"Average time per chunk: {metrics.avg_time_per_chunk}ms")

Cost Estimation

# Estimate embedding costs
cost_info = embedding.estimate_cost(
    num_texts=10000,
    avg_text_length=200
)

print(f"Estimated cost: ${cost_info['estimated_cost']:.4f}")
print(f"Estimated tokens: {cost_info['estimated_tokens']}")
print(f"Price per million: ${cost_info['price_per_million_tokens']}")

Integration with Knowledge Base

from upsonic import KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding
from upsonic.vectordb import QdrantProvider
from upsonic.vectordb.config import Config, CoreConfig, ProviderName, Mode

# Create embedding provider
embedding = OpenAIEmbedding(
    model_name="text-embedding-3-small"
)

# Create vector database
vectordb = QdrantProvider(Config(
    core=CoreConfig(
        provider_name=ProviderName.QDRANT,
        mode=Mode.IN_MEMORY,
        collection_name="my_collection",
        vector_size=1536
    )
))

# Create knowledge base with embeddings
knowledge_base = KnowledgeBase(
    sources=["documents/"],
    embedding_provider=embedding,
    vectordb=vectordb
)

# Use in agent task
from upsonic import Agent, Task

agent = Agent(name="Assistant")
task = Task(
    description="Answer questions about the documents",
    context=[knowledge_base]
)

result = agent.do(task)

Best Practices

Provider Selection

  1. Production (Cloud): OpenAI for quality, Azure OpenAI for enterprise
  2. Privacy: Ollama or FastEmbed for local execution
  3. Cost-Effective: FastEmbed for no API costs, text-embedding-3-small for cloud
  4. Multilingual: Gemini or Cohere multilingual models
  5. Custom Models: HuggingFace for flexibility
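
These guidelines can be folded into a small selection helper. A hedged sketch using the factory functions listed under Factory Functions below; the helper name and flags are hypothetical, and the local factories are assumed to use the defaults described in the provider sections above:

from upsonic.embeddings import (
    create_openai_embedding,
    create_fastembed_provider,
    create_ollama_embedding,
)

def pick_embedding(require_local: bool = False, zero_cost: bool = False):
    """Illustrative helper mapping the guidelines above onto factory calls."""
    if require_local:
        # Privacy / offline operation: Ollama defaults to nomic-embed-text
        return create_ollama_embedding()
    if zero_cost:
        # No API costs: FastEmbed defaults to BGE-small
        return create_fastembed_provider()
    # Cloud default: strong quality at low cost
    return create_openai_embedding(model_name="text-embedding-3-small")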

Performance Optimization

from upsonic.embeddings import OpenAIEmbeddingConfig, FastEmbedConfig

# Optimize batch size based on model and hardware
config = OpenAIEmbeddingConfig(
    model_name="text-embedding-3-small",
    batch_size=100,  # Larger batches for API providers
    parallel_requests=5,
    enable_adaptive_batching=True
)

# For local models, adjust based on memory
config = FastEmbedConfig(
    model_name="BAAI/bge-small-en-v1.5",
    batch_size=32,  # Smaller batches for local
    enable_parallel_processing=True,
    threads=4
)

Rate Limiting

from upsonic.embeddings import OpenAIEmbeddingConfig, AzureOpenAIEmbeddingConfig

# Configure rate limits for API providers
config = OpenAIEmbeddingConfig(
    model_name="text-embedding-3-small",
    enable_rate_limiting=True,
    requests_per_minute=3000,
    tokens_per_minute=1000000
)

# Azure has different limits
config = AzureOpenAIEmbeddingConfig(
    deployment_name="your-deployment",
    requests_per_minute=240,
    tokens_per_minute=240000
)

Resource Cleanup

# Always close embedding providers when done
try:
    embeddings = await embedding.embed_texts(texts)
finally:
    await embedding.close()

# Or use context manager pattern
async with embedding:
    embeddings = await embedding.embed_texts(texts)

Factory Functions

Quick creation functions for common configurations:
from upsonic.embeddings import (
    create_openai_embedding,
    create_azure_openai_embedding,
    create_bedrock_embedding,
    create_gemini_embedding,
    create_fastembed_provider,
    create_ollama_embedding,
    create_best_available_embedding
)

# OpenAI
embedding = create_openai_embedding(model_name="text-embedding-3-small")

# Auto-detect best available provider
embedding = create_best_available_embedding()

# List available providers
from upsonic.embeddings import list_available_providers
providers = list_available_providers()

Complete Example

import asyncio
from upsonic.embeddings import OpenAIEmbedding, EmbeddingMode
from upsonic.schemas.data_models import Chunk

async def main():
    # Initialize embedding provider
    embedding = OpenAIEmbedding(
        model_name="text-embedding-3-small",
        cache_embeddings=True,
        normalize_embeddings=True,
        show_progress=True
    )
    
    try:
        # Validate connection
        if not await embedding.validate_connection():
            raise RuntimeError("Failed to connect to embedding provider")
        
        # Create chunks
        documents = [
            "Artificial intelligence is transforming industries.",
            "Machine learning enables predictive analytics.",
            "Deep learning powers modern AI applications."
        ]
        
        chunks = [Chunk(text_content=doc) for doc in documents]
        
        # Generate embeddings
        doc_embeddings = await embedding.embed_documents(chunks)
        print(f"Generated {len(doc_embeddings)} document embeddings")
        
        # Embed query
        query = "What is AI?"
        query_embedding = await embedding.embed_query(query)
        print(f"Query embedding dimensions: {len(query_embedding)}")
        
        # Get metrics
        metrics = embedding.get_metrics()
        print(f"Total processing time: {metrics.embedding_time_ms:.2f}ms")
        print(f"Average time per chunk: {metrics.avg_time_per_chunk:.2f}ms")
        
        # Get cost estimate
        cost_info = embedding.estimate_cost(
            num_texts=1000,
            avg_text_length=200
        )
        print(f"Estimated cost for 1000 texts: ${cost_info['estimated_cost']:.4f}")
        
    finally:
        # Clean up
        await embedding.close()

if __name__ == "__main__":
    asyncio.run(main())

Model Comparison

| Provider | Model | Dimensions | Context Length | Speed | Cost | Best For |
|---|---|---|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | 8191 tokens | Fast | $ | Production |
| OpenAI | text-embedding-3-large | 3072 | 8191 tokens | Medium | $$$ | Quality |
| Azure OpenAI | text-embedding-ada-002 | 1536 | 8191 tokens | Fast | $ | Enterprise |
| Bedrock | amazon.titan-embed-text-v2 | 1024 | 8192 tokens | Fast | $ | AWS |
| Gemini | gemini-embedding-001 | 768-3072 | 2048 tokens | Fast | $ | Multilingual |
| HuggingFace | all-MiniLM-L6-v2 | 384 | 256 tokens | Very Fast | Free | Local |
| HuggingFace | all-mpnet-base-v2 | 768 | 384 tokens | Fast | Free | Quality |
| FastEmbed | BAAI/bge-small-en-v1.5 | 384 | 512 tokens | Very Fast | Free | Efficiency |
| FastEmbed | BAAI/bge-large-en-v1.5 | 1024 | 512 tokens | Fast | Free | Quality |
| Ollama | nomic-embed-text | 768 | 8192 tokens | Fast | Free | Privacy |