Embeddings
Transform text into vector representations for semantic search and RAG pipelines.

Overview
In the Upsonic framework, Embeddings are the foundation for converting text into numerical vector representations that capture semantic meaning. The framework provides a unified interface across multiple embedding providers, enabling seamless switching between local and cloud-based models while maintaining consistent behavior and advanced features like caching, batching, and automatic retry mechanisms.

Embedding Providers
Upsonic supports multiple embedding providers with a consistent API:

| Provider | Type | Best For | Pricing |
|---|---|---|---|
| OpenAI | Cloud API | Production deployments, high quality | $0.13 per 1M tokens |
| Azure OpenAI | Cloud API | Enterprise, compliance requirements | Variable by region |
| AWS Bedrock | Cloud API | AWS infrastructure, multi-model | $0.0007 per 1K tokens |
| Google Gemini | Cloud API | Multilingual, code embeddings | $0.15 per 1M tokens |
| HuggingFace | Local/API | Custom models, flexibility | Free (local) or variable (API) |
| FastEmbed | Local | Fast inference, no API costs | Free |
| Ollama | Local | Privacy, offline operation | Free |
Base Configuration
All embedding providers share common configuration options:

| Attribute | Type | Description | Default |
|---|---|---|---|
| model_name | str | Model identifier | Provider-specific |
| batch_size | int | Batch size for processing | 100 |
| max_retries | int | Maximum retry attempts | 3 |
| retry_delay | float | Initial retry delay (seconds) | 1.0 |
| timeout | float | Request timeout (seconds) | 30.0 |
| normalize_embeddings | bool | Normalize vectors to unit length | True |
| show_progress | bool | Display progress during batch operations | True |
| cache_embeddings | bool | Enable embedding caching | False |
| enable_retry_with_backoff | bool | Use exponential backoff on retries | True |
| enable_adaptive_batching | bool | Adjust batch size dynamically | True |
| enable_compression | bool | Enable dimensionality reduction | False |
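Taken together, these options can be pictured as a single configuration object. The dataclass below mirrors the table's names and defaults, but it is an illustrative sketch, not Upsonic's actual class definition.

```python
from dataclasses import dataclass


@dataclass
class EmbeddingConfig:
    """Illustrative container for the shared options above (hypothetical, not Upsonic's class)."""
    model_name: str = "provider-default"
    batch_size: int = 100
    max_retries: int = 3
    retry_delay: float = 1.0
    timeout: float = 30.0
    normalize_embeddings: bool = True
    show_progress: bool = True
    cache_embeddings: bool = False
    enable_retry_with_backoff: bool = True
    enable_adaptive_batching: bool = True
    enable_compression: bool = False


# Override only what differs from the defaults.
config = EmbeddingConfig(model_name="text-embedding-3-small", cache_embeddings=True)
```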
OpenAI Embeddings
Basic Usage
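Whatever wrapper you use, an OpenAI embedding call ultimately boils down to a POST against OpenAI's documented `/v1/embeddings` REST endpoint. The sketch below shows that wire-level shape with stdlib only; the helper names (`build_embed_request`, `embed`) are illustrative, not Upsonic's API.

```python
import json
import os
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embed_request(texts, model="text-embedding-3-small"):
    """Return (url, payload) for OpenAI's /v1/embeddings endpoint."""
    return OPENAI_EMBEDDINGS_URL, {"model": model, "input": texts}


def embed(texts, model="text-embedding-3-small"):
    """Perform the call; requires OPENAI_API_KEY in the environment."""
    url, payload = build_embed_request(texts, model)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The response carries one vector per input, in the same order as `texts`.
    return [item["embedding"] for item in body["data"]]


url, payload = build_embed_request(["hello world"])
```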
Advanced Configuration
Model Options
Azure OpenAI Embeddings
Basic Usage
Managed Identity Authentication
Enterprise Features
AWS Bedrock Embeddings
Basic Usage
AWS Credentials
Model Options
Google Gemini Embeddings
Basic Usage
Task-Specific Embeddings
Vertex AI Integration
Advanced Configuration
HuggingFace Embeddings
Local Model Execution
Quantization for Efficiency
HuggingFace API
FastEmbed (Qdrant)
Basic Usage
GPU Acceleration
Sparse Embeddings
Advanced Configuration
Ollama Embeddings
Basic Usage
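Ollama runs entirely on your machine and exposes a REST endpoint at `/api/embeddings` (no API key needed). The sketch below targets that documented endpoint directly; the helper names are illustrative, not Upsonic's API.

```python
import json
import urllib.request


def build_ollama_request(text, model="nomic-embed-text", host="http://localhost:11434"):
    """Return (url, payload) for Ollama's /api/embeddings endpoint."""
    return f"{host}/api/embeddings", {"model": model, "prompt": text}


def ollama_embed(text, model="nomic-embed-text", host="http://localhost:11434"):
    """Call a locally running Ollama server and return the embedding vector."""
    url, payload = build_ollama_request(text, model, host)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


url, payload = build_ollama_request("offline embedding test")
```

Because everything stays local, this path works offline and keeps sensitive text off third-party servers.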
Popular Models
Custom Server Configuration
Embedding Modes
Different providers support specific embedding modes (for example, encoding text for document indexing versus query search) so embeddings can be optimized for how they will be used.

Advanced Features
Caching
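When `cache_embeddings` is enabled, repeated inputs do not trigger (or get billed for) a second provider call. Conceptually, the cache is keyed on the (model, text) pair; the minimal in-memory sketch below uses hypothetical names and stands in for whatever Upsonic's real cache does (which may, for example, persist to disk).

```python
import hashlib


class EmbeddingCache:
    """Minimal in-memory cache keyed on a hash of (model, text). Illustrative only."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model, text):
        return hashlib.sha256(f"{model}\x00{text}".encode()).hexdigest()

    def get_or_compute(self, model, text, embed_fn):
        key = self._key(model, text)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = embed_fn(text)  # only computed on a miss
        return self._store[key]


cache = EmbeddingCache()
fake_embed = lambda text: [float(len(text)), 0.0]  # stand-in for a real provider call
v1 = cache.get_or_compute("demo-model", "hello", fake_embed)
v2 = cache.get_or_compute("demo-model", "hello", fake_embed)  # served from cache
```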
Progress Tracking
Error Handling and Retries
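With `enable_retry_with_backoff`, a failed request is retried up to `max_retries` times, with the delay doubling after each failure, starting from `retry_delay`. A sketch of that policy, with an injectable `sleep` (illustrative names, not Upsonic's internals):

```python
import time


def retry_with_backoff(fn, max_retries=3, retry_delay=1.0, sleep=time.sleep):
    """Retry fn(), doubling the delay after each failure (exponential backoff)."""
    delay = retry_delay
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= 2


# Demonstration with a no-op sleep so the example runs instantly.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = retry_with_backoff(flaky, max_retries=3, retry_delay=1.0, sleep=lambda d: None)
```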
Validation and Testing
Cost Estimation
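Embedding cost is approximately the token count times the per-token price from the provider table above. The sketch below assumes the token count is already known; real code would count tokens with a tokenizer (for example, tiktoken for OpenAI models). All names here are illustrative.

```python
# Approximate USD price per 1M tokens, taken from the provider table above.
PRICE_PER_1M_TOKENS = {
    "openai": 0.13,
    "gemini": 0.15,
    "bedrock": 0.70,  # $0.0007 per 1K tokens == $0.70 per 1M
}


def estimate_cost(num_tokens, provider):
    """Rough embedding cost in USD for a given token count."""
    return num_tokens / 1_000_000 * PRICE_PER_1M_TOKENS[provider]


cost = estimate_cost(2_500_000, "openai")  # 2.5M tokens at $0.13 per 1M
```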
Integration with Knowledge Base
Best Practices
Provider Selection
- Production (Cloud): OpenAI for quality, Azure OpenAI for enterprise
- Privacy: Ollama or FastEmbed for local execution
- Cost-Effective: FastEmbed for no API costs, text-embedding-3-small for cloud
- Multilingual: Gemini or Cohere multilingual models
- Custom Models: HuggingFace for flexibility
Performance Optimization
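One way to read the `enable_adaptive_batching` option: when a provider rejects an oversized batch, shrink the batch size and continue rather than failing the whole job. The sketch below halves the batch on failure; this is a plausible strategy, not Upsonic's actual algorithm.

```python
def embed_adaptive(texts, embed_batch, batch_size=100, min_batch_size=1):
    """Embed texts in batches, halving the batch size whenever a batch fails."""
    results, i = [], 0
    while i < len(texts):
        batch = texts[i:i + batch_size]
        try:
            results.extend(embed_batch(batch))
            i += len(batch)
        except Exception:
            if batch_size == min_batch_size:
                raise  # cannot shrink further: give up
            batch_size = max(min_batch_size, batch_size // 2)
    return results


# Fake backend that rejects batches larger than 8 items.
def picky_backend(batch):
    if len(batch) > 8:
        raise RuntimeError("payload too large")
    return [[float(len(t))] for t in batch]


vectors = embed_adaptive([f"doc {n}" for n in range(30)], picky_backend, batch_size=100)
```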
Rate Limiting
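Cloud providers enforce requests-per-second limits, and a token-bucket limiter is a common client-side guard: each request spends one token, and tokens refill at a fixed rate. The class below is an illustrative sketch (the injectable clock keeps the demo deterministic), not Upsonic's implementation.

```python
import time


class RateLimiter:
    """Token bucket: at most `rate` requests/second, bursting up to `capacity`."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.now = now
        self.last = now()

    def wait_time(self):
        """Seconds to wait before the next request may be sent (0.0 if allowed now)."""
        t = self.now()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        return (1.0 - self.tokens) / self.rate


# Deterministic demo: a fake clock instead of real time.
clock = {"t": 0.0}
limiter = RateLimiter(rate=2.0, capacity=2, now=lambda: clock["t"])
delays = [limiter.wait_time() for _ in range(3)]  # burst of 3 requests at t=0
```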
Resource Cleanup
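Local providers hold loaded model weights (and cloud providers hold HTTP sessions), so releasing them deterministically matters. A context-manager sketch with hypothetical names, illustrating the pattern rather than Upsonic's actual classes:

```python
class LocalEmbedder:
    """Illustrative provider that releases its resources as a context manager."""

    def __init__(self, model_name):
        self.model_name = model_name
        self.loaded = True  # stands in for loading model weights

    def close(self):
        self.loaded = False  # stands in for freeing GPU memory / HTTP sessions

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions


# Resources are released even if embedding raises inside the block.
with LocalEmbedder("all-MiniLM-L6-v2") as embedder:
    loaded_inside = embedder.loaded

released = not embedder.loaded
```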
Factory Functions
Quick creation functions are available for common configurations.

Complete Example
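An end-to-end sketch of the pipeline this page describes (embed documents, normalize to unit length, rank by cosine similarity), using a toy character-frequency embedder in place of a real provider. Every name here is illustrative, not Upsonic's API.

```python
import math


def fake_embed(text):
    """Toy embedder: unit-normalized character-frequency vector (stands in for a provider)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # normalize_embeddings=True behaviour


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # unit vectors: dot product == cosine


documents = [
    "the cat sat on the mat",
    "stock prices fell sharply",
    "a kitten on a rug",
]
index = [(doc, fake_embed(doc)) for doc in documents]  # embed once, search many times


def search(query, top_k=2):
    q = fake_embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


results = search("cat on a mat")
```

Swapping `fake_embed` for a real provider call leaves the indexing and search logic unchanged, which is the point of a unified embedding interface.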
Model Comparison
| Provider | Model | Dimensions | Context Length | Speed | Cost | Best For |
|---|---|---|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | 8191 tokens | Fast | $ | Production |
| OpenAI | text-embedding-3-large | 3072 | 8191 tokens | Medium | $$$ | Quality |
| Azure OpenAI | text-embedding-ada-002 | 1536 | 8191 tokens | Fast | $ | Enterprise |
| Bedrock | amazon.titan-embed-text-v2 | 1024 | 8192 tokens | Fast | $ | AWS |
| Gemini | gemini-embedding-001 | 768-3072 | 2048 tokens | Fast | $ | Multilingual |
| HuggingFace | all-MiniLM-L6-v2 | 384 | 256 tokens | Very Fast | Free | Local |
| HuggingFace | all-mpnet-base-v2 | 768 | 384 tokens | Fast | Free | Quality |
| FastEmbed | BAAI/bge-small-en-v1.5 | 384 | 512 tokens | Very Fast | Free | Efficiency |
| FastEmbed | BAAI/bge-large-en-v1.5 | 1024 | 512 tokens | Fast | Free | Quality |
| Ollama | nomic-embed-text | 768 | 8192 tokens | Fast | Free | Privacy |