Overview

The HuggingFace provider gives access to thousands of embedding models from the HuggingFace Hub. It supports both local model execution and the HuggingFace Inference API, with options for quantization, GPU acceleration, and custom pooling strategies.

Provider Class: HuggingFaceEmbedding
Config Class: HuggingFaceEmbeddingConfig

Dependencies

pip install transformers torch
For the Inference API (optional):
pip install huggingface_hub
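
To confirm the local stack is installed, and to see whether a GPU is available (the provider auto-detects the device when device is left as None), a quick check in Python:

import torch
import transformers

print(transformers.__version__)   # confirms transformers is importable
print(torch.cuda.is_available())  # True means device auto-detection can pick a CUDA device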

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import HuggingFaceEmbedding, HuggingFaceEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Create embedding provider (local)
embedding = HuggingFaceEmbedding(HuggingFaceEmbeddingConfig(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    use_local=True,
    pooling_strategy="mean"
))

# Setup KnowledgeBase
vectordb = ChromaProvider(ChromaConfig(
    collection_name="hf_docs",
    vector_size=384,  # matches all-MiniLM-L6-v2's 384-dimensional output
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["document.txt"],
    embedding_provider=embedding,
    vectordb=vectordb
)

# Query with Agent
agent = Agent("openai/gpt-4o")
task = Task("What is this document about?", context=[kb])
result = agent.do(task)
print(result)
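
The same pipeline can run against the hosted Inference API instead of a local model. A minimal sketch, assuming a token stored in an HF_TOKEN environment variable (the variable name is illustrative):

import os
from upsonic.embeddings import HuggingFaceEmbedding, HuggingFaceEmbeddingConfig

# API-backed provider: inference happens on HuggingFace's servers
embedding = HuggingFaceEmbedding(HuggingFaceEmbeddingConfig(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    use_api=True,                     # use the Inference API instead of a local model
    use_local=False,
    hf_token=os.environ["HF_TOKEN"],  # illustrative env var holding the API token
    wait_for_model=True               # wait for the hosted model to load
))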

Parameters

| Parameter | Type | Description | Default | Source |
| --- | --- | --- | --- | --- |
| model_name | str | HuggingFace model name or path | "sentence-transformers/all-MiniLM-L6-v2" | Specific |
| hf_token | str \| None | HuggingFace API token | None | Specific |
| use_api | bool | Use the HuggingFace Inference API instead of a local model | False | Specific |
| use_local | bool | Use local model execution | True | Specific |
| device | str \| None | Device to run the model on (auto-detected if None) | None | Specific |
| torch_dtype | str | PyTorch data type (float16, float32, bfloat16) | "float32" | Specific |
| trust_remote_code | bool | Trust remote code shipped with the model | False | Specific |
| max_seq_length | int \| None | Maximum sequence length | None | Specific |
| pooling_strategy | str | Pooling strategy (mean, cls, max) | "mean" | Specific |
| enable_quantization | bool | Enable model quantization | False | Specific |
| quantization_bits | int | Quantization bits (4, 8, 16) | 8 | Specific |
| enable_gradient_checkpointing | bool | Enable gradient checkpointing to save memory | False | Specific |
| wait_for_model | bool | Wait for the model to load when using the API | True | Specific |
| timeout | int \| None | Timeout for model requests | None | Specific |
| cache_dir | str \| None | Model cache directory | None | Specific |
| force_download | bool | Force re-download of the model | False | Specific |
| batch_size | int | Batch size for document embedding | 100 | Base |
| normalize_embeddings | bool | Whether to normalize embeddings to unit length | True | Base |
| show_progress | bool | Whether to show progress during batch operations | True | Base |
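
For larger local models, the quantization, dtype, and batching options above can be combined to cut memory use. A hedged sketch with illustrative values (the model name and settings are examples, not recommendations):

from upsonic.embeddings import HuggingFaceEmbedding, HuggingFaceEmbeddingConfig

# Local GPU execution with a reduced memory footprint
embedding = HuggingFaceEmbedding(HuggingFaceEmbeddingConfig(
    model_name="sentence-transformers/all-mpnet-base-v2",  # example 768-dim model
    use_local=True,
    device="cuda",              # pin to GPU instead of auto-detecting
    torch_dtype="float16",      # halves memory versus the float32 default
    enable_quantization=True,
    quantization_bits=8,        # 4, 8, or 16 per the table above
    batch_size=32,              # smaller batches for a larger model
    normalize_embeddings=True   # unit-length vectors (the base default)
))

If you switch models this way, remember to match the vector database's vector_size to the new embedding dimension (768 for all-mpnet-base-v2).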