Overview

The HuggingFace provider gives access to thousands of embedding models from the HuggingFace Hub. It supports both local model execution and the HuggingFace Inference API, with options for quantization, GPU acceleration, and custom pooling strategies.

Provider Class: HuggingFaceEmbedding
Config Class: HuggingFaceEmbeddingConfig

Dependencies

pip install transformers torch
For the Inference API (optional):
pip install huggingface_hub
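
To confirm the local stack is installed, and to see whether a GPU is available (the provider auto-detects the device when device is left as None), a quick check in Python:

import torch
import transformers

print(transformers.__version__)   # confirms transformers is importable
print(torch.cuda.is_available())  # True means device auto-detection can pick a CUDA device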

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import HuggingFaceEmbedding, HuggingFaceEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Create embedding provider (local)
embedding = HuggingFaceEmbedding(HuggingFaceEmbeddingConfig(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    use_local=True,
    pooling_strategy="mean"
))

# Setup KnowledgeBase
vectordb = ChromaProvider(ChromaConfig(
    collection_name="hf_docs",
    vector_size=384,  # matches all-MiniLM-L6-v2's 384-dimensional output
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["document.txt"],
    embedding_provider=embedding,
    vectordb=vectordb
)

# Query with Agent
agent = Agent("openai/gpt-4o")
task = Task("What is this document about?", context=[kb])
result = agent.do(task)
print(result)
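
The same pipeline can run against the hosted Inference API instead of a local model. A minimal sketch, assuming a token stored in an HF_TOKEN environment variable (the variable name is illustrative):

import os
from upsonic.embeddings import HuggingFaceEmbedding, HuggingFaceEmbeddingConfig

# API-backed provider: inference happens on HuggingFace's servers
embedding = HuggingFaceEmbedding(HuggingFaceEmbeddingConfig(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    use_api=True,                     # use the Inference API instead of a local model
    use_local=False,
    hf_token=os.environ["HF_TOKEN"],  # illustrative env var holding the API token
    wait_for_model=True               # wait for the hosted model to load
))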

Parameters

| Parameter | Type | Description | Default | Source |
| --- | --- | --- | --- | --- |
| model_name | str | HuggingFace model name or path | "sentence-transformers/all-MiniLM-L6-v2" | Specific |
| hf_token | str \| None | HuggingFace API token | None | Specific |
| use_api | bool | Use the HuggingFace Inference API instead of a local model | False | Specific |
| use_local | bool | Use local model execution | True | Specific |
| device | str \| None | Device to run the model on (auto-detected if None) | None | Specific |
| torch_dtype | str | PyTorch data type (float16, float32, bfloat16) | "float32" | Specific |
| trust_remote_code | bool | Trust remote code shipped with the model | False | Specific |
| max_seq_length | int \| None | Maximum sequence length | None | Specific |
| pooling_strategy | str | Pooling strategy (mean, cls, max) | "mean" | Specific |
| enable_quantization | bool | Enable model quantization | False | Specific |
| quantization_bits | int | Quantization bits (4, 8, 16) | 8 | Specific |
| enable_gradient_checkpointing | bool | Enable gradient checkpointing to save memory | False | Specific |
| wait_for_model | bool | Wait for the model to load when using the API | True | Specific |
| timeout | int \| None | Timeout for model requests | None | Specific |
| cache_dir | str \| None | Model cache directory | None | Specific |
| force_download | bool | Force re-download of the model | False | Specific |
| batch_size | int | Batch size for document embedding | 100 | Base |
| normalize_embeddings | bool | Whether to normalize embeddings to unit length | True | Base |
| show_progress | bool | Whether to show progress during batch operations | True | Base |
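
For larger local models, the quantization, dtype, and batching options above can be combined to cut memory use. A hedged sketch with illustrative values (the model name and settings are examples, not recommendations):

from upsonic.embeddings import HuggingFaceEmbedding, HuggingFaceEmbeddingConfig

# Local GPU execution with a reduced memory footprint
embedding = HuggingFaceEmbedding(HuggingFaceEmbeddingConfig(
    model_name="sentence-transformers/all-mpnet-base-v2",  # example 768-dim model
    use_local=True,
    device="cuda",              # pin to GPU instead of auto-detecting
    torch_dtype="float16",      # halves memory versus the float32 default
    enable_quantization=True,
    quantization_bits=8,        # 4, 8, or 16 per the table above
    batch_size=32,              # smaller batches for a larger model
    normalize_embeddings=True   # unit-length vectors (the base default)
))

If you switch models this way, remember to match the vector database's vector_size to the new embedding dimension (768 for all-mpnet-base-v2).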