
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model_name | str | "sentence-transformers/all-MiniLM-L6-v2" | HuggingFace model name or path |
| hf_token | Optional[str] | None | HuggingFace API token |
| use_api | bool | False | Use HuggingFace Inference API instead of local model |
| use_local | bool | True | Use local model execution |
| device | Optional[str] | None | Device to run model on (auto-detected if None) |
| torch_dtype | str | "float32" | PyTorch data type (float16, float32, bfloat16) |
| trust_remote_code | bool | False | Trust remote code in model |
| max_seq_length | Optional[int] | None | Maximum sequence length |
| pooling_strategy | str | "mean" | Pooling strategy (mean, cls, max) |
| normalize_embeddings | bool | True | Normalize embeddings to unit length |
| enable_quantization | bool | False | Enable model quantization |
| quantization_bits | int | 8 | Quantization bits (4, 8, 16) |
| enable_gradient_checkpointing | bool | False | Enable gradient checkpointing to save memory |
| wait_for_model | bool | True | Wait for model to load if using API |
| timeout | int | None | Timeout for model |
| cache_dir | Optional[str] | None | Model cache directory |
| force_download | bool | False | Force re-download of model |

Functions

__init__

Initialize the HuggingFaceEmbedding provider. Parameters:
  • config (Optional[HuggingFaceEmbeddingConfig]): Configuration object
  • **kwargs: Additional configuration options
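A minimal construction sketch, assuming the keyword arguments mirror the configuration table above; the import path below is an assumption and should be adjusted to match your installation:

```python
# Import path is an assumption; adjust it to match your installation.
from my_package.embeddings import HuggingFaceEmbedding, HuggingFaceEmbeddingConfig

# Either pass an explicit config object ...
config = HuggingFaceEmbeddingConfig(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    device="cpu",
    pooling_strategy="mean",
    normalize_embeddings=True,
)
provider = HuggingFaceEmbedding(config=config)

# ... or the equivalent via keyword arguments.
provider = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    use_local=True,
)
```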

_setup_device

Set up the compute device for local models.
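For illustration, device auto-detection typically follows a CUDA → MPS → CPU fallback order, as in the sketch below; the helper name is hypothetical and the actual implementation may differ:

```python
import torch

def pick_device(requested: str | None = None) -> str:
    """Return an explicit device if given, otherwise auto-detect one."""
    if requested is not None:
        return requested
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```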

_setup_authentication

Set up HuggingFace authentication.

_setup_local_model

Set up the local model and tokenizer with optional quantization.
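A hedged sketch of what loading a local model and tokenizer with optional 8-bit quantization generally looks like using transformers and bitsandbytes; this is not necessarily the provider's exact code path:

```python
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Optional 8-bit quantization (requires the bitsandbytes package and a CUDA GPU).
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModel.from_pretrained(model_name, quantization_config=quant_config)
model.eval()
```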

_setup_api_session

Set up the InferenceClient for HuggingFace API calls.

supported_modes

Get supported embedding modes. Returns:
  • List[EmbeddingMode]: List of supported embedding modes

pricing_info

Get HuggingFace pricing info (API usage). Returns:
  • Dict[str, float]: Pricing information

get_model_info

Get information about the current HuggingFace model. Returns:
  • Dict[str, Any]: Model information

_mean_pooling

Apply mean pooling to get sentence embeddings. Parameters:
  • model_output: Model output
  • attention_mask: Attention mask
Returns:
  • torch.Tensor: Pooled embeddings
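The standard masked mean-pooling computation (as popularized by sentence-transformers) looks roughly like the sketch below; the function name is illustrative:

```python
import torch

def mean_pool(model_output, attention_mask):
    # Last hidden state has shape (batch, seq_len, hidden). Zero out padding
    # tokens with the attention mask, then average over the sequence dimension.
    token_embeddings = model_output.last_hidden_state
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * mask, dim=1)
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts
```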

_cls_pooling

Use CLS token for pooling. Parameters:
  • model_output: Model output
  • attention_mask: Attention mask
Returns:
  • torch.Tensor: CLS token embeddings

_max_pooling

Apply max pooling. Parameters:
  • model_output: Model output
  • attention_mask: Attention mask
Returns:
  • torch.Tensor: Max pooled embeddings

_apply_pooling

Apply the configured pooling strategy. Parameters:
  • model_output: Model output
  • attention_mask: Attention mask
Returns:
  • torch.Tensor: Pooled embeddings

_embed_local

Embed texts using local model. Parameters:
  • texts (List[str]): List of texts to embed
Returns:
  • List[List[float]]: List of embedding vectors
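A simplified sketch of the local embedding flow (tokenize, forward pass, pool, optionally normalize); the names are illustrative and the provider's internal batching and dtype handling are omitted:

```python
import torch
import torch.nn.functional as F

def embed_local(texts, tokenizer, model, device="cpu", normalize=True):
    """Tokenize, run a forward pass, mean-pool, and optionally L2-normalize."""
    encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model(**encoded)
    mask = encoded["attention_mask"].unsqueeze(-1).float()
    pooled = (output.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    if normalize:
        pooled = F.normalize(pooled, p=2, dim=1)
    return pooled.cpu().tolist()
```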

_embed_api

Embed texts using HuggingFace InferenceClient. Parameters:
  • texts (List[str]): List of texts to embed
Returns:
  • List[List[float]]: List of embedding vectors
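For reference, the underlying huggingface_hub call typically looks like this; the model name and token handling are placeholders:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="sentence-transformers/all-MiniLM-L6-v2",
    token=os.environ.get("HF_TOKEN"),
)
# feature_extraction returns the embedding for a single input text.
vectors = [client.feature_extraction(text).tolist() for text in ["first text", "second text"]]
```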

_embed_batch

Embed a batch of texts using HuggingFace model or API. Parameters:
  • texts (List[str]): List of text strings to embed
  • mode (EmbeddingMode): Embedding mode
Returns:
  • List[List[float]]: List of embedding vectors

validate_connection

Validate HuggingFace model or API connection. Returns:
  • bool: True if connection is valid

get_memory_usage

Get memory usage information for local models. Returns:
  • Dict[str, Any]: Memory usage information
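On CUDA devices, the kind of figures such a report can draw on is available directly from torch, for example:

```python
import torch

if torch.cuda.is_available():
    allocated_mb = torch.cuda.memory_allocated() / 1024**2
    reserved_mb = torch.cuda.memory_reserved() / 1024**2
    print(f"CUDA memory: {allocated_mb:.1f} MB allocated, {reserved_mb:.1f} MB reserved")
```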

close

Clean up HuggingFace models, tokenizer, and API client.

remove_local_cache

Remove HuggingFace model/tokenizer cache files from local storage. Returns:
  • bool: True if cache was removed successfully

create_sentence_transformer_embedding

Create a sentence transformer embedding provider. Parameters:
  • model_name (str): HuggingFace model name
  • **kwargs: Additional configuration options
Returns:
  • HuggingFaceEmbedding: Configured HuggingFaceEmbedding instance

create_mpnet_embedding

Create MPNet embedding provider (high quality). Parameters:
  • **kwargs: Additional configuration options
Returns:
  • HuggingFaceEmbedding: Configured HuggingFaceEmbedding instance

create_minilm_embedding

Create MiniLM embedding provider (fast and efficient). Parameters:
  • **kwargs: Additional configuration options
Returns:
  • HuggingFaceEmbedding: Configured HuggingFaceEmbedding instance

create_huggingface_api_embedding

Create HuggingFace API embedding provider. Parameters:
  • model_name (str): HuggingFace model name
  • hf_token (Optional[str]): HuggingFace API token
  • **kwargs: Additional configuration options
Returns:
  • HuggingFaceEmbedding: Configured HuggingFaceEmbedding instance
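A usage sketch for the factory helpers; the import path is an assumption and should be adjusted to your installation:

```python
import os
# Import path is an assumption; adjust it to match your installation.
from my_package.embeddings import create_minilm_embedding, create_huggingface_api_embedding

# Fast local provider.
local_provider = create_minilm_embedding(device="cpu")

# API-backed provider using a token from the environment.
api_provider = create_huggingface_api_embedding(
    model_name="sentence-transformers/all-mpnet-base-v2",
    hf_token=os.environ.get("HF_TOKEN"),
)

assert api_provider.validate_connection()
api_provider.close()
```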