| Parameter | Type | Description | Default | Scope |
|---|---|---|---|---|
| `model_name` | `str` | HuggingFace model name or path | `"sentence-transformers/all-MiniLM-L6-v2"` | Specific |
| `hf_token` | `str \| None` | HuggingFace API token | `None` | Specific |
| `use_api` | `bool` | Use the HuggingFace Inference API instead of a local model | `False` | Specific |
| `use_local` | `bool` | Use local model execution | `True` | Specific |
| `device` | `str \| None` | Device to run the model on (auto-detected if `None`) | `None` | Specific |
| `torch_dtype` | `str` | PyTorch data type (`float16`, `float32`, `bfloat16`) | `"float32"` | Specific |
| `trust_remote_code` | `bool` | Trust remote code when loading the model | `False` | Specific |
| `max_seq_length` | `int \| None` | Maximum sequence length | `None` | Specific |
| `pooling_strategy` | `str` | Pooling strategy (`mean`, `cls`, `max`) | `"mean"` | Specific |
| `enable_quantization` | `bool` | Enable model quantization | `False` | Specific |
| `quantization_bits` | `int` | Quantization bits (4, 8, 16) | `8` | Specific |
| `enable_gradient_checkpointing` | `bool` | Enable gradient checkpointing to save memory | `False` | Specific |
| `wait_for_model` | `bool` | Wait for the model to load when using the API | `True` | Specific |
| `timeout` | `int \| None` | Timeout for model requests | `None` | Specific |
| `cache_dir` | `str \| None` | Model cache directory | `None` | Specific |
| `force_download` | `bool` | Force re-download of the model | `False` | Specific |
| `batch_size` | `int` | Batch size for document embedding | `100` | Base |
| `normalize_embeddings` | `bool` | Whether to normalize embeddings to unit length | `True` | Base |
| `show_progress` | `bool` | Whether to show progress during batch operations | `True` | Base |
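The options above map naturally onto a configuration object with the table's defaults and light validation of the constrained fields. The sketch below is illustrative: the class name `EmbedderConfig` and the `__post_init__` checks are assumptions, not the library's actual API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EmbedderConfig:
    # Hypothetical config object mirroring the table; defaults match the
    # "Default" column. The "Specific" rows come first, "Base" rows last.
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2"
    hf_token: Optional[str] = None
    use_api: bool = False
    use_local: bool = True
    device: Optional[str] = None          # auto-detected if None
    torch_dtype: str = "float32"
    trust_remote_code: bool = False
    max_seq_length: Optional[int] = None
    pooling_strategy: str = "mean"
    enable_quantization: bool = False
    quantization_bits: int = 8
    enable_gradient_checkpointing: bool = False
    wait_for_model: bool = True
    timeout: Optional[int] = None
    cache_dir: Optional[str] = None
    force_download: bool = False
    # Base options shared by all embedders
    batch_size: int = 100
    normalize_embeddings: bool = True
    show_progress: bool = True

    def __post_init__(self) -> None:
        # Reject values outside the choices documented in the table.
        if self.pooling_strategy not in {"mean", "cls", "max"}:
            raise ValueError(f"invalid pooling_strategy: {self.pooling_strategy!r}")
        if self.torch_dtype not in {"float16", "float32", "bfloat16"}:
            raise ValueError(f"invalid torch_dtype: {self.torch_dtype!r}")
        if self.quantization_bits not in {4, 8, 16}:
            raise ValueError(f"invalid quantization_bits: {self.quantization_bits}")

# Override only what differs from the defaults:
cfg = EmbedderConfig(device="cpu", batch_size=32)
```

Keeping the defaults in one dataclass means callers only name the fields they change, and bad values for the enumerated options (`pooling_strategy`, `torch_dtype`, `quantization_bits`) fail at construction time rather than deep inside an embedding run.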