Overview

Pinecone is a managed vector database service designed for production-scale similarity search. It’s cloud-only and supports both dense and sparse vectors with automatic scaling.

Provider Class: PineconeProvider
Config Class: PineconeConfig

Dependencies

pip install "upsonic[rag]"

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings.openai_provider import OpenAIEmbeddingProvider
from upsonic.vectordb import PineconeProvider, PineconeConfig
from pydantic import SecretStr

# Setup embedding provider
embedding = OpenAIEmbeddingProvider(api_key="your-api-key")

# Create Pinecone configuration
config = PineconeConfig(
    collection_name="my_collection",
    vector_size=1536,
    api_key=SecretStr("your-pinecone-api-key"),
    environment="us-east-1-aws",
    namespace="production"
)
vectordb = PineconeProvider(config)

# Create knowledge base
kb = KnowledgeBase(
    sources="document.pdf",
    embedding_provider=embedding,
    vectordb=vectordb
)

# Use with Agent
agent = Agent("openai/gpt-4o")
task = Task(
    description="Query the knowledge base",
    context=[kb]
)
result = agent.do(task)
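
The example above hard-codes both API keys for brevity. A minimal sketch of reading a key from an environment variable instead follows; the helper name `require_env` and the variable name `PINECONE_API_KEY` are assumptions made for illustration and are not part of Upsonic.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    # Strip surrounding whitespace so a stray newline in a .env file does not leak into the key.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

The returned string can then be wrapped the same way as in the example, for instance `api_key=SecretStr(require_env("PINECONE_API_KEY"))` when building `PineconeConfig`.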

Parameters

Base Parameters (from BaseVectorDBConfig)

| Parameter | Type | Description | Default | Required |
|---|---|---|---|---|
| collection_name | str | Name of the collection | "default_collection" | No |
| vector_size | int | Dimension of vectors | - | Yes |
| distance_metric | DistanceMetric | Similarity metric (COSINE, EUCLIDEAN, DOT_PRODUCT) | COSINE | No |
| recreate_if_exists | bool | Recreate collection if it exists | False | No |
| default_top_k | int | Default number of results | 10 | No |
| default_similarity_threshold | Optional[float] | Minimum similarity score (0.0-1.0) | None | No |
| dense_search_enabled | bool | Enable dense vector search | True | No |
| full_text_search_enabled | bool | Enable full-text search | True | No |
| hybrid_search_enabled | bool | Enable hybrid search | True | No |
| default_hybrid_alpha | float | Default alpha for hybrid search (0.0-1.0) | 0.5 | No |
| default_fusion_method | Literal['rrf', 'weighted'] | Default fusion method for hybrid search | 'weighted' | No |
| provider_name | Optional[str] | Provider name | None | No |
| provider_description | Optional[str] | Provider description | None | No |
| provider_id | Optional[str] | Provider ID | None | No |
| default_metadata | Optional[Dict[str, Any]] | Default metadata for all records | None | No |
| auto_generate_content_id | bool | Auto-generate content IDs | True | No |
| indexed_fields | Optional[List[Union[str, Dict[str, Any]]]] | Fields to index for filtering | None | No |
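
To illustrate what `default_hybrid_alpha` and `default_fusion_method` control, here is a generic sketch of the two fusion strategies. It is not Upsonic's internal implementation: the function names, the convention that alpha weights the dense score, and the RRF constant `k=60` are all assumptions chosen for illustration.

```python
from typing import Dict, List


def weighted_fusion(
    dense: Dict[str, float], sparse: Dict[str, float], alpha: float = 0.5
) -> Dict[str, float]:
    """Blend scores as alpha * dense + (1 - alpha) * sparse; a missing score counts as 0."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    ids = set(dense) | set(sparse)
    return {
        i: alpha * dense.get(i, 0.0) + (1.0 - alpha) * sparse.get(i, 0.0)
        for i in ids
    }


def rrf_fusion(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: each ranked list adds 1 / (k + rank), with rank starting at 1."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


# Hypothetical scores from a dense search and a full-text (sparse) search.
dense_scores = {"a": 0.9, "b": 0.4}
sparse_scores = {"b": 0.8, "c": 0.6}

blended = weighted_fusion(dense_scores, sparse_scores, alpha=0.5)
fused = rrf_fusion([["a", "b"], ["b", "c"]])

best_weighted = max(blended, key=blended.get)
best_rrf = max(fused, key=fused.get)
```

Weighted fusion only makes sense when both score sets are on a comparable scale, whereas rank-based fusion ignores the raw scores and uses positions alone. In this toy data both approaches favor document `b`, which appears in both result lists.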

Pinecone-Specific Parameters

| Parameter | Type | Description | Default | Required |
|---|---|---|---|---|
| api_key | SecretStr | Pinecone API key | - | Yes |
| spec | Optional[Union[Dict[str, Any], ServerlessSpec, PodSpec]] | Index specification (ServerlessSpec or PodSpec) | None | No |
| environment | Optional[str] | Environment/region for backward compatibility (e.g., “aws-us-east-1”) | None | No |
| namespace | Optional[str] | Namespace for data isolation | None | No |
| metric | Literal['cosine', 'euclidean', 'dotproduct'] | Distance metric (auto-mapped from distance_metric) | 'cosine' | No |
| pods | Optional[int] | Number of pods (for PodSpec) | None | No |
| pod_type | Optional[str] | Pod type specification (for PodSpec) | None | No |
| replicas | Optional[int] | Number of replicas (for PodSpec) | None | No |
| shards | Optional[int] | Number of shards (for PodSpec) | None | No |
| host | Optional[str] | Custom Pinecone host | None | No |
| additional_headers | Optional[Dict[str, str]] | Additional HTTP headers | None | No |
| pool_threads | Optional[int] | Thread pool size | 1 | No |
| index_api | Optional[Any] | Custom index API instance | None | No |
| use_sparse_vectors | bool | Enable sparse vector support (requires hybrid_search_enabled=True, sets metric to dotproduct) | False | No |
| sparse_encoder_model | str | Model for sparse vector generation | "pinecone-sparse-english-v0" | No |
| batch_size | int | Batch size for upsert operations | 100 | No |
| show_progress | bool | Show progress during batch operations | False | No |
| timeout | Optional[int] | Request timeout in seconds | None | No |
| reranker | Optional[Any] | Reranker instance for post-processing results | None | No |
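
The table notes that `metric` is auto-mapped from `distance_metric`, and that `use_sparse_vectors` requires `hybrid_search_enabled=True` and sets the metric to `dotproduct`. A minimal sketch of how those rules interact is shown here. The `DistanceMetric` enum is a local stand-in whose string values are assumed to match the `metric` literals, and `resolve_metric` is a hypothetical helper, not Upsonic's actual code.

```python
from enum import Enum


class DistanceMetric(Enum):
    """Local stand-in for Upsonic's DistanceMetric; the string values are an assumption."""
    COSINE = "cosine"
    EUCLIDEAN = "euclidean"
    DOT_PRODUCT = "dotproduct"


def resolve_metric(
    distance_metric: DistanceMetric,
    use_sparse_vectors: bool = False,
    hybrid_search_enabled: bool = True,
) -> str:
    """Return the Pinecone metric string implied by the configuration flags."""
    if use_sparse_vectors:
        # Sparse vectors are only allowed alongside hybrid search.
        if not hybrid_search_enabled:
            raise ValueError("use_sparse_vectors requires hybrid_search_enabled=True")
        # Sparse support overrides whatever distance_metric was requested.
        return "dotproduct"
    return distance_metric.value
```

Under these assumptions, `resolve_metric(DistanceMetric.COSINE)` yields `"cosine"`, while any configuration with `use_sparse_vectors=True` resolves to `"dotproduct"` regardless of the requested `distance_metric`.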