Attributes

A knowledge base is configured through the KnowledgeBase class, which accepts the following attributes:
| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| `sources` | `Union[str, Path, List[Union[str, Path]]]` | (required) | File paths, directory paths, or string content to process |
| `embedding_provider` | `EmbeddingProvider` | (required) | Provider instance for creating vector embeddings |
| `vectordb` | `BaseVectorDBProvider` | (required) | Vector database provider instance for storage |
| `splitters` | `Union[BaseChunker, List[BaseChunker]] \| None` | `None` | Text chunking strategies (auto-detected if `None`) |
| `loaders` | `Union[BaseLoader, List[BaseLoader]] \| None` | `None` | Document loaders for different file types (auto-detected if `None`) |
| `name` | `str \| None` | `None` | Human-readable name for the knowledge base (auto-generated if `None`) |
| `use_case` | `str` | `"rag_retrieval"` | Use case for chunking optimization |
| `quality_preference` | `str` | `"balanced"` | Speed vs. quality preference: `"fast"`, `"balanced"`, or `"quality"` |
| `loader_config` | `Dict[str, Any] \| None` | `None` | Configuration options specifically for loaders |
| `splitter_config` | `Dict[str, Any] \| None` | `None` | Configuration options specifically for splitters |
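
Because `sources` accepts a single string, a single `Path`, or a list mixing both, and `quality_preference` is documented with three allowed values, it can help to normalize and check these inputs before constructing a knowledge base. The following is a minimal sketch in plain Python. The helper names `normalize_sources` and `check_quality_preference` are hypothetical and are not part of Upsonic; how the library itself handles these values is not shown here.

```python
from pathlib import Path
from typing import List, Union

Source = Union[str, Path]

# The three values listed for quality_preference in the table above.
ALLOWED_QUALITY = {"fast", "balanced", "quality"}


def normalize_sources(sources: Union[Source, List[Source]]) -> List[Source]:
    """Return sources as a list (hypothetical helper, not Upsonic API)."""
    # A lone str or Path becomes a one-element list.
    if isinstance(sources, (str, Path)):
        return [sources]
    # Copy any other iterable so the caller's list is not mutated later.
    return list(sources)


def check_quality_preference(value: str) -> str:
    """Raise ValueError unless value is one of the documented options."""
    if value not in ALLOWED_QUALITY:
        raise ValueError(
            f"quality_preference must be one of {sorted(ALLOWED_QUALITY)}, got {value!r}"
        )
    return value
```

The results could then be passed on as `sources=normalize_sources(...)` and `quality_preference=check_quality_preference(...)`.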

Configuration Example

from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Setup embedding provider
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())

# Setup vector database
config = ChromaConfig(
    collection_name="my_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.EMBEDDED, db_path="./chroma_db")
)
vectordb = ChromaProvider(config)

# Create knowledge base with configuration
kb = KnowledgeBase(
    sources=["document.pdf", "data/"],
    embedding_provider=embedding,
    vectordb=vectordb,
    name="my_custom_kb",
    use_case="rag_retrieval",
    quality_preference="balanced",
    loader_config={"chunk_size": 1000},
    splitter_config={"chunk_overlap": 200}
)

# Use with Agent
agent = Agent("openai/gpt-4o")
task = Task(
    description="What are the main topics in the documents?",
    context=[kb]
)

result = agent.do(task)
print(result)
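
The example above sets `vector_size=1536`. A vector collection generally stores vectors of one fixed dimension, so this value presumably has to match what the embedding model produces. The sketch below is a standalone pre-flight check based on the default output dimensions OpenAI publishes for its embedding models. The helper `check_vector_size` and the lookup table are hypothetical, not Upsonic API, and which model `OpenAIEmbeddingConfig()` selects by default is an assumption to verify against the library.

```python
# Default output dimensions published by OpenAI for its embedding models.
OPENAI_EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_vector_size(model_name: str, vector_size: int) -> None:
    """Raise ValueError if vector_size differs from the model's default dimension.

    Hypothetical helper for illustration only; not part of Upsonic.
    """
    expected = OPENAI_EMBEDDING_DIMS.get(model_name)
    if expected is None:
        raise ValueError(f"Unknown embedding model: {model_name!r}")
    if expected != vector_size:
        raise ValueError(
            f"{model_name} produces {expected}-dimensional vectors, "
            f"but vector_size is {vector_size}"
        )


# Passes silently: 1536 matches the small model's default dimension.
check_vector_size("text-embedding-3-small", 1536)
```

Running a check like this before creating the collection surfaces a dimension mismatch at configuration time rather than during indexing.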