Overview

When you don’t specify loaders or splitters, KnowledgeBase automatically detects and creates the optimal components for each source based on file type, content analysis, and your quality preferences. This means you can pass a mix of PDFs, Markdown, JSON, and code files — and KnowledgeBase will handle each one with the right strategy.

Basic Auto-Detection

Simply omit loaders and splitters — KnowledgeBase figures out the rest:
```python
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="auto_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

# Auto-detects loaders and splitters for each source type
kb = KnowledgeBase(
    sources=["report.pdf", "guide.md", "config.json", "app.py"],
    embedding_provider=embedding,
    vectordb=vectordb
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="What are the error handling mechanisms described in the code?",
    context=[kb]
)

result = agent.do(task)
print(result)
```
Behind the scenes, KnowledgeBase will:
  • Use a PDF loader for report.pdf
  • Use a Markdown loader for guide.md
  • Use a JSON loader for config.json
  • Use a code-aware loader for app.py
  • Select appropriate chunking strategies for each file type
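Conceptually, the extension-based part of this detection can be sketched in a few lines. The loader names and the plain-text fallback below are illustrative placeholders, not Upsonic's actual classes:

```python
from pathlib import Path

# Illustrative sketch of extension-based loader detection.
# The loader names here are placeholders, not Upsonic internals.
LOADER_BY_EXTENSION = {
    ".pdf": "PdfLoader",
    ".md": "MarkdownLoader",
    ".json": "JSONLoader",
    ".py": "CodeLoader",
}

def detect_loader(source: str) -> str:
    """Pick a loader name for a source path by its file extension."""
    suffix = Path(source).suffix.lower()
    return LOADER_BY_EXTENSION.get(suffix, "TextLoader")

for src in ["report.pdf", "guide.md", "config.json", "app.py"]:
    print(src, "->", detect_loader(src))
```

The real system also inspects file contents, so a mislabeled extension would not necessarily defeat detection the way it would in this sketch.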

Quality Preferences

Control the speed vs quality trade-off for auto-detected splitters:
```python
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="quality_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

# "quality" mode selects more sophisticated chunking strategies
kb = KnowledgeBase(
    sources=["legal_contract.pdf"],
    embedding_provider=embedding,
    vectordb=vectordb,
    quality_preference="quality"
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="Extract the exact definition of 'force majeure' from the contract",
    context=[kb]
)

result = agent.do(task)
print(result)
```
| Preference | Behavior |
| --- | --- |
| "fast" | Optimized for speed: simple recursive chunking |
| "balanced" | Good balance between speed and quality (default) |
| "quality" | Optimized for retrieval quality: may use semantic chunking |

Use Cases

Optimize the auto-detection strategy for your specific use case:
```python
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="usecase_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["documentation/"],
    embedding_provider=embedding,
    vectordb=vectordb,
    use_case="rag_retrieval",
    quality_preference="balanced"
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="What are the specific prerequisites for cloud deployment?",
    context=[kb]
)

result = agent.do(task)
print(result)
```
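One way to think about a use case is as a bundle of chunking defaults applied on your behalf. The preset values below are invented for illustration (only "rag_retrieval" appears in the example above, and Upsonic's actual defaults may differ):

```python
# Hypothetical sketch: what a use_case hint might expand to internally.
# Preset names and numbers are illustrative, not Upsonic's real defaults.
USE_CASE_PRESETS = {
    "rag_retrieval": {"chunk_size": 512, "chunk_overlap": 50},
}

GENERIC_DEFAULTS = {"chunk_size": 1024, "chunk_overlap": 100}

def defaults_for(use_case: str) -> dict:
    """Return chunking defaults for a use case, falling back to generics."""
    return dict(USE_CASE_PRESETS.get(use_case, GENERIC_DEFAULTS))
```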

Configuration Hints

Pass configuration hints to the auto-detection system via loader_config and splitter_config:
```python
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="config_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["document.pdf"],
    embedding_provider=embedding,
    vectordb=vectordb,
    splitter_config={"chunk_size": 512, "chunk_overlap": 50}
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="Find the detailed parameter description for the 'initialize' function",
    context=[kb]
)

result = agent.do(task)
print(result)
```
These hints are passed to the auto-detected components, giving you control without manually instantiating loaders and splitters.
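To see what the two hints control, here is a deliberately naive character-level chunker. Upsonic's real splitters are structure-aware, so this is only an illustration of the size and overlap semantics:

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking: each chunk holds at most chunk_size
    characters, and consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides, at the cost of some duplicated storage.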