Overview
When you don’t specify loaders or splitters, KnowledgeBase detects the type of each source and creates a matching loader and splitter for it, based on file type, content analysis, and your quality preferences.
This means you can pass a mix of PDFs, Markdown, JSON, and code files, and KnowledgeBase will handle each one with a strategy suited to its type.
Basic Auto-Detection
Omit loaders and splitters, and KnowledgeBase selects them for each source:
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Embedding provider and an in-memory Chroma collection
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="auto_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

# Auto-detects loaders and splitters for each source type
kb = KnowledgeBase(
    sources=["report.pdf", "guide.md", "config.json", "app.py"],
    embedding_provider=embedding,
    vectordb=vectordb
)

# Attach the knowledge base to a task as context
agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="What are the error handling mechanisms described in the code?",
    context=[kb]
)
result = agent.do(task)
print(result)
Behind the scenes, KnowledgeBase will:
- Use a PDF loader for report.pdf
- Use a Markdown loader for guide.md
- Use a JSON loader for config.json
- Use a code-aware loader for app.py
- Select appropriate chunking strategies for each file type
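The per-file dispatch listed above can be pictured as a lookup keyed on file extension. The following is a minimal sketch of that idea only, not Upsonic's implementation: the `LOADER_BY_SUFFIX` table, the loader labels, and the `pick_loader` helper are hypothetical names introduced for illustration, and the real detection also takes content analysis into account.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a loader label (illustration only).
LOADER_BY_SUFFIX = {
    ".pdf": "pdf",
    ".md": "markdown",
    ".json": "json",
    ".py": "code",
}


def pick_loader(source: str, default: str = "text") -> str:
    """Return a loader label for a source path based on its suffix."""
    suffix = Path(source).suffix.lower()
    return LOADER_BY_SUFFIX.get(suffix, default)


sources = ["report.pdf", "guide.md", "config.json", "app.py"]
chosen = {src: pick_loader(src) for src in sources}
for src, loader in chosen.items():
    print(f"{src} -> {loader}")
```

Unknown suffixes fall back to a generic label here, which is one simple way to keep mixed source lists from failing on an unrecognized file.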
Quality Preferences
Use quality_preference to control the trade-off between speed and quality for auto-detected splitters:
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="quality_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

# "quality" mode selects more sophisticated chunking strategies
kb = KnowledgeBase(
    sources=["legal_contract.pdf"],
    embedding_provider=embedding,
    vectordb=vectordb,
    quality_preference="quality"
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="Extract the exact definition of 'force majeure' from the contract",
    context=[kb]
)
result = agent.do(task)
print(result)
| Preference | Behavior |
|---|---|
| "fast" | Optimized for speed; uses simple recursive chunking |
| "balanced" | Good balance between speed and quality (default) |
| "quality" | Optimized for retrieval quality; may use semantic chunking |
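To make "simple recursive chunking" concrete, here is a minimal sketch of the general technique: try the coarsest separator first, and recurse with finer separators only for pieces that are still too long. This is a generic, character-based illustration and an assumption about what the term means here; the `recursive_split` function and its separator order are hypothetical and are not Upsonic's splitter.

```python
def recursive_split(text: str, chunk_size: int,
                    separators=("\n\n", "\n", " ")) -> list[str]:
    """Split text into pieces of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left to try: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        candidate = f"{current}{sep}{part}" if current else part
        if len(candidate) <= chunk_size:
            # Keep merging neighbouring parts while they fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # This part alone is too long: split it with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


pieces = recursive_split("alpha beta\n\ngamma delta epsilon", chunk_size=12)
print(pieces)
```

Semantic chunking, by contrast, places boundaries using embedding similarity between neighbouring passages, which costs extra embedding calls and is the kind of trade-off the "quality" preference accepts.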
Use Cases
Pass use_case to tune the auto-detection strategy for a specific workload. It can be combined with quality_preference:
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="usecase_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

# Index a whole directory, tuned for retrieval-augmented generation
kb = KnowledgeBase(
    sources=["documentation/"],
    embedding_provider=embedding,
    vectordb=vectordb,
    use_case="rag_retrieval",
    quality_preference="balanced"
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="What are the specific prerequisites for cloud deployment?",
    context=[kb]
)
result = agent.do(task)
print(result)
Configuration Hints
Pass configuration hints to the auto-detection system via loader_config and splitter_config. The example below sets only splitter_config:
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="config_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

# The splitter is still auto-detected; only its chunking parameters are hinted
kb = KnowledgeBase(
    sources=["document.pdf"],
    embedding_provider=embedding,
    vectordb=vectordb,
    splitter_config={"chunk_size": 512, "chunk_overlap": 50}
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="Find the detailed parameter description for the 'initialize' function",
    context=[kb]
)
result = agent.do(task)
print(result)
These hints are passed to the auto-detected components, giving you control without manually instantiating loaders and splitters.
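As a rough guide to what the chunk_size and chunk_overlap values in the example mean, the sketch below computes where fixed-size windows would fall when each window starts `chunk_size - chunk_overlap` units after the previous one. It assumes a plain sliding window measured in characters; the `window_spans` helper is hypothetical, and the splitter that KnowledgeBase selects may count tokens or respect document structure instead.

```python
def window_spans(length: int, chunk_size: int,
                 chunk_overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) offsets of overlapping fixed-size windows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    spans: list[tuple[int, int]] = []
    start = 0
    while start < length:
        end = min(start + chunk_size, length)
        spans.append((start, end))
        if end == length:
            break  # the last window reached the end of the text
        start += step
    return spans


# A 1200-character document with the settings from the example above
spans = window_spans(length=1200, chunk_size=512, chunk_overlap=50)
print(spans)  # [(0, 512), (462, 974), (924, 1200)]
```

Each window shares its first 50 characters with the end of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.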