# Intelligent Auto-Detection

> Automatic loader and splitter selection based on file type and content

## Overview

When you omit `loaders` and `splitters`, KnowledgeBase selects a loader and a splitter for each source based on file type, content analysis, and your quality preference.

This means you can pass a mix of PDFs, Markdown, JSON, and code files in one call, and KnowledgeBase handles each one with a strategy suited to its type.

## Basic Auto-Detection

Omit `loaders` and `splitters`, and KnowledgeBase selects them for you:

```python
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="auto_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

# Auto-detects loaders and splitters for each source type
kb = KnowledgeBase(
    sources=["report.pdf", "guide.md", "config.json", "app.py"],
    embedding_provider=embedding,
    vectordb=vectordb
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="What are the error handling mechanisms described in the code?",
    context=[kb]
)

result = agent.do(task)
print(result)
```

Behind the scenes, KnowledgeBase will:

* Use a PDF loader for `report.pdf`
* Use a Markdown loader for `guide.md`
* Use a JSON loader for `config.json`
* Use a code-aware loader for `app.py`
* Select appropriate chunking strategies for each file type
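
The selection listed above can be pictured as a lookup from file extension to loader type. The sketch below is illustrative only: `LOADER_BY_EXTENSION` and `pick_loader` are hypothetical names, not part of the Upsonic API, and the real detection also takes content analysis into account.

```python
from pathlib import Path

# Hypothetical mapping for illustration; not Upsonic's actual registry.
LOADER_BY_EXTENSION = {
    ".pdf": "pdf",
    ".md": "markdown",
    ".json": "json",
    ".py": "code",
}


def pick_loader(source: str, default: str = "text") -> str:
    """Return a loader label for a source path based on its extension."""
    suffix = Path(source).suffix.lower()  # case-insensitive match
    return LOADER_BY_EXTENSION.get(suffix, default)


for source in ["report.pdf", "guide.md", "config.json", "app.py"]:
    print(f"{source} -> {pick_loader(source)}")
```

Unknown extensions fall back to a plain-text label in this sketch; how KnowledgeBase treats unrecognized files is not covered here.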

## Quality Preferences

Use `quality_preference` to control the speed-versus-quality trade-off for auto-detected splitters:

```python
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="quality_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

# "quality" mode selects more sophisticated chunking strategies
kb = KnowledgeBase(
    sources=["legal_contract.pdf"],
    embedding_provider=embedding,
    vectordb=vectordb,
    quality_preference="quality"
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="Extract the exact definition of 'force majeure' from the contract",
    context=[kb]
)

result = agent.do(task)
print(result)
```

| Preference   | Behavior                                                    |
| ------------ | ----------------------------------------------------------- |
| `"fast"`     | Optimized for speed — simple recursive chunking             |
| `"balanced"` | Good balance between speed and quality (default)            |
| `"quality"`  | Optimized for retrieval quality — may use semantic chunking |
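
To make "simple recursive chunking" concrete, here is a minimal sketch of the general technique: split on the coarsest separator first, then fall back to finer separators for any piece that is still too large. This is not Upsonic's splitter; `recursive_split` and its separator list are assumptions chosen for illustration, and sizes are measured in characters.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple = ("\n\n", "\n", " ")) -> list[str]:
    """Split on the coarsest separator first, recursing into oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = "Intro paragraph.\n\nSecond paragraph with more words in it.\n\nEnd."
chunks = recursive_split(text, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

Paragraph boundaries are kept where they fit, and only the long middle paragraph is broken at word boundaries. Semantic chunking, by contrast, typically groups text by meaning rather than by separators, which is why it tends to cost more time.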

## Use Cases

Set `use_case` to tell auto-detection what the indexed content is for. It can be combined with `quality_preference`:

```python
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="usecase_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["documentation/"],
    embedding_provider=embedding,
    vectordb=vectordb,
    use_case="rag_retrieval",
    quality_preference="balanced"
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="What are the specific prerequisites for cloud deployment?",
    context=[kb]
)

result = agent.do(task)
print(result)
```

## Configuration Hints

Pass configuration hints to the auto-detection system via `loader_config` and `splitter_config`. The example below sets only `splitter_config`:

```python
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="config_kb",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["document.pdf"],
    embedding_provider=embedding,
    vectordb=vectordb,
    splitter_config={"chunk_size": 512, "chunk_overlap": 50}
)

agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="Find the detailed parameter description for the 'initialize' function",
    context=[kb]
)

result = agent.do(task)
print(result)
```

These hints are passed to the auto-detected components, giving you control without manually instantiating loaders and splitters.
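
To see what `chunk_size` and `chunk_overlap` mean in practice, the sketch below cuts a string into fixed-size windows in which each window repeats the tail of the previous one. It is a simplified stand-in, not the splitter KnowledgeBase selects: `sliding_chunks` is a hypothetical helper, and it assumes sizes are counted in characters, which may differ from the unit the library uses.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


sample = "0123456789" * 120  # 1200 characters of placeholder text
windows = sliding_chunks(sample, chunk_size=512, chunk_overlap=50)
print([len(w) for w in windows])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.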
