# KnowledgeBase

> Build intelligent RAG systems with vector databases

## Overview

KnowledgeBase lets you build Retrieval-Augmented Generation (RAG) systems: it loads your documents, splits them into chunks, creates embeddings, and stores them in a vector database. It works with Agent and Task, either as context or as a tool, to supply relevant passages for a query.
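
The chunk, embed, store, and retrieve steps described above can be sketched in plain Python. This is only a conceptual illustration, not Upsonic's implementation: the word-count `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping windows of `chunk_size` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system calls a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Chunk every document and store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 1) -> list[str]:
    """Return the k stored chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


documents = [
    "Chroma stores vectors on local disk in embedded mode.",
    "Refunds are issued within 14 days of purchase.",
]
index = build_index(documents)
question = "How long do refunds take?"
context = retrieve(index, question)

# The retrieved chunk is what a RAG system would place in the model prompt.
print(f"Context:\n{context[0]}\n\nQuestion: {question}")
```

In a real setup the embedding model captures meaning rather than exact word overlap, which is why the choice of embedding provider and chunking strategy affects retrieval quality.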

## Key Features

* **Automatic Processing**: Loads documents, chunks text, creates embeddings, and stores them in a vector database
* **Multiple Formats**: Supports PDFs, Markdown, DOCX, CSV, JSON, HTML, and more
* **Intelligent Chunking**: Auto-detects a suitable text splitting strategy based on file type and use case
* **Flexible Storage**: Works with Chroma, Milvus, Qdrant, Pinecone, Weaviate, FAISS, PGVector, and SuperMemory
* **Hybrid Search**: Combines dense vector search with full-text search for better results
* **Tool Integration**: Can be used as a tool, allowing agents to actively search and retrieve information
* **Named Knowledge Bases**: Use `name`, `description`, and `topics` to help agents intelligently route queries across multiple knowledge bases
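
To make the Hybrid Search idea concrete, here is a minimal sketch of one common way to merge a dense-vector ranking with a full-text ranking: reciprocal rank fusion. This page does not say which fusion method Upsonic or the underlying vector database uses, so treat the function name, the document IDs, and the constant `k=60` as illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in, so
    documents ranked highly by both searches rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_results = ["a", "b", "c"]      # hypothetical vector-similarity ranking
full_text_results = ["b", "d", "a"]  # hypothetical keyword-match ranking

fused = reciprocal_rank_fusion([dense_results, full_text_results])
print(fused)  # ['b', 'a', 'd', 'c']
```

Documents `b` and `a` appear in both lists, so they outrank `d` and `c`, which each appear in only one.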

## Installation

To use KnowledgeBase, you'll need to install the required dependencies for your chosen vector database and (optionally) document loaders and embedding providers.

<Note>
  **Example: Setting up KnowledgeBase with Chroma**

  For a complete RAG setup with Chroma as the vector database, a PDF loader, and OpenAI embeddings:

  ```bash theme={null}
  uv pip install "upsonic[chroma]"
  uv pip install "upsonic[pdf-loader]"
  uv pip install "upsonic[embeddings]"
  ```

  **What each optional group provides:**

  * `[chroma]` - ChromaDB vector database client
  * `[pdf-loader]` - PDF document loader (PyPDF)
  * `[embeddings]` - Embedding providers (OpenAI, etc.)

  For other vector databases, replace `chroma` with `qdrant`, `milvus`, `weaviate`, `pinecone`, `faiss`, `pgvector`, or `supermemory`. For other loaders, see the [Loaders documentation](/concepts/knowledgebase/document-loaders/pypdf).

  <Tip>
    **SuperMemory** handles embeddings internally, so you don't need to install `[embeddings]` or pass an `embedding_provider` when using it.
  </Tip>
</Note>

## Quick Start

Create a KnowledgeBase from documents and use it with an Agent:

```python theme={null}
from upsonic import Agent, Task, KnowledgeBase
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode
from upsonic.loaders.pdf import PdfLoader
from upsonic.loaders.config import PdfLoaderConfig

# Set up the embedding provider
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())

# Set up the vector database
config = ChromaConfig(
    collection_name="my_kb",
    vector_size=1536,  # must match the embedding model's output dimension
    connection=ConnectionConfig(mode=Mode.EMBEDDED, db_path="./chroma_db")
)
vectordb = ChromaProvider(config)

# Set up the PDF loader
loader = PdfLoader(PdfLoaderConfig())

# Create knowledge base
kb = KnowledgeBase(
    sources=["document.pdf", "data/"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader]
)

# Use with Agent
agent = Agent("anthropic/claude-sonnet-4-5")
task = Task(
    description="What are the main topics in the documents?",
    context=[kb]
)

result = agent.do(task)
print(result)
```

## Integrations

KnowledgeBase integrates with a range of vector stores, embedding providers, document loaders, and text splitters.

<CardGroup cols={2}>
  <Card title="Vector Stores" icon="database" href="/concepts/knowledgebase/vector-stores">
    Chroma, Qdrant, Pinecone, Milvus, PGVector, FAISS, Weaviate, SuperMemory
  </Card>

  <Card title="Embedding Providers" icon="layer-group" href="/integrations/overview#embedding-providers">
    OpenAI, Azure, Google, AWS Bedrock, HuggingFace, FastEmbed, Ollama
  </Card>

  <Card title="Document Loaders" icon="file-import" href="/integrations/overview#document-loaders">
    PDF, DOCX, CSV, JSON, Markdown, HTML, XML, YAML, Text & more
  </Card>

  <Card title="Text Splitters" icon="scissors" href="/integrations/overview#text-splitters">
    Recursive, Semantic, Agentic, Character, Markdown, HTML, JSON, Python
  </Card>
</CardGroup>

## Navigation

* [Getting Started](/concepts/knowledgebase/basic-rag-example) - Build your first RAG system in 5 minutes
* [Attributes](/concepts/knowledgebase/attributes) - Configuration options for KnowledgeBase
* [Putting Files](/concepts/knowledgebase/putting-files) - How to add documents to your knowledge base
* [Using as Tool](/concepts/knowledgebase/using-as-tool) - Use KnowledgeBase as a tool in Agent or Task
* [Query Control](/concepts/knowledgebase/query-control) - Control when RAG context is injected
* [Examples](/concepts/knowledgebase/examples) - Practical runnable examples
* [Vector Stores](/concepts/knowledgebase/vector-stores) - Choose and configure your vector database

### Advanced Features

* [Auto-Detection](/concepts/knowledgebase/advanced/auto-detection) - Intelligent loader and splitter selection
* [Indexed Processing](/concepts/knowledgebase/advanced/indexed-processing) - Per-source loaders and splitters
* [Direct Content](/concepts/knowledgebase/advanced/direct-content) - Ingest raw text without files
* [Vector Search Tuning](/concepts/knowledgebase/advanced/vector-search-params) - Fine-tune retrieval per Task
* [Document Management](/concepts/knowledgebase/advanced/document-management) - Add, remove, and refresh documents dynamically
* [Storage Persistence](/concepts/knowledgebase/advanced/storage-persistence) - Persist document metadata to a storage backend
* [Isolated Search](/concepts/knowledgebase/advanced/isolate-search) - Scope queries to a single KnowledgeBase in a shared collection
