Overview

By default, KnowledgeBase stores document chunks only in the vector database. When you pass a storage backend, KnowledgeBase also writes a document registry — a relational record of every document it has processed, including metadata, content hashes, chunk counts, and processing status. This is useful when you need to:
  • Track which documents are indexed across restarts without querying the vector database
  • Share a storage backend between Memory and KnowledgeBase for a unified persistence layer
  • Audit document lifecycle — see when documents were added, their status, and source paths
  • Enable source removal by document ID — storage lets remove_document() look up the original file path and clean up sources

Quick Start

Pass any Upsonic storage backend as the `storage` parameter:

```python
from upsonic import KnowledgeBase
from upsonic.storage.sqlite import SqliteStorage
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Reuse the same storage you use for Memory, or create a dedicated one
storage = SqliteStorage(db_file="app.db")

kb = KnowledgeBase(
    sources=["docs/"],
    vectordb=ChromaProvider(ChromaConfig(
        collection_name="my_kb",
        vector_size=1536,
        connection=ConnectionConfig(mode=Mode.EMBEDDED, db_path="./chroma_db")
    )),
    embedding_provider=OpenAIEmbedding(OpenAIEmbeddingConfig()),
    storage=storage,  # enables document registry persistence
)

kb.setup()
```
After `setup()`, every processed document is recorded in the storage's knowledge table (`upsonic_knowledge` by default). When you call `add_source()`, `add_text()`, or `remove_document()`, the registry is updated automatically.
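
As a mental model, the registry behaves like an upsert-by-document-ID store: add operations write or refresh a row, and removal deletes it. The sketch below is illustrative only and is not Upsonic's implementation:

```python
# Toy model of the document registry lifecycle; NOT Upsonic's actual code.
# Add operations upsert a record keyed by document ID; removal deletes it.
registry: dict[str, dict] = {}

def record_indexed(doc_id: str, name: str, chunk_count: int) -> None:
    # Called after a document is chunked and written to the vector DB
    registry[doc_id] = {"name": name, "chunk_count": chunk_count, "status": "indexed"}

def record_removed(doc_id: str) -> None:
    # Mirrors remove_document(): the registry row goes away with the chunks
    registry.pop(doc_id, None)

record_indexed("doc1", "guide.md", 12)   # e.g. after add_source()
record_indexed("doc1", "guide.md", 14)   # re-indexing updates the same row
assert registry["doc1"]["chunk_count"] == 14
record_removed("doc1")                   # e.g. after remove_document()
assert registry == {}
```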

What Gets Persisted

Each processed document creates a row in the knowledge table:
| Field | Description |
| --- | --- |
| `id` | Document ID (content-based hash) |
| `name` | Human-readable document name |
| `type` | File extension (e.g., `pdf`, `md`) |
| `size` | File size in bytes |
| `knowledge_base_id` | ID of the parent KnowledgeBase |
| `content_hash` | MD5 hash of document content for deduplication |
| `chunk_count` | Number of chunks created from this document |
| `source` | Original file path |
| `status` | Processing status (`indexed`, `failed`) |
| `metadata` | Full document metadata as JSON |
| `created_at` | Timestamp of first indexing |
| `updated_at` | Timestamp of last update |
See Storage Tables for the full schema.
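
The `content_hash` field is what enables deduplication: identical content yields an identical MD5 digest regardless of file path. A quick stdlib illustration (not Upsonic code):

```python
import hashlib

def content_hash(text: str) -> str:
    # MD5 digest of the document content: identical content, identical hash
    return hashlib.md5(text.encode("utf-8")).hexdigest()

doc_a = "Upsonic quick start guide."
doc_b = "Upsonic quick start guide."   # same content from a different path
doc_c = "Upsonic API reference."

assert content_hash(doc_a) == content_hash(doc_b)  # duplicate detected
assert content_hash(doc_a) != content_hash(doc_c)  # distinct document
```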

Supported Backends

Any Upsonic storage backend works — the same ones used for Memory:
| Backend | Example |
| --- | --- |
| `SqliteStorage` | `SqliteStorage(db_file="app.db")` |
| `PostgresStorage` | `PostgresStorage(db_url="postgresql://...")` |
| `RedisStorage` | `RedisStorage(db_url="redis://...")` |
| `MongoStorage` | `MongoStorage(db_url="mongodb://...")` |
| `JSONStorage` | `JSONStorage(db_path="./data")` |
| `InMemoryStorage` | `InMemoryStorage()` |
| `Mem0Storage` | `Mem0Storage(api_key="...")` |
Async storage backends (AsyncSqliteStorage, AsyncPostgresStorage, AsyncMongoStorage, AsyncMem0Storage) are also supported.
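
With `SqliteStorage`, the registry ends up as an ordinary SQLite table you can query directly. The sketch below uses a stand-in table whose columns follow the "What Gets Persisted" schema above (the real DDL is created by Upsonic, so treat the column types here as assumptions) to answer "which documents are indexed?" without touching the vector database:

```python
import sqlite3

# Stand-in for the registry Upsonic writes; column names follow the
# "What Gets Persisted" table above (a sketch, not the exact DDL).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE upsonic_knowledge (
        id TEXT PRIMARY KEY, name TEXT, type TEXT, size INTEGER,
        knowledge_base_id TEXT, content_hash TEXT, chunk_count INTEGER,
        source TEXT, status TEXT, metadata TEXT,
        created_at TEXT, updated_at TEXT
    )
""")
conn.execute(
    "INSERT INTO upsonic_knowledge (id, name, type, chunk_count, status, source) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("abc123", "guide.pdf", "pdf", 42, "indexed", "docs/guide.pdf"),
)

# Which documents are indexed, without querying the vector database?
rows = conn.execute(
    "SELECT name, chunk_count FROM upsonic_knowledge WHERE status = 'indexed'"
).fetchall()
print(rows)  # → [('guide.pdf', 42)]
```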

Sharing Storage with Memory

You can use the same storage instance for both Memory and KnowledgeBase. Each system writes to its own tables:
```python
from upsonic import Agent, Task, KnowledgeBase
from upsonic.storage.sqlite import SqliteStorage
from upsonic.storage.memory import Memory
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Single storage for everything
storage = SqliteStorage(db_file="app.db")

# KnowledgeBase uses the knowledge table
kb = KnowledgeBase(
    sources=["docs/"],
    vectordb=ChromaProvider(ChromaConfig(
        collection_name="my_kb",
        vector_size=1536,
        connection=ConnectionConfig(mode=Mode.EMBEDDED, db_path="./chroma_db")
    )),
    embedding_provider=OpenAIEmbedding(OpenAIEmbeddingConfig()),
    storage=storage,
)

# Memory uses the sessions and user_memory tables
memory = Memory(
    storage=storage,
    session_id="session_001",
    user_id="user_123",
    full_session_memory=True,
    model="anthropic/claude-sonnet-4-5"
)

agent = Agent("anthropic/claude-sonnet-4-5", memory=memory)
task = Task("Summarize the documentation", context=[kb])
result = agent.do(task)
```

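Because the two systems write to separate tables, the shared database stays cleanly partitioned. You can confirm the layout with a `sqlite_master` query; the sketch below creates stand-in tables (the actual schemas are created by Upsonic) just to show the lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-ins for the tables the two systems create in the shared app.db:
# KnowledgeBase -> upsonic_knowledge, Memory -> sessions and user_memory
for name in ("upsonic_knowledge", "sessions", "user_memory"):
    conn.execute(f"CREATE TABLE {name} (id TEXT PRIMARY KEY)")

tables = sorted(
    row[0] for row in
    conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)  # → ['sessions', 'upsonic_knowledge', 'user_memory']
```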
Custom Table Name

Override the default knowledge table name via the storage constructor:
```python
storage = SqliteStorage(
    db_file="app.db",
    knowledge_table="my_custom_knowledge_table"
)
```