What is Knowledge Base?

The Knowledge Base feature in the Upsonic framework serves as a critical component that supplies essential task-related data to the LLM during its analysis and processing operations. By providing relevant information to either task-performing agents or direct LLM calls, this feature significantly enhances the probability of successful task completion.

A notable advantage of the Knowledge Base feature is its seamless integration with context compression, optimizing the handling of large datasets. The framework implements this functionality through two distinct types of knowledge bases 

  • The one that directly provides the data that needs to be kept and analyzed alongside each prompt.
  • Knowledge bases that exist as tools that the LLM can search when needed.

While both groups have their own benefits, especially in cases requiring RAG (Retrieval-Augmented Generation), and when dealing with large or unstructured data, the LLM (Large Language Model) will return more successful results since it determines the search query by itself.

Key benefits of using Knowledge:

  • Enhance agents for domain-specific situations
  • Enhance decisions with real-world data

Creating an Knowledge Base

To create a knowledge base, it’s sufficient to directly import the KnowledgeBase class. Then, we will create an object belonging to this class and add it to the context list within the Task. KnowledgeBase works compatibly with every place where context can be used.

from upsonic import Task, Agent
from upsonic import KnowledgeBase

# Creating Knowledge Base
resume_list = KnowledgeBase(
  sources=["resume1.pdf", "resume2.pdf"]
)

# Creating Task
task = Task(
  "Summerize the Resumes", 
  context=[resume_list] # Adding Knowledge Base to the task
)

# Creating Agent
agent = Agent(name="Head of Development")

# Running the task
agent.print_do(task)

For file-Based Knowledge Sources, make sure to place your files in a current work directory. Also, you can use full path.

Enabling RAG

When dealing with a large volume of sources, having the agent directly process all sources in raw format both weakens its focus and may hit context limits. In such cases, using RAG (Retrieval-Augmented Generation), which is a mechanism that retrieves only relevant content, is highly beneficial. To implement RAG within Upsonic, simply specifying the rag_model is sufficient. This ensures that the most relevant data is retrieved for the prompt during each agent execution or direct LLM call.

**Supported Embedding Models:

  • openai/text-embedding-3-small

Installing RAG Library

Upsonic framework have an extra package group for RAG based cases. You can install via:

pip install 'upsonic[rag]'

RAG Is only supported in 3.13 < Python >3.10

from upsonic import KnowledgeBase

resume_list = KnowledgeBase(
  sources=["resume1.pdf", "resume2.pdf"],

  rag_model="openai/text-embedding-3-small" # Specifying Model to enable rag
)

Supported Formats

Supported Files

    • URL
    • PDF
    • PowerPoint
    • Word
    • Excel
    • Images
    • Audio
    • HTML
    • CSV
    • JSON
    • XML
    • ZIP

Supported Variables

    • STRING