What is a Knowledge Base?

The Knowledge Base feature in the Upsonic framework supplies task-related data to the LLM while it analyzes and processes a task. By providing relevant information to task-performing agents or direct LLM calls, it increases the likelihood that the task completes successfully.

A notable advantage of the Knowledge Base feature is its integration with context compression, which helps when handling large datasets. The framework implements this functionality through two distinct types of knowledge bases:

  • Knowledge bases that directly provide the data to be kept and analyzed alongside each prompt.
  • Knowledge bases that exist as tools that the LLM can search when needed.

Both types have their own benefits. In cases that require RAG (Retrieval-Augmented Generation), or when dealing with large or unstructured data, the searchable type tends to return better results because the LLM (Large Language Model) determines the search query by itself.
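The difference between the two types can be illustrated with a small plain-Python sketch. Nothing here is Upsonic code: `DOCUMENTS`, `build_prompt_with_full_context`, and `search_knowledge` are hypothetical names, and the keyword match merely stands in for whatever search the framework actually performs.

```python
# Illustrative stand-ins only; these are not Upsonic classes or functions.
DOCUMENTS = {
    "resume1.pdf": "Alice: 5 years of Python and data engineering.",
    "resume2.pdf": "Bob: 3 years of frontend work with TypeScript.",
}


def build_prompt_with_full_context(question: str) -> str:
    """First type: every document is attached to the prompt as-is."""
    context = "\n".join(f"[{name}] {text}" for name, text in DOCUMENTS.items())
    return f"{context}\n\nQuestion: {question}"


def search_knowledge(query: str) -> list[str]:
    """Second type: a tool the model calls with a query it chose itself."""
    terms = query.lower().split()
    return [
        name
        for name, text in DOCUMENTS.items()
        if any(term in text.lower() for term in terms)
    ]
```

With the first function the prompt grows with every source added, while the second returns only the sources that match the query the model decided to issue.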

Key benefits of using a Knowledge Base:

  • Enhance agents for domain-specific situations
  • Enhance decisions with real-world data

Creating a Knowledge Base

To create a knowledge base, import the KnowledgeBase class, create an object of that class, and add it to the context list of a Task. KnowledgeBase works anywhere context can be used.

from upsonic import KnowledgeBase

resume_list = KnowledgeBase(
  sources=["resume1.pdf", "resume2.pdf"]
)

For file-based knowledge sources, make sure to place your files in the current working directory. Alternatively, you can use full paths.
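Because relative file names depend on the working directory, it can help to resolve and check them before building the knowledge base. The helper below is a minimal sketch using only the standard library; `resolve_sources` is a hypothetical name and is not part of Upsonic.

```python
from pathlib import Path


def resolve_sources(sources, base_dir=None):
    """Return absolute paths for the given files, or raise if any is missing.

    Relative names are resolved against base_dir, which defaults to the
    current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    resolved, missing = [], []
    for source in sources:
        path = Path(source)
        if not path.is_absolute():
            path = base / path
        (resolved if path.is_file() else missing).append(str(path))
    if missing:
        raise FileNotFoundError(f"Source files not found: {missing}")
    return resolved
```

The returned list could then be passed as `sources`, so a missing file is reported before any processing starts.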

Put a Knowledge Base into a Task

You can easily add the KnowledgeBase object you created to any task’s context list. The Agent or LLM will evaluate these contexts while performing operations.

from upsonic import Task

task1 = Task(
  "Summarize the resumes",

  context=[resume_list] # Adding Knowledge Base to the task
)

Enabling RAG

When dealing with a large volume of sources, having the agent process every source in raw form both weakens its focus and risks hitting context limits. In such cases, RAG (Retrieval-Augmented Generation), a mechanism that retrieves only the relevant content, is highly beneficial. To enable RAG within Upsonic, specifying the rag_model is sufficient. This ensures that the most relevant data is retrieved for the prompt during each agent execution or direct LLM call.
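The retrieval idea can be sketched in a few lines of plain Python. This is a toy illustration, not how Upsonic implements RAG: `embed` is a bag-of-words stand-in for a real embedding model, and `retrieve`, `cosine`, and `chunks` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:top_k]


chunks = [
    "Alice has five years of Python experience",
    "Bob builds frontend apps in TypeScript",
    "Carol manages cloud infrastructure budgets",
]
top = retrieve("python experience", chunks, top_k=1)
```

Only the best-matching chunk is passed on to the prompt, which is what keeps the context small when the number of sources grows.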

Supported Embedding Models:

  • openai/text-embedding-3-small

Installing RAG Library

The Upsonic framework has an extra package group for RAG-based use cases. You can install it via:

pip install 'upsonic[rag]'

RAG is only supported on Python versions above 3.10 and below 3.13 (3.10 < Python < 3.13).
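A quick interpreter check can catch an unsupported Python version before installation problems appear. The sketch below assumes the note above means 3.10 < Python < 3.13 with both bounds exclusive; that reading is an assumption, and `rag_supported` is a hypothetical helper, not an Upsonic function.

```python
import sys

# Assumed bounds, read from the note above as 3.10 < Python < 3.13.
MIN_EXCLUSIVE = (3, 10)
MAX_EXCLUSIVE = (3, 13)


def rag_supported(version=None) -> bool:
    """Check a (major, minor) version, defaulting to the running interpreter."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_EXCLUSIVE < major_minor < MAX_EXCLUSIVE
```

Calling `rag_supported()` without arguments checks the interpreter that is currently running.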

from upsonic import KnowledgeBase

resume_list = KnowledgeBase(
  sources=["resume1.pdf", "resume2.pdf"],

  rag_model="openai/text-embedding-3-small" # Specifying the model enables RAG
)

Supported Formats

Supported Files

    • PDF
    • PowerPoint
    • Word
    • Excel
    • Images
    • Audio
    • HTML
    • CSV
    • JSON
    • XML
    • ZIP

Supported Variables

    • STRING