Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sources | Union[str, Path, List[Union[str, Path]]] | Required | Source identifiers (file path, list of files, directory path, or string content) |
| embedding_provider | EmbeddingProvider | Required | An instance of a concrete EmbeddingProvider |
| vectordb | BaseVectorDBProvider | Required | An instance of a concrete BaseVectorDBProvider |
| splitters | Optional[Union[BaseChunker, List[BaseChunker]]] | None | A single BaseChunker or list of BaseChunker instances |
| loaders | Optional[Union[BaseLoader, List[BaseLoader]]] | None | A single BaseLoader or list of BaseLoader instances for different file types |
| name | Optional[str] | None | An optional human-readable name for this knowledge base |
| use_case | str | "rag_retrieval" | The intended use case for chunking optimization |
| quality_preference | str | "balanced" | Speed vs quality preference (“fast”, “balanced”, “quality”) |
| loader_config | Optional[Dict[str, Any]] | None | Configuration options specifically for loaders |
| splitter_config | Optional[Dict[str, Any]] | None | Configuration options specifically for splitters |
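
The `sources`, `loaders`, and `splitters` parameters each accept either a single value or a list. The sketch below shows one way such arguments could be normalized and paired by position. The helpers `normalize_sources` and `pair_by_index` are hypothetical names made up for illustration, and the pairing rules (reuse a single component for every source, require matching lengths otherwise) are an assumption, not the library's documented behavior.

```python
from pathlib import Path
from typing import Any, List, Optional, Union

SourceLike = Union[str, Path]


def normalize_sources(sources: Union[SourceLike, List[SourceLike]]) -> List[SourceLike]:
    """Hypothetical helper: wrap a single source in a list, copy a list as-is."""
    if isinstance(sources, (str, Path)):
        return [sources]
    return list(sources)


def pair_by_index(
    sources: List[SourceLike],
    components: Optional[Union[Any, List[Any]]],
) -> List[Any]:
    """Hypothetical helper: return exactly one component per source.

    Assumed rules: None means no component for any source, a single
    component is reused for every source, and a list must match the
    number of sources so that source i uses component i.
    """
    if components is None:
        return [None] * len(sources)
    if not isinstance(components, list):
        return [components] * len(sources)
    if len(components) != len(sources):
        raise ValueError(
            f"expected {len(sources)} components, got {len(components)}"
        )
    return list(components)


sources = normalize_sources(["report.pdf", "notes.md"])
# Placeholder strings stand in for real loader instances.
loaders = pair_by_index(sources, ["pdf_loader", "markdown_loader"])
pairs = list(zip(sources, loaders))
```

Here `pairs` associates `report.pdf` with the first loader and `notes.md` with the second; passing a single loader instead of a list would reuse it for both sources.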

Functions

setup_async

The main just-in-time engine for processing and indexing knowledge. This method is idempotent: it checks whether the knowledge has already been processed and indexed and, if so, does nothing. Otherwise it executes the full data pipeline: Load -> Chunk -> Embed -> Store. A lock prevents race conditions in concurrent environments. Processing is indexed, so each source uses its corresponding loader and splitter.
Returns:
  • None
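
The idempotent, lock-guarded behavior described above can be sketched with `asyncio.Lock`. `LazyIndex` is a hypothetical stand-in, not the library's class, and the `_is_ready` flag and `_run_pipeline` method are assumptions about how such a guard might be built.

```python
import asyncio


class LazyIndex:
    """Hypothetical stand-in showing an idempotent, lock-guarded setup."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._is_ready = False
        self.pipeline_runs = 0  # counts how often the pipeline actually ran

    async def setup_async(self) -> None:
        if self._is_ready:  # fast path: already indexed, nothing to do
            return
        async with self._lock:
            if self._is_ready:  # re-check: another task may have finished first
                return
            await self._run_pipeline()
            self._is_ready = True

    async def _run_pipeline(self) -> None:
        # Placeholder for Load -> Chunk -> Embed -> Store.
        self.pipeline_runs += 1
        await asyncio.sleep(0)


async def demo() -> int:
    index = LazyIndex()
    # Five concurrent callers; only one should run the pipeline.
    await asyncio.gather(*(index.setup_async() for _ in range(5)))
    return index.pipeline_runs


runs = asyncio.run(demo())
```

The second check inside the lock is what keeps concurrent callers from running the pipeline twice: tasks that were waiting on the lock see the flag already set and return immediately.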

query_async

Performs a similarity search to retrieve relevant knowledge. This is the primary retrieval method. It automatically triggers the setup process if it hasn’t been run yet, then embeds the user’s query and searches the vector database for the most relevant chunks of text.
Parameters:
  • query (str): The user’s query string
Returns:
  • List[RAGSearchResult]: A list of RAGSearchResult objects, where each contains the text content and metadata of a retrieved chunk
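
To make the embed-then-search step concrete, here is a toy, in-memory sketch. It swaps in a bag-of-words "embedding" and cosine similarity in place of a real embedding provider and vector database. `SearchResult` is a hypothetical stand-in for `RAGSearchResult`; its `score` field and the `chunk_index` metadata key are assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchResult:
    """Hypothetical stand-in for RAGSearchResult: chunk text plus metadata."""
    text: str
    score: float
    metadata: Dict[str, int] = field(default_factory=dict)


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (not a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, chunks: List[str], top_k: int = 2) -> List[SearchResult]:
    """Embed the query, score every chunk, return the top_k best matches."""
    query_vec = embed(query)
    scored = [
        SearchResult(text=chunk, score=cosine(query_vec, embed(chunk)),
                     metadata={"chunk_index": i})
        for i, chunk in enumerate(chunks)
    ]
    return sorted(scored, key=lambda r: r.score, reverse=True)[:top_k]


chunks = [
    "the cat sat on the mat",
    "vector databases store embeddings",
    "embeddings map text to vectors",
]
results = search("how do vector databases store embeddings", chunks)
```

A real vector database would use an approximate nearest-neighbor index rather than scoring every chunk, but the shape of the result (ranked chunks with text and metadata) is the same idea.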

setup_rag

Set up RAG functionality for the knowledge base. This method is called by the context manager when RAG is enabled.
Returns:
  • None

markdown

Return a markdown representation of the knowledge base. Used when RAG is disabled.
Returns:
  • str: Markdown representation of the knowledge base

get_config_summary

Get a comprehensive summary of the KnowledgeBase configuration.
Returns:
  • Dict[str, Any]: Dictionary containing configuration details of all components

health_check_async

Perform a comprehensive health check of the KnowledgeBase.
Returns:
  • Dict[str, Any]: Dictionary containing health status and diagnostic information

get_collection_info_async

Get detailed information about the vector database collection.
Returns:
  • Dict[str, Any]: Dictionary containing collection metadata and statistics

close

Clean up resources and close connections. Call this method when the KnowledgeBase is no longer needed to prevent resource leaks.
Returns:
  • None
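
One way to make sure cleanup always runs is a `try`/`finally` block. The sketch below uses a hypothetical `FakeKnowledgeBase` stand-in and assumes `close` is a plain synchronous method; if it is a coroutine in the installed version, it would need to be awaited instead.

```python
class FakeKnowledgeBase:
    """Hypothetical stand-in that only tracks whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # A real implementation would release connections here.
        self.closed = True


kb = FakeKnowledgeBase()
try:
    # Work with the knowledge base; an exception here must not skip cleanup.
    pass
finally:
    kb.close()  # runs whether or not the block above raised
```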