Overview

The agentic splitter uses AI agents to extract atomic propositions, group them into coherent topics, and build semantically meaningful chunks. It features comprehensive caching, quality validation, error handling with fallbacks, and rich metadata enrichment.

Splitter Class: AgenticChunker
Config Class: AgenticChunkingConfig
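To illustrate the idea behind the pipeline (not the library's actual implementation), the sketch below groups agent-extracted propositions by topic and caps the number of propositions per chunk. The `group_propositions` helper and its inputs are hypothetical; in practice the agent performs extraction and topic assignment.

```python
# Conceptual sketch only: assumes propositions have already been
# extracted and labeled with topics by an agent.
from collections import defaultdict

def group_propositions(propositions, max_per_chunk=15):
    """propositions: list of (topic, text) pairs -> topic-based chunks."""
    by_topic = defaultdict(list)
    for topic, text in propositions:
        by_topic[topic].append(text)
    chunks = []
    for topic, texts in by_topic.items():
        # Split oversized topics so no chunk exceeds max_per_chunk.
        for i in range(0, len(texts), max_per_chunk):
            chunks.append({"topic": topic, "propositions": texts[i:i + max_per_chunk]})
    return chunks

props = [
    ("pricing", "The basic plan costs $10."),
    ("pricing", "The pro plan costs $30."),
    ("support", "Support is available 24/7."),
]
chunks = group_propositions(props, max_per_chunk=2)
print(len(chunks))  # 2 topic-based chunks
```

The real splitter adds validation, coherence scoring, and caching around these steps, as described in the Parameters section.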

Dependencies

Requires a pre-configured Agent instance for cognitive processing.

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.loaders import TextLoader, TextLoaderConfig
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.text_splitter import AgenticChunker, AgenticChunkingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Create agent for cognitive processing
agent = Agent("openai/gpt-4o")

# Configure splitter
splitter_config = AgenticChunkingConfig(
    chunk_size=512,
    chunk_overlap=50,
    max_agent_retries=3,
    enable_proposition_caching=True
)
splitter = AgenticChunker(agent, splitter_config)

# Set up the KnowledgeBase
loader = TextLoader(TextLoaderConfig())
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="agentic_docs",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["document.txt"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader],
    splitters=[splitter]
)

# Query with Agent
query_agent = Agent("openai/gpt-4o")
task = Task("What are the main propositions?", context=[kb])
result = query_agent.do(task)
print(result)

Parameters

| Parameter | Type | Description | Default | Source |
|---|---|---|---|---|
| chunk_size | int | Target size of each chunk | 1024 | Base |
| chunk_overlap | int | Overlapping units between chunks | 200 | Base |
| min_chunk_size | int \| None | Minimum size for a chunk | None | Base |
| length_function | Callable[[str], int] | Function to measure text length | len | Base |
| strip_whitespace | bool | Strip leading/trailing whitespace | False | Base |
| max_agent_retries | int | Maximum retries for agent calls | 3 | Specific |
| min_proposition_length | int | Minimum length for valid propositions | 20 | Specific |
| max_propositions_per_chunk | int | Maximum propositions in a chunk | 15 | Specific |
| min_propositions_per_chunk | int | Minimum propositions to form a chunk | 3 | Specific |
| enable_proposition_caching | bool | Cache proposition extraction results | True | Specific |
| enable_topic_caching | bool | Cache topic assignment results | True | Specific |
| enable_refinement_caching | bool | Cache topic refinement results | True | Specific |
| enable_proposition_validation | bool | Validate proposition quality | True | Specific |
| enable_topic_optimization | bool | Optimize topic assignments | True | Specific |
| enable_coherence_scoring | bool | Score chunk coherence | True | Specific |
| fallback_to_recursive | bool | Fallback to recursive chunking on failure | True | Specific |
| include_proposition_metadata | bool | Include proposition-level metadata | True | Specific |
| include_topic_scores | bool | Include topic coherence scores | True | Specific |
| include_agent_metadata | bool | Include agent processing metadata | True | Specific |
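As a sketch of how the Specific parameters above combine, the fragment below configures tighter chunks with validation, coherence scoring, and the recursive fallback enabled. Parameter names come from the table; the chosen values are illustrative, not recommendations.

```python
from upsonic import Agent
from upsonic.text_splitter import AgenticChunker, AgenticChunkingConfig

# Illustrative values only; see the table above for defaults.
config = AgenticChunkingConfig(
    chunk_size=512,
    min_chunk_size=100,
    max_propositions_per_chunk=10,
    min_propositions_per_chunk=2,
    enable_proposition_validation=True,   # reject low-quality propositions
    enable_coherence_scoring=True,        # score chunk coherence
    fallback_to_recursive=True,           # recover via recursive chunking on failure
    include_topic_scores=True,            # attach coherence scores to metadata
)
splitter = AgenticChunker(Agent("openai/gpt-4o"), config)
```

With `fallback_to_recursive=True`, a failed agent call (after `max_agent_retries` attempts) degrades to recursive chunking instead of raising, which is usually the safer choice for unattended pipelines.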