Overview

The Python splitter uses the Abstract Syntax Tree (AST) to identify the precise boundaries of logical blocks such as classes and functions. Because it splits on syntax rather than on text patterns, the chunks are semantically meaningful and robust to formatting variations. Each chunk includes metadata about the code structure.

Splitter Class: PythonChunker
Config Class: PythonChunkingConfig
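To make the idea of AST-based boundaries concrete, the following is a minimal sketch that uses only the standard-library `ast` module. It is not PythonChunker's actual implementation; the helper name `top_level_blocks` and the sample source are hypothetical and serve only to show how node types map to line ranges.

```python
import ast

# Hypothetical sample module used only for this illustration.
SOURCE = '''\
import os

class Greeter:
    """Says hello."""

    def greet(self, name):
        return f"Hello, {name}"

async def fetch(url):
    return url

def main():
    print(Greeter().greet("world"))
'''


def top_level_blocks(source, node_types=("ClassDef", "FunctionDef", "AsyncFunctionDef")):
    """Return (node type, name, first line, last line) for each matching top-level node."""
    tree = ast.parse(source)
    blocks = []
    for node in tree.body:
        kind = type(node).__name__
        if kind in node_types:
            # lineno/end_lineno are 1-based and inclusive (end_lineno needs Python 3.8+).
            blocks.append((kind, node.name, node.lineno, node.end_lineno))
    return blocks


blocks = top_level_blocks(SOURCE)
for kind, name, start, end in blocks:
    print(f"{kind:<16} {name:<8} lines {start}-{end}")
```

Because the boundaries come from the parser, reindenting the file or adding blank lines shifts the line numbers but does not change which blocks are found.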

Dependencies

No additional dependencies are required. The splitter uses only the Python standard library.

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.loaders import TextLoader, TextLoaderConfig
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.text_splitter import PythonChunker, PythonChunkingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Configure splitter
splitter_config = PythonChunkingConfig(
    chunk_size=512,
    chunk_overlap=50,
    split_on_nodes=["ClassDef", "FunctionDef"],
    include_docstrings=True
)
splitter = PythonChunker(splitter_config)

# Setup KnowledgeBase
loader = TextLoader(TextLoaderConfig())
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="python_code",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["code.py"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader],
    splitters=[splitter]
)

# Query with Agent
agent = Agent("openai/gpt-4o")
task = Task("Find all class definitions", context=[kb])
result = agent.do(task)
print(result)

Parameters

| Parameter | Type | Description | Default | Source |
| --- | --- | --- | --- | --- |
| chunk_size | int | Target size of each chunk | 1024 | Base |
| chunk_overlap | int | Overlapping units between chunks | 200 | Base |
| min_chunk_size | int \| None | Minimum size for a chunk | None | Base |
| length_function | Callable[[str], int] | Function to measure text length | len | Base |
| strip_whitespace | bool | Strip leading/trailing whitespace | False | Base |
| split_on_nodes | list[str] | AST node types for boundaries | ["ClassDef", "FunctionDef", "AsyncFunctionDef"] | Specific |
| min_chunk_lines | int | Minimum lines for standalone chunk | 1 | Specific |
| include_docstrings | bool | Include docstrings in chunks | True | Specific |
| strip_decorators | bool | Strip decorator syntax | False | Specific |
| text_chunker_to_use | BaseChunker | Chunker for oversized blocks | RecursiveChunker | Specific |
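The effect of the Specific parameters can be approximated with the standard-library `ast` module. The sketch below is an assumption about how such options could behave, not the library's code: `extract_blocks` is a hypothetical helper, it handles only the three default node types, and it simply skips blocks shorter than `min_chunk_lines`, whereas a real chunker may treat them differently (for example by merging them with neighbouring code).

```python
import ast

# Hypothetical sample module used only for this illustration.
SOURCE = '''\
@staticmethod
def tiny():
    pass

def documented(x):
    """Double x."""
    return x * 2
'''


def extract_blocks(source, split_on_nodes, min_chunk_lines=1,
                   include_docstrings=True, strip_decorators=False):
    """Return the source text of each top-level node whose type is in split_on_nodes."""
    tree = ast.parse(source)
    lines = source.splitlines()
    # Resolve names such as "FunctionDef" to the matching ast classes.
    wanted = tuple(getattr(ast, name) for name in split_on_nodes)
    chunks = []
    for node in tree.body:
        if not isinstance(node, wanted):
            continue
        # node.lineno is the def/class line (Python 3.8+); decorators sit above it.
        start = node.lineno
        if not strip_decorators and node.decorator_list:
            start = min(d.lineno for d in node.decorator_list)
        # Optionally drop the lines occupied by the docstring expression.
        # Note: a body that consists only of a docstring would be left empty.
        drop = set()
        if not include_docstrings and ast.get_docstring(node) is not None:
            doc = node.body[0]
            drop = set(range(doc.lineno, doc.end_lineno + 1))
        kept = [lines[i - 1] for i in range(start, node.end_lineno + 1) if i not in drop]
        if len(kept) >= min_chunk_lines:  # simplification: short blocks are skipped
            chunks.append("\n".join(kept))
    return chunks


default_chunks = extract_blocks(SOURCE, ["FunctionDef"])
bare_chunks = extract_blocks(SOURCE, ["FunctionDef"],
                             include_docstrings=False, strip_decorators=True)
for chunk in bare_chunks:
    print(chunk)
    print("---")
```

With the defaults, the first chunk keeps its `@staticmethod` line and the second keeps its docstring. With `strip_decorators=True` and `include_docstrings=False`, both are removed from the emitted text.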