# Python Splitter

> Split Python code using AST-powered semantic boundaries

## Overview

The Python splitter parses source code into an Abstract Syntax Tree (AST) to find the exact boundaries of logical blocks such as classes and functions. Because boundaries come from the parsed structure rather than from text patterns, the resulting chunks are semantically meaningful and robust to formatting variations. Each chunk includes metadata about the code structure.
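To make the idea of AST-derived boundaries concrete, here is a minimal sketch that uses only the standard library `ast` module. It is not Upsonic's implementation. The helper `find_blocks` and the sample `SOURCE` are hypothetical names introduced for illustration. The node type names match the values accepted by `split_on_nodes`, which are the class names of `ast` nodes.

```python
import ast

# Sample input for illustration only.
SOURCE = '''\
import os

@staticmethod
def helper(x):
    """Double x."""
    return x * 2

class Greeter:
    def greet(self, name):
        return f"Hello, {name}"

async def fetch():
    return 1
'''

def find_blocks(source, node_types=("ClassDef", "FunctionDef", "AsyncFunctionDef")):
    """Return (node_type, name, start_line, end_line) for each matching AST node.

    Hypothetical helper: a sketch of boundary detection, not the library's code.
    """
    tree = ast.parse(source)
    blocks = []
    for node in ast.walk(tree):
        kind = type(node).__name__
        if kind in node_types:
            # Decorators sit above the def/class line, so start at the first one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            blocks.append((kind, node.name, start, node.end_lineno))
    # Order blocks by where they begin in the file.
    return sorted(blocks, key=lambda b: b[2])

blocks = find_blocks(SOURCE)
for kind, name, start, end in blocks:
    print(f"{kind:<16} {name:<8} lines {start}-{end}")
```

Because the line ranges come from the parser, reindenting the file or adding blank lines between definitions changes the reported numbers but not which statements belong to which block. Note that this sketch also reports the nested method `greet` separately from its enclosing class; a real chunker has to decide how to treat such nesting.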

**Splitter Class:** `PythonChunker`

**Config Class:** `PythonChunkingConfig`

## Dependencies

No additional dependencies are required. The splitter uses only the Python standard library.

## Examples

```python
from upsonic import Agent, Task, KnowledgeBase
from upsonic.loaders.text import TextLoader
from upsonic.loaders.config import TextLoaderConfig
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.text_splitter.python import PythonChunker, PythonChunkingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Configure splitter
splitter_config = PythonChunkingConfig(
    chunk_size=512,
    chunk_overlap=50,
    split_on_nodes=["ClassDef", "FunctionDef"],
    include_docstrings=True
)
splitter = PythonChunker(splitter_config)

# Setup KnowledgeBase
loader = TextLoader(TextLoaderConfig())
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="python_code",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["code.py"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader],
    splitters=[splitter]
)

# Query with Agent
agent = Agent("anthropic/claude-sonnet-4-5")
task = Task("Find all class definitions", context=[kb])
result = agent.do(task)
print(result)
```

## Parameters

| Parameter             | Type                   | Description                        | Default                                           | Source   |
| --------------------- | ---------------------- | ---------------------------------- | ------------------------------------------------- | -------- |
| `chunk_size`          | `int`                  | Target size of each chunk          | 1024                                              | Base     |
| `chunk_overlap`       | `int`                  | Overlapping units between chunks   | 200                                               | Base     |
| `min_chunk_size`      | `int \| None`          | Minimum size for a chunk           | None                                              | Base     |
| `length_function`     | `Callable[[str], int]` | Function to measure text length    | `len`                                             | Base     |
| `strip_whitespace`    | `bool`                 | Strip leading/trailing whitespace  | False                                             | Base     |
| `split_on_nodes`      | `list[str]`            | AST node types for boundaries      | `["ClassDef", "FunctionDef", "AsyncFunctionDef"]` | Specific |
| `min_chunk_lines`     | `int`                  | Minimum lines for standalone chunk | 1                                                 | Specific |
| `include_docstrings`  | `bool`                 | Include docstrings in chunks       | True                                              | Specific |
| `strip_decorators`    | `bool`                 | Strip decorator syntax             | False                                             | Specific |
| `text_chunker_to_use` | `BaseChunker`          | Chunker for oversized blocks       | RecursiveChunker                                  | Specific |
