
Overview

The JSON splitter operates on parsed JSON data, traversing the JSON graph to create chunks that are valid, self-contained JSON objects. It provides path-aware traceability by adding JSON paths to chunk metadata, and it falls back to text chunking if JSON parsing fails.

Splitter Class: JSONChunker
Config Class: JSONChunkingConfig
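The idea behind path-aware chunking can be sketched with the standard library alone. The function below is illustrative only, not the library's implementation: it walks parsed JSON and yields (JSON path, serialized fragment) pairs, which is the kind of path metadata JSONChunker attaches to its chunks (the real splitter also enforces chunk sizes and emits self-contained JSON objects).

```python
import json

def json_chunks(node, path="$"):
    """Recursively walk parsed JSON, yielding (path, serialized_value) pairs.

    Hypothetical sketch of path-aware traversal; JSONChunker's actual
    chunking additionally respects chunk_size and max_depth.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from json_chunks(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from json_chunks(value, f"{path}[{i}]")
    else:
        # Leaf value: emit a valid JSON fragment tagged with its path.
        yield path, json.dumps(node)

data = {"users": [{"name": "Ada"}, {"name": "Lin"}]}
for chunk_path, fragment in json_chunks(data):
    print(chunk_path, fragment)
# $.users[0].name "Ada"
# $.users[1].name "Lin"
```

Because each fragment carries its full path, a retrieved chunk can always be traced back to its location in the source document.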

Dependencies

No additional dependencies are required; the splitter uses only the Python standard library.

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.loaders import JSONLoader, JSONLoaderConfig
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.text_splitter import JSONChunker, JSONChunkingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Configure splitter
splitter_config = JSONChunkingConfig(
    chunk_size=512,
    chunk_overlap=50,
    convert_lists_to_dicts=True,
    max_depth=50
)
splitter = JSONChunker(splitter_config)

# Setup KnowledgeBase
loader = JSONLoader(JSONLoaderConfig())
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="json_docs",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["data.json"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader],
    splitters=[splitter]
)

# Query with Agent
agent = Agent("openai/gpt-4o")
task = Task("Find all user records", context=[kb])
result = agent.do(task)
print(result)

Parameters

| Parameter | Type | Description | Default | Source |
| --- | --- | --- | --- | --- |
| chunk_size | int | Target size of each chunk | 1024 | Base |
| chunk_overlap | int | Overlapping units between chunks | 200 | Base |
| min_chunk_size | int \| None | Minimum size for a chunk | None | Base |
| length_function | Callable[[str], int] | Function to measure text length | len | Base |
| strip_whitespace | bool | Strip leading/trailing whitespace | False | Base |
| convert_lists_to_dicts | bool | Convert lists to dict-like objects | True | Specific |
| max_depth | int \| None | Maximum recursion depth | 50 | Specific |
| json_encoder_options | dict | Options for json.dumps | | Specific |
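Two of the splitter-specific options are easy to illustrate with the standard library. The helper below is a hypothetical sketch of what convert_lists_to_dicts does conceptually (rewriting lists as index-keyed dicts so every value keeps a stable key path); the function name and details are illustrative, not the library's internals. json_encoder_options, per the table, is passed through to json.dumps.

```python
import json

def lists_to_dicts(node):
    """Illustrative transform: replace lists with index-keyed dicts."""
    if isinstance(node, list):
        return {str(i): lists_to_dicts(v) for i, v in enumerate(node)}
    if isinstance(node, dict):
        return {k: lists_to_dicts(v) for k, v in node.items()}
    return node

print(lists_to_dicts({"tags": ["a", "b"]}))
# {'tags': {'0': 'a', '1': 'b'}}

# json_encoder_options forwards keyword arguments to json.dumps,
# e.g. keeping non-ASCII characters and sorting keys:
opts = {"ensure_ascii": False, "sort_keys": True}
print(json.dumps({"name": "José", "age": 30}, **opts))
# {"age": 30, "name": "José"}
```

Index-keyed dicts give list elements addressable paths (tags.0, tags.1), which keeps chunk metadata meaningful even inside arrays.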