Overview

The JSON loader processes JSON and JSONL files and supports single- or multi-document extraction using JQ queries. Content and metadata are mapped with JQ queries as well, so structured data can be shaped into documents as needed.

Loader Class: JSONLoader
Config Class: JSONLoaderConfig
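To make the difference between single- and multi-document extraction concrete, here is a minimal sketch in plain Python. It is not the loader's implementation: `extract_documents` is a hypothetical helper, Python callables stand in for the JQ queries, and the `{"content": ..., "metadata": ...}` document shape is an assumption made for illustration only.

```python
from typing import Any, Callable, Optional


def extract_documents(
    data: Any,
    mode: str = "single",
    record_selector: Optional[Callable[[Any], list]] = None,
    content_mapper: Callable[[Any], Any] = lambda record: record,
    metadata_mapper: Optional[dict] = None,
) -> list:
    """Hypothetical stand-in for the extraction step (not Upsonic code)."""
    if mode == "multi":
        if record_selector is None:
            raise ValueError("record_selector is required in multi mode")
        # Plays the role of a JQ selector such as ".articles[]".
        records = record_selector(data)
    else:
        # In single mode the whole parsed file is treated as one record.
        records = [data]

    documents = []
    for record in records:
        # Each metadata key is filled by running its own query on the record.
        metadata = {
            key: query(record) for key, query in (metadata_mapper or {}).items()
        }
        documents.append({"content": content_mapper(record), "metadata": metadata})
    return documents


# Illustrative input only.
data = {
    "articles": [
        {"title": "First", "body": "alpha", "author": "A"},
        {"title": "Second", "body": "beta", "author": "B"},
    ]
}

multi_docs = extract_documents(
    data,
    mode="multi",
    record_selector=lambda d: d["articles"],
    content_mapper=lambda r: r["title"] + " " + r["body"],
    metadata_mapper={"author": lambda r: r["author"]},
)
single_docs = extract_documents(data)

print(len(multi_docs), "documents in multi mode")
print(len(single_docs), "document in single mode")
```

In multi mode each selected record becomes its own document, while single mode keeps the file together as one document. That is why a record selector is required only for multi mode.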

Dependencies

pip install "upsonic[loaders]"

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.loaders import JSONLoader, JSONLoaderConfig
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.text_splitter import RecursiveChunker, RecursiveChunkingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Configure loader for multi-document extraction
loader_config = JSONLoaderConfig(
    mode="multi",
    record_selector=".articles[]",
    content_mapper='.title + " " + .body',  # JQ string literals use double quotes
    metadata_mapper={"author": ".author", "date": ".published"}
)
loader = JSONLoader(loader_config)

# Setup KnowledgeBase
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
chunker = RecursiveChunker(RecursiveChunkingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="json_data",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["articles.json"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader],
    splitters=[chunker]
)

# Query with Agent
agent = Agent("openai/gpt-4o")
task = Task("Find articles about AI", context=[kb])
result = agent.do(task)
print(result)
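The example above reads `articles.json` but does not show the file. The sketch below writes a small input whose shape matches the selectors used in the config (`.articles[]`, `.title`, `.body`, `.author`, `.published`). The field names come from those queries. The values are made-up placeholders, and `write_sample` is a hypothetical helper, not part of Upsonic.

```python
import json
from pathlib import Path

# Placeholder data: only the key names are taken from the loader config above.
sample = {
    "articles": [
        {
            "title": "Placeholder title one",
            "body": "Placeholder body text about AI.",
            "author": "Placeholder Author",
            "published": "2024-01-01",
        },
        {
            "title": "Placeholder title two",
            "body": "Placeholder body text about databases.",
            "author": "Another Author",
            "published": "2024-02-01",
        },
    ]
}


def write_sample(path: str = "articles.json") -> Path:
    """Write the placeholder articles to `path` and return the resulting Path."""
    target = Path(path)
    target.write_text(json.dumps(sample, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    print("wrote", write_sample())
```

With this file in the working directory, each entry under `articles` should correspond to one record selected by `.articles[]`.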

Parameters

| Parameter | Type | Description | Default | Source |
| --- | --- | --- | --- | --- |
| encoding | str \| None | File encoding (auto-detected if None) | None | Base |
| error_handling | "ignore" \| "warn" \| "raise" | How to handle loading errors | "warn" | Base |
| include_metadata | bool | Whether to include file metadata | True | Base |
| custom_metadata | dict | Additional metadata to include | | Base |
| max_file_size | int \| None | Maximum file size in bytes | None | Base |
| skip_empty_content | bool | Skip documents with empty content | True | Base |
| mode | "single" \| "multi" | Processing mode | "single" | Specific |
| record_selector | str \| None | JQ query to select records (required for multi) | None | Specific |
| content_mapper | str | JQ query to extract content | "." | Specific |
| metadata_mapper | dict[str, str] \| None | Map metadata keys to JQ queries | None | Specific |
| content_synthesis_mode | "json" \| "text" | Format for extracted content | "json" | Specific |
| json_lines | bool | File is in JSON Lines format | False | Specific |
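For readers unfamiliar with the JSON Lines format referenced by `json_lines`, the following sketch shows how it differs from a regular JSON file: each non-empty line is a complete JSON value of its own. This uses only the standard library and does not reflect how JSONLoader parses files internally. `iter_json_lines` is a hypothetical helper, and skipping blank lines is an assumption of this sketch.

```python
import json
from typing import Any, Iterator, Tuple


def iter_json_lines(text: str) -> Iterator[Tuple[int, Any]]:
    """Yield (line_number, parsed_value) for each non-empty line of JSONL text."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            # Assumption of this sketch: blank lines are skipped, not errors.
            continue
        yield line_number, json.loads(stripped)


# Illustrative input: two records separated by a blank line.
jsonl_text = (
    '{"title": "First", "body": "alpha"}\n'
    "\n"
    '{"title": "Second", "body": "beta"}\n'
)

records = [record for _, record in iter_json_lines(jsonl_text)]
print(len(records), "records parsed")
```

Because every line already is a separate record, a JSONL file naturally yields a sequence of records rather than one nested document.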