Skip to main content

Overview

YAML loader processes YAML files using jq-style queries to split documents and extract content. Supports multiple document files, metadata flattening, and flexible content synthesis modes. Loader Class: YAMLLoader Config Class: YAMLLoaderConfig

Dependencies

pip install "upsonic[loaders]"

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.loaders import YAMLLoader, YAMLLoaderConfig
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.text_splitter import RecursiveChunker, RecursiveChunkingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Configure loader
loader_config = YAMLLoaderConfig(
    split_by_jq_query=".articles[]",
    content_synthesis_mode="smart_text",
    metadata_jq_queries={"author": ".author", "date": ".published"}
)
loader = YAMLLoader(loader_config)

# Setup KnowledgeBase
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
chunker = RecursiveChunker(RecursiveChunkingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="yaml_data",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["data.yaml"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader],
    splitters=[chunker]
)

# Query with Agent
agent = Agent("openai/gpt-4o")
task = Task("Find articles about machine learning", context=[kb])
result = agent.do(task)
print(result)

Parameters

ParameterTypeDescriptionDefaultSource
encodingstr | NoneFile encoding (auto-detected if None)NoneBase
error_handling"ignore" | "warn" | "raise"How to handle loading errors”warn”Base
include_metadataboolWhether to include file metadataTrueBase
custom_metadatadictAdditional metadata to includeBase
max_file_sizeint | NoneMaximum file size in bytesNoneBase
skip_empty_contentboolSkip documents with empty contentTrueBase
split_by_jq_querystrjq query to select document objects”.”Specific
handle_multiple_docsboolProcess multiple documents separated by ’---‘TrueSpecific
content_synthesis_mode"canonical_yaml" | "json" | "smart_text"Content format”canonical_yaml”Specific
yaml_indentintIndentation level for YAML output2Specific
json_indentint | NoneIndentation level for JSON output2Specific
flatten_metadataboolFlatten nested structure into metadataTrueSpecific
metadata_jq_queriesdict[str, str] | NoneMap metadata keys to jq queriesNoneSpecific