
Overview

The XML loader processes XML files using XPath expressions to split them into documents and extract content. It supports namespace handling, attribute extraction, and flexible content synthesis modes.

Loader Class: XMLLoader
Config Class: XMLLoaderConfig
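To make the split-and-extract model concrete, the sketch below imitates the idea with Python's standard `xml.etree.ElementTree`. It is not Upsonic's implementation: the sample XML and the helper `split_documents` are hypothetical, and ElementTree only supports a subset of XPath, so the split expression is written as `.//item` rather than `//item`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, invented for illustration.
SAMPLE_XML = """<catalog>
  <item id="1">
    <title>Quantum chips</title>
    <author>Ada</author>
    <description>New technology for faster computing.</description>
  </item>
  <item id="2">
    <title>Garden tips</title>
    <author>Bo</author>
    <description>How to grow tomatoes.</description>
  </item>
</catalog>"""


def split_documents(xml_text, split_by, content_path, metadata_paths):
    """Hypothetical helper: return one dict per element matched by split_by."""
    root = ET.fromstring(xml_text)
    documents = []
    for element in root.findall(split_by):
        # Content comes from a path relative to the matched element.
        content_node = element.find(content_path)
        content = ""
        if content_node is not None and content_node.text:
            content = content_node.text.strip()

        # Start from the element's attributes, then add the mapped fields.
        metadata = dict(element.attrib)
        for key, path in metadata_paths.items():
            node = element.find(path)
            if node is not None and node.text:
                metadata[key] = node.text.strip()

        documents.append({"content": content, "metadata": metadata})
    return documents


docs = split_documents(
    SAMPLE_XML,
    split_by=".//item",
    content_path="./description",
    metadata_paths={"title": "./title", "author": "./author"},
)
for doc in docs:
    print(doc)
```

Each `<item>` becomes one document: its `<description>` text is the content, while its attributes and the mapped `<title>` and `<author>` values end up in the metadata.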

Dependencies

pip install "upsonic[loaders]"

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.loaders import XMLLoader, XMLLoaderConfig
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.text_splitter import RecursiveChunker, RecursiveChunkingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Configure loader
loader_config = XMLLoaderConfig(
    split_by_xpath="//item",
    content_xpath="./description",
    metadata_xpaths={"title": "./title", "author": "./author"}
)
loader = XMLLoader(loader_config)

# Setup KnowledgeBase
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
chunker = RecursiveChunker(RecursiveChunkingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="xml_data",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["data.xml"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader],
    splitters=[chunker]
)

# Query with Agent
agent = Agent("openai/gpt-4o")
task = Task("Find items matching 'technology'", context=[kb])
result = agent.do(task)
print(result)
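The example reads `data.xml`, but the file's contents are not shown. A hypothetical file that matches the XPath expressions used above could be generated as follows; the root element name and all values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; only the element names item/title/author/description
# are taken from the XPath expressions in the loader config above.
records = [
    ("AI in 2024", "Ada", "A survey of recent technology trends."),
    ("Sourdough basics", "Bo", "How to keep a starter alive."),
]

root = ET.Element("catalog")  # root name is an assumption
for title, author, description in records:
    item = ET.SubElement(root, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "author").text = author
    ET.SubElement(item, "description").text = description

# Write the file that the KnowledgeBase example points at.
ET.ElementTree(root).write("data.xml", encoding="utf-8", xml_declaration=True)

# Read it back to confirm the structure.
items = ET.parse("data.xml").getroot().findall(".//item")
print(len(items), "items written")
```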

Parameters

| Parameter | Type | Description | Default | Source |
| --- | --- | --- | --- | --- |
| encoding | `str \| None` | File encoding (auto-detected if None) | `None` | Base |
| error_handling | `"ignore" \| "warn" \| "raise"` | How to handle loading errors | `"warn"` | Base |
| include_metadata | `bool` | Whether to include file metadata | `True` | Base |
| custom_metadata | `dict` | Additional metadata to include | | Base |
| max_file_size | `int \| None` | Maximum file size in bytes | `None` | Base |
| skip_empty_content | `bool` | Skip documents with empty content | `True` | Base |
| split_by_xpath | `str` | XPath expression to identify document elements | `"//*[not(*)] \| //item \| //product \| //book"` | Specific |
| content_xpath | `str \| None` | Relative XPath to select content | `None` | Specific |
| content_synthesis_mode | `"smart_text" \| "xml_snippet"` | Content format | `"smart_text"` | Specific |
| include_attributes | `bool` | Include element attributes in metadata | `True` | Specific |
| metadata_xpaths | `dict[str, str] \| None` | Map metadata keys to XPath expressions | `None` | Specific |
| strip_namespaces | `bool` | Remove XML namespaces | `True` | Specific |
| recover_mode | `bool` | Parse malformed XML | `False` | Specific |
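Two of the loader-specific options are easier to see with a small experiment. The sketch below uses the standard `xml.etree.ElementTree` to show why removing namespaces makes simple XPath expressions match, and how a text-only rendering differs from a raw XML snippet. It only approximates the ideas behind `strip_namespaces` and `content_synthesis_mode`. The helper, the sample XML, and the exact text format are assumptions, not Upsonic's actual behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced input, invented for illustration.
NAMESPACED_XML = """<feed xmlns="http://example.com/ns">
  <book id="b1"><title>Dune</title><year>1965</year></book>
</feed>"""


def strip_namespaces(root):
    """Hypothetical helper: drop the '{uri}' prefix from every tag, in place."""
    for element in root.iter():
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root


root = ET.fromstring(NAMESPACED_XML)

# With namespaces intact the tag is '{http://example.com/ns}book',
# so a plain 'book' path matches nothing.
matches_before = root.findall(".//book")

strip_namespaces(root)
book = root.find(".//book")

# Text-only view, in the spirit of "smart_text" (exact format is an assumption).
text_view = " ".join(t.strip() for t in book.itertext() if t.strip())

# Raw markup view, in the spirit of "xml_snippet".
snippet_view = ET.tostring(book, encoding="unicode").strip()

print(len(matches_before))
print(text_view)
print(snippet_view)
```

The text view keeps only the readable values, which generally suits embedding and retrieval. The snippet view preserves tags and attributes for cases where the structure itself carries meaning.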