XML Loader

Overview

The XML loader processes XML files, using XPath expressions to split each file into separate documents and extract their content. It supports namespace handling, attribute extraction, and two content synthesis modes (smart_text and xml_snippet).

Loader class: XMLLoader
Config class: XMLLoaderConfig
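The sketch below makes the split-and-extract model concrete. It uses Python's standard-library `xml.etree.ElementTree` and is not the Upsonic implementation. It assumes the following behavior:

- Each element matched by the split path becomes one document.
- Content is taken from a path relative to that element.
- Metadata is gathered from the element's attributes plus a key-to-path map.

These roughly mirror `split_by_xpath`, `content_xpath`, `include_attributes`, and `metadata_xpaths`. The `split_xml` helper and the sample catalog are hypothetical. ElementTree supports only a subset of XPath, so `.//item` stands in for `//item`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, not taken from the Upsonic docs.
SAMPLE = """
<catalog>
  <item id="1" lang="en">
    <title>Edge AI</title>
    <author>Ada</author>
    <description>On-device inference for technology products.</description>
  </item>
  <item id="2" lang="en">
    <title>Gardening</title>
    <author>Bob</author>
    <description>Growing tomatoes at home.</description>
  </item>
</catalog>
""".strip()


def split_xml(xml_text, split_path, content_path=None, metadata_paths=None, include_attributes=True):
    """Hypothetical helper: approximate split/extract with ElementTree's XPath subset."""
    root = ET.fromstring(xml_text)
    docs = []
    for element in root.findall(split_path):
        # Content comes from the relative content path if given, else from the whole element.
        target = element.find(content_path) if content_path else element
        content = " ".join(t.strip() for t in target.itertext() if t.strip()) if target is not None else ""
        # Element attributes seed the metadata when include_attributes is True.
        metadata = dict(element.attrib) if include_attributes else {}
        # Each metadata key maps to a path evaluated relative to the split element.
        for key, path in (metadata_paths or {}).items():
            match = element.find(path)
            if match is not None and match.text:
                metadata[key] = match.text.strip()
        docs.append({"content": content, "metadata": metadata})
    return docs


docs = split_xml(SAMPLE, ".//item", "./description", {"title": "./title", "author": "./author"})
for doc in docs:
    print(doc)
```

Each dictionary corresponds roughly to one document that the loader would pass to the chunker. The real loader's output type and metadata keys may differ.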

Install

Install the XML loader optional dependency group:

uv pip install "upsonic[xml-loader]"

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.loaders.xml import XMLLoader
from upsonic.loaders.config import XMLLoaderConfig
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.text_splitter.recursive import RecursiveChunker, RecursiveChunkingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

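# Configure loader: each <item> element becomes one document,
# its <description> child supplies the content, and <title>/<author> become metadata
loader_config = XMLLoaderConfig(
    split_by_xpath="//item",                                     # one document per <item>
    content_xpath="./description",                               # relative to each <item>
    metadata_xpaths={"title": "./title", "author": "./author"}   # metadata key -> relative XPath
)
loader = XMLLoader(loader_config)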

# Setup KnowledgeBase
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
chunker = RecursiveChunker(RecursiveChunkingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="xml_data",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["data.xml"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader],
    splitters=[chunker]
)

# Query with Agent
agent = Agent("anthropic/claude-sonnet-4-5")
task = Task("Find items matching 'technology'", context=[kb])
result = agent.do(task)
print(result)
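The example above reads `data.xml` but does not show its contents. The sketch below writes a hypothetical file whose structure matches the configured XPaths (`//item`, `./description`, `./title`, `./author`), so the snippet has something to load. It then checks the file with the standard library, using `.//item` because ElementTree does not accept absolute `//` paths on an element. The file contents are illustrative only.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Hypothetical catalog: any file whose <item> elements carry <title>, <author>
# and <description> children fits the loader configuration in the example.
DATA_XML = """<?xml version="1.0" encoding="utf-8"?>
<catalog>
  <item>
    <title>Vector databases explained</title>
    <author>J. Doe</author>
    <description>An introduction to embedding storage and similarity search technology.</description>
  </item>
  <item>
    <title>Sourdough basics</title>
    <author>A. Baker</author>
    <description>Flour, water, salt and patience.</description>
  </item>
</catalog>
"""

path = Path("data.xml")
path.write_text(DATA_XML, encoding="utf-8")

# Sanity check: stdlib equivalents of the configured XPaths find the expected elements.
root = ET.parse(path).getroot()
items = root.findall(".//item")
print(len(items), [item.findtext("./title") for item in items])
```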

Parameters

Parameter	Type	Description	Default	Source
`encoding`	`str | None`	File encoding (auto-detected if None)	None	Base
`error_handling`	`"ignore" | "warn" | "raise"`	How to handle loading errors	"warn"	Base
`include_metadata`	`bool`	Whether to include file metadata	True	Base
`custom_metadata`	`dict`	Additional metadata to include		Base
`max_file_size`	`int | None`	Maximum file size in bytes	None	Base
`skip_empty_content`	`bool`	Skip documents with empty content	True	Base
`split_by_xpath`	`str`	XPath expression to identify document elements	`"//[not()] | //item | //product | //book"`	Specific
`content_xpath`	`str | None`	Relative XPath to select content	None	Specific
`content_synthesis_mode`	`"smart_text" | "xml_snippet"`	Content format	"smart_text"	Specific
`include_attributes`	`bool`	Include element attributes in metadata	True	Specific
`metadata_xpaths`	`dict[str, str] | None`	Map metadata keys to XPath expressions	None	Specific
`strip_namespaces`	`bool`	Remove XML namespaces	True	Specific
`recover_mode`	`bool`	Parse malformed XML	False	Specific
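`strip_namespaces` and `content_synthesis_mode` are easiest to understand by example. The stdlib sketch below approximates them under three assumptions:

- Namespace stripping drops the `{uri}` prefix that ElementTree puts on tag names.
- `smart_text` produces whitespace-joined text content.
- `xml_snippet` produces the serialized element.

The helper names `strip_namespaces` and `synthesize` are hypothetical, and Upsonic's exact output formatting may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical Atom-style input with a default namespace.
NS_XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry id="42"><title>Release notes</title><summary>Bug fixes and speedups.</summary></entry>
</feed>"""


def strip_namespaces(root):
    """Hypothetical helper: drop the '{uri}' prefix ElementTree adds to namespaced tags."""
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


def synthesize(element, mode):
    """Hypothetical helper approximating the two content synthesis modes."""
    if mode == "smart_text":
        # Assumption: modeled here as the element's text nodes joined by single spaces.
        return " ".join(t.strip() for t in element.itertext() if t.strip())
    if mode == "xml_snippet":
        # tostring() also serializes the element's tail text, so trim surrounding whitespace.
        return ET.tostring(element, encoding="unicode").strip()
    raise ValueError(f"unknown mode: {mode}")


root = strip_namespaces(ET.fromstring(NS_XML))
entry = root.find(".//entry")  # a plain path works only because namespaces were stripped
print(synthesize(entry, "smart_text"))
print(synthesize(entry, "xml_snippet"))
```

Without stripping, the same lookup would need `{http://www.w3.org/2005/Atom}entry`, and the snippet would serialize with an `ns0:` prefix. Avoiding that kind of noise is the likely motivation for `strip_namespaces` defaulting to True.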
