Document Analyzer - Upsonic AI

This example demonstrates how to use the Upsonic framework to extract the company name from a Turkish Tax Certificate (Vergi Levhası) using computer vision and LLM reasoning.

Overview

The Document Analyzer showcases Upsonic’s ability to process visual documents and extract structured information. In this example, the agent:

Processes a Turkish Tax Certificate image
Extracts the company name from the “TICARET UNVANI” field
Returns structured data using Pydantic models

This demonstrates how Upsonic can handle multimodal inputs (text + images) and provide reliable document processing capabilities.

Key Features

Multimodal Processing: Handles both text and image inputs
Structured Output: Uses Pydantic models for type-safe responses
Document OCR: Automatically extracts text from images
Precise Extraction: Focuses on specific document fields
Error Handling: Robust processing of document variations

Code Structure

Response Model

class CompanyResponse(BaseModel):
    company_name: str
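
To illustrate what "type-safe" means here, the following minimal sketch exercises the same model outside of Upsonic. It relies only on standard Pydantic behavior (constructing a model and catching `ValidationError`); the sample value is the one shown under Expected Output.

```python
from pydantic import BaseModel, ValidationError


class CompanyResponse(BaseModel):
    company_name: str


# A well-formed payload is accepted and exposed as a typed attribute.
ok = CompanyResponse(company_name="UPSONIC TEKNOLOJİ ANONİM ŞİRKETİ")
print(ok.company_name)

# A payload without the required field is rejected with ValidationError.
try:
    CompanyResponse()
    rejected = False
except ValidationError:
    rejected = True
print("missing field rejected:", rejected)
```

Any response that does not contain a `company_name` string therefore fails loudly instead of reaching downstream code as a half-filled object.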

Agent Setup

doc_agent = Agent(name="document_reader")

Task Definition

task = Task(
    description=(
        "Read the attached Turkish tax certificate (Vergi Levhası) and return the company name "
        "exactly as it appears in the field 'TICARET UNVANI'. "
        "Do not invent, shorten, or replace words. "
        "Return only the full legal company name, nothing else."
    ),
    attachments=["task_examples/document_analyzer/assets/vergi_levhasi.png"],
    response_format=CompanyResponse
)

Complete Implementation

# task_examples/document_analyzer/extract_company_name.py

from upsonic import Task, Agent
from pydantic import BaseModel

# Define the response format
class CompanyResponse(BaseModel):
    company_name: str

# Create the agent
doc_agent = Agent(name="document_reader")

# Create the task
task = Task(
    description=(
        "Read the attached Turkish tax certificate (Vergi Levhası) and return the company name "
        "exactly as it appears in the field 'TICARET UNVANI'. "
        "Do not invent, shorten, or replace words. "
        "Return only the full legal company name, nothing else."
    ),
    attachments=["task_examples/document_analyzer/assets/vergi_levhasi.png"],
    response_format=CompanyResponse
)

# Run the task
result = doc_agent.do(task)

# Print the result
print("Extracted Company Name:", result.company_name)
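
The script refers to the image by a path relative to the repository root, so running it from another directory would point at a file that does not exist. A minimal sketch of a pre-flight check follows; `require_attachment` is a hypothetical helper, not part of Upsonic, and it uses only the standard library.

```python
from pathlib import Path


def require_attachment(path: str) -> str:
    """Return the path unchanged if it points to an existing file, else raise."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(
            f"Attachment not found: {p} (current directory: {Path.cwd()})"
        )
    return str(p)


# Possible usage when building the task (hypothetical):
# image = require_attachment("task_examples/document_analyzer/assets/vergi_levhasi.png")
```

Failing before the agent is called gives a clearer error than a model response based on a missing image.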

How It Works

Document Input: The agent receives a Turkish Tax Certificate image
OCR Processing: Upsonic automatically extracts text from the image
Field Identification: The LLM identifies the “TICARET UNVANI” field
Name Extraction: The LLM extracts the company name exactly as it appears in that field
Structured Output: Returns the result in a structured Pydantic model

Usage

Setup

uv sync

Run the example

python task_examples/document_analyzer/extract_company_name.py

Expected Output

Extracted Company Name: UPSONIC TEKNOLOJİ ANONİM ŞİRKETİ

Note: Output may vary slightly depending on Upsonic version and OCR results.
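
Because the output can differ in spacing or letter case, an automated check may want a tolerant comparison rather than strict string equality. The sketch below is one possible approach using only the standard library; `comparison_key` and `same_company` are hypothetical helpers. It folds the Turkish dotted and dotless I variants together, since default case conversion and OCR both tend to confuse them.

```python
import unicodedata

# Map every dotted/dotless I variant to plain "i" so that casing and
# OCR confusion between I, İ, ı and i do not affect the comparison.
_I_VARIANTS = str.maketrans({"İ": "i", "I": "i", "ı": "i"})


def comparison_key(name: str) -> str:
    """NFC-normalize, collapse whitespace, fold I variants, then casefold."""
    name = unicodedata.normalize("NFC", name)
    name = " ".join(name.split())
    return name.translate(_I_VARIANTS).casefold()


def same_company(extracted: str, expected: str) -> bool:
    """Compare two company names while ignoring case and spacing differences."""
    return comparison_key(extracted) == comparison_key(expected)


print(same_company("  Upsonic Teknoloji  Anonim Şirketi ",
                   "UPSONIC TEKNOLOJİ ANONİM ŞİRKETİ"))
```

This key is meant for checks and tests only. The value stored or displayed should remain the name exactly as the agent returned it.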

File Structure

task_examples/document_analyzer/
├── extract_company_name.py      # Main document analysis script
├── assets/
│   └── vergi_levhasi.png       # Sample tax certificate
└── README.md                    # Documentation

Use Cases

Document Processing: Extract information from official documents
Form Processing: Automate data extraction from forms and certificates
Compliance: Process regulatory documents and certificates
Data Entry: Automate manual data extraction tasks
Multilingual Documents: Handle documents in various languages

Advanced Features

Multiple Document Types

You can extend this example to handle various document types:

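# Process multiple document types with the agent defined above.
# DocumentInfo is an illustrative schema; adapt its fields to your documents.
class DocumentInfo(BaseModel):
    summary: str

documents = [
    "invoice.pdf",
    "contract.docx",
    "certificate.png",
]

for doc in documents:
    task = Task(
        description=f"Extract key information from {doc}",
        attachments=[doc],
        response_format=DocumentInfo,
    )
    result = doc_agent.do(task)
    print(doc, "->", result)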
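
Not every file in such a list is necessarily an image or PDF (for example `contract.docx`), so it can help to separate the inputs before creating tasks. The following sketch assumes a set of accepted extensions based on the formats named under Notes; `SUPPORTED_SUFFIXES` and `split_by_support` are hypothetical names, and the real set depends on the Upsonic version and model in use.

```python
from pathlib import Path

# Assumption: extensions treated as image/PDF inputs. Adjust to match
# what the installed Upsonic version and model actually accept.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}


def split_by_support(paths):
    """Return (supported, skipped) lists based on the file extension."""
    supported, skipped = [], []
    for path in paths:
        suffix = Path(path).suffix.lower()
        (supported if suffix in SUPPORTED_SUFFIXES else skipped).append(path)
    return supported, skipped


supported, skipped = split_by_support(
    ["invoice.pdf", "contract.docx", "certificate.png"]
)
print(supported)  # ['invoice.pdf', 'certificate.png']
print(skipped)    # ['contract.docx']
```

Skipped files can then be logged or converted first instead of failing in the middle of the loop.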

Custom Field Extraction

class DocumentFields(BaseModel):
    company_name: str
    tax_number: str
    address: str
    registration_date: str

task = Task(
    description="Extract all key fields from the tax certificate",
    attachments=["vergi_levhasi.png"],
    response_format=DocumentFields
)
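
Since every field above is returned as a plain string, a light post-check can catch obvious misreads. As a sketch, the hypothetical helper below normalizes `tax_number`, assuming the value is either a 10-digit tax identification number (VKN) or an 11-digit national ID number (TCKN); it checks only the length, not any checksum.

```python
import re


def normalize_tax_number(raw: str) -> str:
    """Keep only digits and check the length (10 for VKN, 11 for TCKN)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) not in (10, 11):
        raise ValueError(f"Unexpected tax number length: {raw!r}")
    return digits


print(normalize_tax_number("123 456 7890"))  # 1234567890
```

A `ValueError` here is a signal to re-run the extraction or route the document to manual review.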

Notes

Tested with: upsonic==0.61.1a1758720414
Image Formats: Supports PNG, JPG, PDF, and other common formats
OCR Quality: Results depend on image quality and text clarity
Language Support: Works with documents in various languages
Error Handling: Gracefully handles unclear or damaged documents

Repository

View the complete example: Document Analyzer Example
