Overview
The Document Analyzer showcases Upsonic’s ability to process visual documents and extract structured information. In this example, the agent:
- Processes a Turkish Tax Certificate image
- Extracts the company name from the “TICARET UNVANI” (trade name) field
- Returns structured data using Pydantic models
Key Features
- Multimodal Processing: Handles both text and image inputs
- Structured Output: Uses Pydantic models for type-safe responses
- Document OCR: Automatically extracts text from images
- Precise Extraction: Focuses on specific document fields
- Error Handling: Robust processing of document variations
Code Structure
Response Model
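The response model itself is not reproduced in this section. Below is a minimal sketch that assumes a single required string field. The class name `CompanyNameResponse` and the `parse_response` helper are hypothetical and do not come from the example's source. The `Field` description tells the LLM what belongs in `name`, and validation rejects any reply that does not match the schema. The sample company name is made up.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class CompanyNameResponse(BaseModel):
    """Hypothetical response model holding the value of the TICARET UNVANI field."""

    name: str = Field(
        description="Company name exactly as printed in the TICARET UNVANI field"
    )


def parse_response(raw_json: str) -> Optional[CompanyNameResponse]:
    """Validate a raw JSON reply against the model; return None if it does not fit."""
    try:
        return CompanyNameResponse(**json.loads(raw_json))
    except (json.JSONDecodeError, TypeError, ValidationError):
        # Malformed JSON, a non-object payload, or a missing/invalid "name" field.
        return None


if __name__ == "__main__":
    print(parse_response('{"name": "ÖRNEK YAZILIM A.Ş."}'))
    print(parse_response('{"company": "no name field"}'))
```

Returning `None` keeps error handling in the caller. Re-raise the `ValidationError` instead if you need the field-level details.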
Agent Setup
Task Definition
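The exact task wording used by the example is not shown here. Assume the instruction is a plain string used as the task description. Under that assumption, a hypothetical helper like `build_task_description` can encode the steps from How It Works: locate the labelled field, then copy its value verbatim. The wording is illustrative only.

```python
def build_task_description(
    field_label: str = "TICARET UNVANI", value_name: str = "company name"
) -> str:
    """Build a hypothetical instruction asking the model for one labelled field."""
    return (
        "You are given an image of a Turkish Tax Certificate. "
        f"Find the field labelled '{field_label}' and return the {value_name} "
        "written next to it exactly as it appears, without translating, "
        "abbreviating, or correcting it. "
        "If the field cannot be read, return an empty string."
    )


if __name__ == "__main__":
    print(build_task_description())
```

Asking for the value "exactly as it appears" matters here. Otherwise a model may normalize Turkish characters or expand abbreviations such as "A.Ş.".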
Complete Implementation
How It Works
- Document Input: The agent receives a Turkish Tax Certificate image
- OCR Processing: Upsonic automatically extracts text from the image
- Field Identification: The LLM identifies the “TICARET UNVANI” field
- Name Extraction: Extracts the company name exactly as it appears
- Structured Output: Returns the result in a structured Pydantic model
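The steps above have the model copy the name verbatim. If you have the document's OCR text from another source, you can sanity-check the answer against it. The sketch below is not part of Upsonic. It is a hypothetical regex-based helper (`find_trade_name`, `agrees_with_ocr`) that assumes the value is the first non-blank text after the label. It accepts both the ASCII spelling `TICARET UNVANI` and the Turkish `TİCARET ÜNVANI`. The sample text is made up.

```python
import re
from typing import Optional

# Label in ASCII or Turkish spelling, optional colon, then skip whitespace
# (including newlines) and capture the first non-blank text up to the end of that line.
_LABEL = re.compile(r"T[Iİ]CARET\s+[UÜ]NVANI\s*:?\s*(.+)")


def find_trade_name(ocr_text: str) -> Optional[str]:
    """Return the text following the trade-name label, or None if the label is absent."""
    match = _LABEL.search(ocr_text)
    return match.group(1).strip() if match else None


def agrees_with_ocr(extracted: str, ocr_text: str) -> bool:
    """True if the model's answer equals the value found next to the label."""
    found = find_trade_name(ocr_text)
    return found is not None and found == extracted.strip()


if __name__ == "__main__":
    sample = "VERGİ LEVHASI\nTİCARET ÜNVANI: ÖRNEK YAZILIM A.Ş.\nVERGİ DAİRESİ: KADIKÖY\n"
    print(find_trade_name(sample))
    print(agrees_with_ocr("ÖRNEK YAZILIM A.Ş.", sample))
```

An exact string comparison is deliberately strict. A mismatch is a signal to review the document manually, not proof that the model was wrong, because OCR itself can misread characters.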
Usage
Setup
Run the example
Expected Output
File Structure
Use Cases
- Document Processing: Extract information from official documents
- Form Processing: Automate data extraction from forms and certificates
- Compliance: Process regulatory documents and certificates
- Data Entry: Automate manual data extraction tasks
- Multilingual Documents: Handle documents in various languages
Advanced Features
Multiple Document Types
You can extend this example to handle various document types:
Custom Field Extraction
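The sketch below covers both extensions. It describes each document type as a list of fields and builds the Pydantic response model at runtime with `pydantic.create_model`. The document types, field names, and the `model_for` helper are hypothetical. You would use the generated class wherever the example uses its fixed response model.

```python
from typing import Dict, List, Tuple, Type

from pydantic import BaseModel, Field, create_model

# Hypothetical registry: document type -> list of (Python field name, label printed on the document).
# Field names must be valid identifiers, so labels such as "TICARET UNVANI" get snake_case names.
DOCUMENT_FIELDS: Dict[str, List[Tuple[str, str]]] = {
    "tax_certificate": [
        ("company_name", "TICARET UNVANI"),
        ("tax_office", "VERGI DAIRESI"),
    ],
    "invoice": [
        ("invoice_number", "Invoice No"),
        ("total_amount", "Total"),
    ],
}


def model_for(doc_type: str) -> Type[BaseModel]:
    """Build a response model with one required str field per registered label."""
    fields = {
        name: (str, Field(description=f"Value of the '{label}' field, copied verbatim"))
        for name, label in DOCUMENT_FIELDS[doc_type]
    }
    # "tax_certificate" -> "TaxCertificateResponse"
    class_name = doc_type.title().replace("_", "") + "Response"
    return create_model(class_name, **fields)


if __name__ == "__main__":
    TaxCertificateResponse = model_for("tax_certificate")
    print(TaxCertificateResponse(company_name="ÖRNEK YAZILIM A.Ş.", tax_office="KADIKÖY"))
```

Keeping the registry as data means a new document type needs only a new entry, not a new class. Every field is typed as `str` so that values stay verbatim. Convert amounts or dates after validation if you need them typed.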
Notes
- Tested with: upsonic==0.61.1a1758720414
- Input Formats: Supports PNG, JPG, PDF, and other common formats
- OCR Quality: Results depend on image quality and text clarity
- Language Support: Works with documents in various languages
- Error Handling: Gracefully handles unclear or damaged documents