OCR

What is Unified OCR?

The Unified OCR system in Upsonic provides a consistent interface for optical character recognition across multiple OCR engines. Instead of learning different APIs for each OCR provider, you use a single OCR class that works seamlessly with EasyOCR, RapidOCR, Tesseract, DeepSeek, and PaddleOCR. The OCR class serves as a high-level orchestrator that:

Manages multiple OCR provider backends with a unified API
Handles image preprocessing (rotation correction, contrast enhancement, noise reduction)
Converts PDFs to images with configurable DPI
Tracks confidence scores and detected bounding boxes
Collects performance metrics and processing statistics
Provides provider-specific features and optimizations
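The orchestrator idea behind the list above is essentially an adapter pattern: every engine sits behind the same small contract, and the calling code only talks to the orchestrator. The following is a minimal sketch of that pattern, not Upsonic's implementation. `UnifiedOCR`, `OCRProvider`, `recognize`, `PageResult` and `FakeProvider` are hypothetical names, and the fake backend exists only so the sketch runs without any OCR engine installed.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class PageResult:
    """Text and confidence (0.0-1.0) recognized on one image."""
    text: str
    confidence: float


class OCRProvider(Protocol):
    """Hypothetical contract that each backend adapter would satisfy."""

    def recognize(self, image_path: str) -> PageResult: ...


class FakeProvider:
    """Stand-in backend so the sketch runs without any OCR engine installed."""

    def __init__(self, languages: list[str]) -> None:
        self.languages = languages

    def recognize(self, image_path: str) -> PageResult:
        return PageResult(text=f"text from {image_path}", confidence=0.9)


class UnifiedOCR:
    """Orchestrator: the provider is swappable, the calling code stays the same."""

    def __init__(self, provider_cls: Callable[[list[str]], OCRProvider],
                 languages: list[str]) -> None:
        # Mirrors OCR(EasyOCR, languages=['en']): pass the class, let the
        # orchestrator construct the backend with the shared settings.
        self.provider = provider_cls(languages)

    def get_text(self, image_path: str) -> str:
        return self.provider.recognize(image_path).text


if __name__ == "__main__":
    ocr = UnifiedOCR(FakeProvider, languages=["en"])
    print(ocr.get_text("page1.png"))  # text from page1.png
```

Passing the provider class rather than an instance lets the orchestrator apply shared settings (languages here) uniformly, so switching engines is a one-argument change.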

OCR Installation

pip install "upsonic[ocr]"

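This installs Upsonic with its OCR dependencies, including EasyOCR, RapidOCR, Tesseract, PaddleOCR, and image processing libraries, so the OCR providers are available through the same interface without configuring each one separately. Tesseract typically also needs the Tesseract engine itself installed on the system (for example through the OS package manager), because its Python bindings only wrap that engine.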
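After installing, it can help to confirm which engines are actually importable. This sketch asks `importlib` for each module's spec without importing it, and looks for a `tesseract` executable on `PATH`. The module names in `ENGINE_MODULES` are assumptions about what `upsonic[ocr]` installs, not names taken from Upsonic's documentation; adjust them to your environment.

```python
import importlib.util
import shutil

# Assumed import names for the optional engines; the exact packages that
# upsonic[ocr] pulls in may differ.
ENGINE_MODULES = ["easyocr", "rapidocr_onnxruntime", "pytesseract", "paddleocr"]


def check_modules(names: list[str]) -> dict[str, bool]:
    """Report which top-level modules are importable, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def tesseract_on_path() -> bool:
    """The Tesseract engine is a separate executable, so look for it on PATH."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    for name, ok in check_modules(ENGINE_MODULES).items():
        print(f"{name}: {'ok' if ok else 'missing'}")
    print(f"tesseract binary: {'ok' if tesseract_on_path() else 'missing'}")
```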

from upsonic.ocr import OCR
from upsonic.ocr.easyocr import EasyOCR

# Create OCR instance
ocr = OCR(EasyOCR, languages=['en'], rotation_fix=True)

# Extract text
text = ocr.get_text('document.pdf')
print(text)

How Unified OCR Works

The OCR system follows a clear processing pipeline:

File Preparation: Validates file existence and format (supports .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp, .pdf)
PDF Conversion: If the file is a PDF, converts each page to images at the specified DPI
Image Preprocessing: Optionally applies rotation correction, contrast enhancement, and noise reduction
OCR Processing: Processes each image through the selected provider’s engine
Result Aggregation: Combines results from multiple pages and calculates the average confidence score
Metrics Tracking: Updates processing statistics for performance analysis
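Several of the pipeline steps above reduce to small, checkable pieces of logic. This sketch covers file preparation, the DPI arithmetic behind PDF conversion, one form of contrast enhancement, and result aggregation. All function and class names are hypothetical, and Upsonic's internals may differ: the min-max contrast stretch is a swapped-in example technique, and the plain mean over pages is only one reading of "average confidence". The DPI step relies on PDF page sizes being expressed in points (1/72 inch), so a US Letter page (612 × 792 pt) rendered at 300 DPI becomes 2550 × 3300 pixels.

```python
import statistics
import time
from dataclasses import dataclass
from pathlib import Path

# Formats listed as supported in the pipeline description.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp", ".pdf"}


def validate_file(path: str) -> Path:
    """File preparation: the file must exist and have a supported extension."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {p.suffix}")
    return p


def page_size_px(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """PDF conversion: sizes are in points (1/72 inch), so px = pt / 72 * dpi."""
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


def stretch_contrast(pixels: list[int]) -> list[int]:
    """Preprocessing (one possible contrast method): linearly stretch 8-bit
    grayscale values so the darkest becomes 0 and the brightest 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image: nothing to stretch
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


@dataclass
class OCRResultSketch:
    """Hypothetical result shape mirroring the fields used with process_file."""
    text: str
    confidence: float
    page_count: int
    processing_time_ms: float


def aggregate(pages: list[tuple[str, float]], started: float) -> OCRResultSketch:
    """Result aggregation: join page texts and take the mean page confidence.
    `started` is a time.perf_counter() timestamp taken before processing."""
    return OCRResultSketch(
        text="\n".join(text for text, _ in pages),
        confidence=statistics.fmean(conf for _, conf in pages) if pages else 0.0,
        page_count=len(pages),
        processing_time_ms=(time.perf_counter() - started) * 1000,
    )


if __name__ == "__main__":
    print(page_size_px(612, 792, 300))         # (2550, 3300)
    print(stretch_contrast([100, 150, 200]))   # [0, 128, 255]
    started = time.perf_counter()
    result = aggregate([("Page one", 0.5), ("Page two", 1.0)], started)
    print(f"{result.confidence:.2%}")          # 75.00%
```

Doubling the DPI quadruples the pixel count per page, which is why higher `pdf_dpi` values tend to improve small-text recognition at the cost of processing time.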

from upsonic.ocr import OCR
from upsonic.ocr.rapidocr import RapidOCR

# Create OCR with preprocessing
ocr = OCR(
    RapidOCR,
    languages=['en'],
    rotation_fix=True,
    enhance_contrast=True,
    pdf_dpi=300
)

# Process file - returns detailed results
result = ocr.process_file('document.pdf')

print(f"Text: {result.text}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Pages: {result.page_count}")
print(f"Processing time: {result.processing_time_ms:.2f}ms")
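The metrics-tracking step, together with the `processing_time_ms` and `confidence` fields printed above, suggests keeping running totals across calls. A minimal sketch of such a tracker follows. `ProcessingStats` and its members are hypothetical names; Upsonic's own metrics API is covered on the Metrics and Performance page, not here.

```python
from dataclasses import dataclass


@dataclass
class ProcessingStats:
    """Hypothetical running totals, updated once per processed file."""
    files_processed: int = 0
    pages_processed: int = 0
    total_time_ms: float = 0.0
    confidence_sum: float = 0.0

    def record(self, page_count: int, processing_time_ms: float, confidence: float) -> None:
        """Fold one file's result into the totals."""
        self.files_processed += 1
        self.pages_processed += page_count
        self.total_time_ms += processing_time_ms
        self.confidence_sum += confidence

    @property
    def average_time_ms(self) -> float:
        """Mean processing time per file (0.0 before any file is recorded)."""
        return self.total_time_ms / self.files_processed if self.files_processed else 0.0

    @property
    def average_confidence(self) -> float:
        """Mean of the per-file confidence scores."""
        return self.confidence_sum / self.files_processed if self.files_processed else 0.0


if __name__ == "__main__":
    stats = ProcessingStats()
    stats.record(page_count=2, processing_time_ms=120.0, confidence=0.5)
    stats.record(page_count=1, processing_time_ms=80.0, confidence=1.0)
    print(stats.average_time_ms, stats.average_confidence)  # 100.0 0.75
```

Storing sums rather than averages keeps each update constant-time and lets the averages be recomputed exactly at any point.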

Navigation

OCR Attributes - Comprehensive guide to all OCR configuration options
OCR Providers - Configure EasyOCR, RapidOCR, Tesseract, PaddleOCR, and DeepSeek OCR
Advanced Features - Advanced preprocessing and optimization options
Metrics and Performance - Monitor and optimize OCR performance
Basic OCR Example - Get started with OCR integration

Made with Love 💚

We believe that document processing should be simple and accessible to everyone. By unifying multiple OCR engines under one interface, we're giving developers the freedom to choose the best tool for their needs without rewriting code. Whether you're building an invoice processor or analyzing historical documents, we've built this with care so you can focus on what matters most: your application.
