What is Unified OCR?
The Unified OCR system in Upsonic provides a consistent interface for optical character recognition across multiple OCR engines. Instead of learning a different API for each OCR provider, you use a single OCR class that works with EasyOCR, RapidOCR, Tesseract, DeepSeek, and PaddleOCR.
The OCR class serves as a high-level orchestrator that:
- Manages multiple OCR provider backends with a unified API
- Handles image preprocessing (rotation correction, contrast enhancement, noise reduction)
- Converts PDFs to images with configurable DPI
- Tracks confidence scores and bounding box detection
- Collects performance metrics and processing statistics
- Provides provider-specific features and optimizations
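To make the orchestrator idea concrete, here is a minimal sketch of how one class can sit in front of interchangeable engines. It is not Upsonic's actual API: `TextBlock`, `OCRBackend`, `FakeBackend`, and `UnifiedOCR` are hypothetical names invented for this illustration, and the fake backend returns canned results so the example runs without any OCR library installed.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass
class TextBlock:
    """One recognized piece of text (hypothetical result type)."""
    text: str
    confidence: float                 # 0.0 to 1.0
    bbox: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels


class OCRBackend(Protocol):
    """What the orchestrator assumes every engine adapter exposes."""
    name: str

    def recognize(self, image_bytes: bytes) -> List[TextBlock]: ...


class FakeBackend:
    """Stand-in engine that returns canned blocks; a real adapter would
    call EasyOCR, Tesseract, etc. here."""
    name = "fake"

    def recognize(self, image_bytes: bytes) -> List[TextBlock]:
        return [
            TextBlock("hello", 0.9, (0, 0, 50, 12)),
            TextBlock("world", 0.7, (55, 0, 110, 12)),
        ]


class UnifiedOCR:
    """Illustrative orchestrator: same methods regardless of backend."""

    def __init__(self, backend: OCRBackend) -> None:
        self.backend = backend
        self.calls = 0  # simple processing statistic

    def get_text(self, image_bytes: bytes) -> str:
        self.calls += 1
        blocks = self.backend.recognize(image_bytes)
        return " ".join(block.text for block in blocks)


ocr = UnifiedOCR(FakeBackend())
print(ocr.get_text(b""))
```

Swapping engines in this sketch means passing a different adapter object to the constructor; the calling code that uses `get_text` does not change.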
How Unified OCR Works
The OCR system follows a clear processing pipeline:
- File Preparation: Validates file existence and format (supports .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp, .pdf)
- PDF Conversion: If the file is a PDF, converts each page to images at the specified DPI
- Image Preprocessing: Optionally applies rotation correction, contrast enhancement, and noise reduction
- OCR Processing: Processes each image through the selected provider’s engine
- Result Aggregation: Combines results from multiple pages and calculates average confidence scores
- Metrics Tracking: Updates processing statistics for performance analysis
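The pipeline steps above can be sketched in a few small functions. This is an illustration under stated assumptions, not the library's implementation: the function names (`is_supported`, `needs_pdf_conversion`, `aggregate`) are hypothetical, per-page results are modeled as plain `(text, confidence)` tuples, and the actual PDF rendering, preprocessing, and OCR calls are left out. Only the extension list comes from the description above.

```python
from pathlib import Path
from statistics import mean
from typing import Dict, List, Tuple

# Extensions listed in the File Preparation step.
SUPPORTED_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp", ".pdf",
}


def is_supported(path: str) -> bool:
    """Format check only; a real pipeline would also verify the file exists."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def needs_pdf_conversion(path: str) -> bool:
    """PDFs are rendered to one image per page before OCR."""
    return Path(path).suffix.lower() == ".pdf"


def aggregate(pages: List[Tuple[str, float]]) -> Dict[str, object]:
    """Combine per-page (text, confidence) results into one document result."""
    confidences = [confidence for _, confidence in pages]
    return {
        "text": "\n\n".join(text for text, _ in pages),
        "page_count": len(pages),
        # Average confidence across pages; 0.0 when there are no pages.
        "confidence": mean(confidences) if confidences else 0.0,
    }


# Hypothetical per-page results, as if returned by an OCR engine.
result = aggregate([("Page one text", 0.8), ("Page two text", 0.6)])
print(result["page_count"], round(result["confidence"], 2))
```

Averaging per-page confidence weights a nearly empty page the same as a dense one; weighting by the number of recognized blocks is a reasonable alternative if that matters for your documents.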

