Overview
What is Unified OCR?
The Unified OCR system in Upsonic provides a consistent interface for optical character recognition across multiple OCR engines. Instead of learning a different API for each OCR provider, you use a single OCR class that works with EasyOCR, RapidOCR, Tesseract, DeepSeek, and PaddleOCR.
The OCR class serves as a high-level orchestrator that:
- Manages multiple OCR provider backends with a unified API
- Handles image preprocessing (rotation correction, contrast enhancement, noise reduction)
- Converts PDFs to images with configurable DPI
- Tracks confidence scores and bounding box detection
- Collects performance metrics and processing statistics
- Provides provider-specific features and optimizations
How Unified OCR Works
The OCR system follows a clear processing pipeline:
- File Preparation: Validates file existence and format (supports .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp, .pdf)
- PDF Conversion: If the file is a PDF, converts each page to images at the specified DPI
- Image Preprocessing: Optionally applies rotation correction, contrast enhancement, and noise reduction
- OCR Processing: Processes each image through the selected provider’s engine
- Result Aggregation: Combines results from multiple pages and calculates average confidence scores
- Metrics Tracking: Updates processing statistics for performance analysis
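The file preparation and result aggregation steps above can be sketched in a few lines of plain Python. This is an illustrative sketch, not Upsonic's implementation: `PageResult`, `check_extension`, and `aggregate` are hypothetical names, the file-existence check is left out, and the unweighted per-page mean is an assumption about how the average confidence is computed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Extensions listed in the File Preparation step.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp", ".pdf"}


@dataclass
class PageResult:
    """Hypothetical per-page OCR result (not an Upsonic class)."""
    text: str
    confidence: float  # expected range: 0.0-1.0


def check_extension(path: str) -> str:
    """Return the lowercase extension, or raise if the format is unsupported."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file format: {ext!r}")
    return ext


def aggregate(pages: List[PageResult]) -> PageResult:
    """Join page texts and average the per-page confidence scores.

    Assumption: a simple unweighted mean across pages.
    """
    if not pages:
        return PageResult(text="", confidence=0.0)
    text = "\n\n".join(p.text for p in pages)
    confidence = sum(p.confidence for p in pages) / len(pages)
    return PageResult(text=text, confidence=confidence)
```

A multi-page PDF would pass through `check_extension` once and then contribute one `PageResult` per rendered page to `aggregate`.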
Attributes
The OCR system is configured through OCRConfig, which provides the following attributes:
| Attribute | Type | Default | Description |
|---|---|---|---|
| languages | List[str] | ['en'] | Languages to detect (e.g., ['en', 'zh', 'ja']) |
| confidence_threshold | float | 0.0 | Minimum confidence threshold (0.0-1.0) for accepting OCR results |
| rotation_fix | bool | False | Enable automatic rotation correction for skewed images |
| enhance_contrast | bool | False | Enhance image contrast before OCR processing |
| remove_noise | bool | False | Apply noise reduction filter to improve text clarity |
| pdf_dpi | int | 300 | DPI resolution for PDF rendering (higher = better quality, slower) |
| preserve_formatting | bool | True | Try to preserve text formatting (line breaks, spacing) |
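To make the defaults and value ranges in the table concrete, here is a minimal stand-in written as a dataclass. `OCRConfigSketch` and `rendered_size` are hypothetical names used only for illustration; the real OCRConfig may validate its fields differently or not at all.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OCRConfigSketch:
    """Hypothetical mirror of the attribute table; not Upsonic's OCRConfig."""
    languages: List[str] = field(default_factory=lambda: ["en"])
    confidence_threshold: float = 0.0
    rotation_fix: bool = False
    enhance_contrast: bool = False
    remove_noise: bool = False
    pdf_dpi: int = 300
    preserve_formatting: bool = True

    def __post_init__(self) -> None:
        # Assumed validation, based on the ranges stated in the table.
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0.0 and 1.0")
        if self.pdf_dpi <= 0:
            raise ValueError("pdf_dpi must be positive")


def rendered_size(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page of the given size (in inches) rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)
```

The pdf_dpi trade-off is easy to quantify: a US Letter page (8.5 x 11 in) renders to 2550 x 3300 pixels at 300 DPI, and doubling the DPI quadruples the number of pixels each page sends to the OCR engine.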
Providers
EasyOCR
Ready-to-use OCR with 80+ supported languages using deep learning models. Best for multi-language support with high accuracy.
| Parameter | Type | Default | Description |
|---|---|---|---|
| languages | List[str] | ['en'] | List of language codes to detect |
| gpu | bool | False | Enable GPU acceleration for faster processing |
| rotation_fix | bool | False | Auto-detect and fix image rotation |
| enhance_contrast | bool | False | Enhance image contrast |
| remove_noise | bool | False | Apply noise reduction |
| confidence_threshold | float | 0.0 | Minimum confidence for text blocks |
| paragraph | bool | False | Group text into paragraphs |
| min_size | int | 10 | Minimum text region size |
| text_threshold | float | 0.7 | Text detection threshold |
RapidOCR
Lightweight OCR based on ONNX Runtime for fast inference. Best for speed and lightweight deployment.
| Parameter | Type | Default | Description |
|---|---|---|---|
| languages | List[str] | ['en'] | List of language codes (primarily 'en' and 'ch') |
| confidence_threshold | float | 0.0 | Minimum confidence for text blocks |
| rotation_fix | bool | False | Auto-detect and fix image rotation |
| enhance_contrast | bool | False | Enhance image contrast |
| remove_noise | bool | False | Apply noise reduction |
| pdf_dpi | int | 300 | DPI for PDF rendering |
Tesseract
Google’s open-source OCR engine with 100+ language support. Best for traditional OCR with extensive language coverage.
| Parameter | Type | Default | Description |
|---|---|---|---|
| languages | List[str] | ['eng'] | List of Tesseract language codes |
| tesseract_cmd | str | None | Path to tesseract executable |
| confidence_threshold | float | 0.0 | Minimum confidence for text blocks |
| rotation_fix | bool | False | Auto-detect and fix image rotation |
| enhance_contrast | bool | False | Enhance image contrast |
| remove_noise | bool | False | Apply noise reduction |
| preserve_formatting | bool | True | Preserve text layout and formatting |
| psm | int | 3 | Page segmentation mode (0-13) |
| oem | int | 3 | OCR Engine Mode (0-3) |
| custom_config | str | '' | Additional Tesseract configuration string |

Tesseract must be installed on the system before use:
- Ubuntu/Debian: sudo apt-get install tesseract-ocr
- macOS: brew install tesseract
- Windows: Download installer from GitHub
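The psm, oem, and custom_config parameters correspond to Tesseract's `--psm` and `--oem` command-line flags plus any extra options, and Tesseract accepts several languages joined with `+`. The sketch below shows one plausible way to assemble those values. `build_tesseract_config` and `join_languages` are hypothetical helpers, and how Upsonic composes the string internally is an assumption.

```python
from typing import List


def build_tesseract_config(psm: int = 3, oem: int = 3, custom_config: str = "") -> str:
    """Assemble a Tesseract option string from the table's parameters."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    parts = [f"--oem {oem}", f"--psm {psm}"]
    extra = custom_config.strip()
    if extra:
        # Extra options are appended verbatim after the two mode flags.
        parts.append(extra)
    return " ".join(parts)


def join_languages(languages: List[str]) -> str:
    """Tesseract takes multiple languages as a '+'-separated string, e.g. 'eng+deu'."""
    return "+".join(languages)
```

For example, `build_tesseract_config(psm=6)` yields `--oem 3 --psm 6`, which treats the page as a single uniform block of text.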
PaddleOCR
Comprehensive OCR with multiple specialized pipelines for advanced document understanding.
| Parameter | Type | Default | Description |
|---|---|---|---|
| lang | str | 'en' | Language code |
| ocr_version | str | 'PP-OCRv5' | OCR version ('PP-OCRv3', 'PP-OCRv4', 'PP-OCRv5') |
| use_doc_orientation_classify | bool | None | Enable document orientation classification |
| use_doc_unwarping | bool | None | Enable document unwarping |
| use_textline_orientation | bool | None | Enable text line orientation detection |
| text_det_limit_side_len | int | None | Limit on detection input side length |
| text_rec_score_thresh | float | None | Text recognition score threshold |
| return_word_box | bool | None | Return word-level bounding boxes |

| Parameter | Type | Default | Description |
|---|---|---|---|
| use_table_recognition | bool | None | Enable table recognition |
| use_formula_recognition | bool | None | Enable formula recognition |
| use_seal_recognition | bool | None | Enable seal text recognition |
| use_chart_recognition | bool | None | Enable chart recognition |
| layout_threshold | float | None | Layout detection score threshold |
| lang | str | 'en' | Language code |

| Parameter | Type | Default | Description |
|---|---|---|---|
| use_table_recognition | bool | None | Enable table recognition |
| use_seal_recognition | bool | None | Enable seal recognition |
| mllm_chat_bot_config | dict | None | Multimodal LLM configuration |
| retriever_config | dict | None | Retriever configuration for vector search |

| Parameter | Type | Default | Description |
|---|---|---|---|
| use_layout_detection | bool | None | Enable layout detection |
| use_chart_recognition | bool | None | Enable chart recognition |
| format_block_content | bool | None | Format content as Markdown |
| vl_rec_backend | str | 'local' | VL recognition backend |
| temperature | float | None | Sampling temperature for VLM |
Metrics and Performance
Enabling Metrics
The OCR system automatically tracks metrics for all operations. Metrics include files processed, pages, characters, confidence scores, and processing time.
Analyzing Performance
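The tracked quantities (files processed, pages, characters, confidence scores, and processing time) can be modeled as a small accumulator from which per-file and per-page figures are derived. This is a sketch only: `MetricsSketch` and its field names are hypothetical and do not reflect the attributes Upsonic actually exposes.

```python
from dataclasses import dataclass


@dataclass
class MetricsSketch:
    """Hypothetical running totals for OCR runs (not Upsonic's metrics class)."""
    files_processed: int = 0
    total_pages: int = 0
    total_characters: int = 0
    confidence_sum: float = 0.0
    processing_time_s: float = 0.0

    def record(self, pages: int, text: str, confidence: float, elapsed_s: float) -> None:
        """Add the outcome of one processed file to the running totals."""
        self.files_processed += 1
        self.total_pages += pages
        self.total_characters += len(text)
        self.confidence_sum += confidence
        self.processing_time_s += elapsed_s

    @property
    def average_confidence(self) -> float:
        """Mean per-file confidence, or 0.0 before any file is recorded."""
        if self.files_processed == 0:
            return 0.0
        return self.confidence_sum / self.files_processed

    @property
    def seconds_per_page(self) -> float:
        """Average processing time per page, or 0.0 before any page is recorded."""
        if self.total_pages == 0:
            return 0.0
        return self.processing_time_s / self.total_pages
```

Keeping one accumulator per provider or per configuration makes figures such as seconds per page and average confidence directly comparable.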
Use metrics to analyze and optimize OCR performance across different providers and configurations.
Advanced Features
Provider Selection Helper
Use the infer_provider function to create OCR instances by provider name without importing provider classes.
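Name-based selection of this kind is commonly built on a registry that maps strings to classes. The sketch below illustrates that general pattern with stub classes. It does not show the signature or behavior of infer_provider itself, and `provider_from_name`, `ProviderStub`, and the stub subclasses are all hypothetical.

```python
from typing import Dict, Type


class ProviderStub:
    """Placeholder standing in for a real provider class."""

    def __init__(self, **options):
        self.options = options


class EasyOCRStub(ProviderStub):
    pass


class TesseractStub(ProviderStub):
    pass


# Hypothetical registry: lowercase provider name -> provider class.
_REGISTRY: Dict[str, Type[ProviderStub]] = {
    "easyocr": EasyOCRStub,
    "tesseract": TesseractStub,
}


def provider_from_name(name: str, **options) -> ProviderStub:
    """Look up a provider class by name and instantiate it with the given options."""
    key = name.strip().lower()
    try:
        provider_cls = _REGISTRY[key]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown provider {name!r}; expected one of: {known}") from None
    return provider_cls(**options)
```

Normalizing the name before lookup lets callers pass values such as "Tesseract" or " easyocr " from configuration files without extra cleanup.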

