Overview
In the Upsonic framework, OCR is a unified interface for optical character recognition that supports multiple OCR engines through a consistent API. It extracts text from images and PDFs, with optional image preprocessing, multi-provider support, and result tracking.
The OCR class serves as a high-level orchestrator that manages:
- Multiple OCR provider backends (EasyOCR, RapidOCR, Tesseract, DeepSeek, PaddleOCR)
- Image preprocessing (rotation correction, contrast enhancement, noise reduction)
- PDF to image conversion with configurable DPI
- Confidence scoring and bounding box detection
- Performance metrics and processing statistics
- Provider-specific features and optimizations
OCR Configuration
The OCR class is configured through the OCRConfig class, whose attributes customize text extraction behavior.
Core Configuration
| Attribute | Type | Description |
|---|---|---|
| languages | List[str] | Languages to detect (default: ['en']) |
| confidence_threshold | float | Minimum confidence threshold (0.0-1.0, default: 0.0) |
| pdf_dpi | int | DPI for PDF rendering (default: 300) |
| preserve_formatting | bool | Try to preserve text formatting (default: True) |
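
To make the defaults and value ranges in the table concrete, here is a minimal sketch of a configuration object carrying the same four attributes. `CoreConfigSketch` is a hypothetical stand-in written for illustration, not Upsonic's OCRConfig class, and the validation rules are assumptions derived only from the ranges stated in the table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoreConfigSketch:
    """Hypothetical stand-in mirroring the documented core attributes."""

    languages: List[str] = field(default_factory=lambda: ["en"])
    confidence_threshold: float = 0.0
    pdf_dpi: int = 300
    preserve_formatting: bool = True

    def __post_init__(self) -> None:
        # Assumed checks: the table only states the ranges, not how they are enforced.
        if not self.languages:
            raise ValueError("languages must contain at least one language code")
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be within 0.0-1.0")
        if self.pdf_dpi <= 0:
            raise ValueError("pdf_dpi must be a positive integer")


# Defaults match the table; overrides are plain keyword arguments.
default_config = CoreConfigSketch()
bilingual_config = CoreConfigSketch(languages=["en", "de"], confidence_threshold=0.5)
```

Validating at construction time means a bad threshold such as 1.5 fails immediately rather than silently filtering out every result later.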
Image Preprocessing
| Attribute | Type | Description |
|---|---|---|
| rotation_fix | bool | Enable automatic rotation correction (default: False) |
| enhance_contrast | bool | Enhance image contrast before OCR (default: False) |
| remove_noise | bool | Apply noise reduction (default: False) |
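
As an illustration of what a contrast enhancement step can do, the sketch below applies a min-max contrast stretch to a small grayscale image stored as nested lists. This is a generic technique shown under the assumption of 8-bit grayscale input; it is not a description of what Upsonic does internally when enhance_contrast is enabled, and `stretch_contrast` is a hypothetical helper.

```python
from typing import List


def stretch_contrast(pixels: List[List[int]]) -> List[List[int]]:
    """Linearly rescale grayscale values so the darkest maps to 0 and the brightest to 255."""
    flat = [value for row in pixels for value in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in pixels]


# A faded scan whose values occupy only the 100-160 range.
faded = [[100, 120], [140, 160]]
stretched = stretch_contrast(faded)
print(stretched)  # [[0, 85], [170, 255]]
```

Spreading the values over the full range makes dark text stand out more from a grey background, which is why this kind of step helps mainly on low-quality scans.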
Supported OCR Providers
EasyOCR
Ready-to-use OCR with 80+ supported languages using deep learning models.
RapidOCR
Lightweight OCR based on ONNX Runtime for fast inference.
Tesseract
Google’s open-source OCR engine with 100+ language support.
DeepSeek OCR
High-quality OCR using DeepSeek’s specialized model with vLLM.
PaddleOCR
Comprehensive OCR with multiple specialized pipelines.
Creating OCR Instances
Basic OCR Creation
OCR with Advanced Configuration
Text Extraction Methods
Simple Text Extraction
Detailed OCR Results
Metrics and Performance Tracking
OCR Metrics
Provider Information
Advanced Features
PaddleOCR Advanced Features
Structure Recognition
Practical Examples
Document Processing Pipeline
Multi-Language Document Extraction
Invoice Data Extraction
Research Paper to Markdown
Best Practices
Provider Selection
- EasyOCR: Best for multi-language support (80+ languages) with deep learning accuracy
- RapidOCR: Best for speed and lightweight deployment
- Tesseract: Best for traditional OCR with extensive language support (100+)
- DeepSeek: Best for complex layouts and high-accuracy requirements (requires GPU)
- PaddleOCR: Best for comprehensive document understanding with specialized pipelines
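
The guidance above can be encoded as a small lookup, which is handy when the provider is chosen from application settings. This is a sketch only: `PROVIDER_GUIDE` and `suggest_provider` are hypothetical names, the priority keys are invented for illustration, and the fallback to PaddleOCR when no GPU is available is an assumption rather than a documented rule.

```python
# Maps a primary requirement to the provider recommended in the list above.
PROVIDER_GUIDE = {
    "multi_language": "EasyOCR",
    "speed": "RapidOCR",
    "traditional": "Tesseract",
    "complex_layout": "DeepSeek",
    "document_understanding": "PaddleOCR",
}


def suggest_provider(priority: str, has_gpu: bool = False) -> str:
    """Return a provider name for the given priority key."""
    if priority not in PROVIDER_GUIDE:
        raise ValueError(f"unknown priority: {priority!r}")
    provider = PROVIDER_GUIDE[priority]
    if provider == "DeepSeek" and not has_gpu:
        # Assumed fallback: DeepSeek is listed as requiring a GPU.
        return "PaddleOCR"
    return provider


print(suggest_provider("speed"))
print(suggest_provider("complex_layout", has_gpu=True))
```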
Performance Optimization
- Choose appropriate DPI: Use 200-300 DPI for PDFs (higher = slower but more accurate)
- Enable GPU acceleration: Use gpu=True for EasyOCR when available
- Set confidence thresholds: Filter low-quality results early to reduce noise
- Use batch processing: DeepSeek’s batch mode significantly improves multi-page PDF performance
- Disable unused preprocessing: Only enable rotation_fix, enhance_contrast, remove_noise when needed
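
To see why DPI affects speed, it helps to compute how large a rendered page becomes. The sketch below assumes a US Letter page (8.5 x 11 inches) and uses pixel count as a rough proxy for rendering and recognition cost; actual timings depend on the provider and hardware. `rendered_pixels` is a hypothetical helper.

```python
from typing import Tuple


def rendered_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page of the given physical size rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (200, 300):
    width, height = rendered_pixels(8.5, 11.0, dpi)
    print(f"{dpi} DPI: {width} x {height} = {width * height:,} pixels")
```

The page grows from 1700 x 2200 pixels at 200 DPI to 2550 x 3300 at 300 DPI. Pixel count scales with the square of the DPI, so that step multiplies the data per page by 2.25.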
Quality Optimization
- Enable preprocessing for low-quality images: rotation_fix, enhance_contrast, remove_noise
- Use multi-language support: Specify all expected languages for better detection
- Adjust confidence thresholds: Balance between accuracy and recall based on use case
- Validate results: Check confidence scores and manually review low-confidence blocks
- Choose specialized pipelines: Use PPStructureV3 for tables, PPChatOCRv4 for structured extraction
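
The threshold and validation advice above can be combined into one pass that keeps confident blocks and sets the rest aside for manual review. The sketch below is self-contained: `TextBlock` is a hypothetical stand-in for whatever block objects a provider returns, and its field names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBlock:
    """Hypothetical OCR result block: text, confidence, and bounding box."""

    text: str
    confidence: float  # 0.0-1.0
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_by_confidence(
    blocks: List[TextBlock], threshold: float
) -> Tuple[List[TextBlock], List[TextBlock]]:
    """Return (accepted, needs_review), preserving the original block order."""
    accepted = [b for b in blocks if b.confidence >= threshold]
    needs_review = [b for b in blocks if b.confidence < threshold]
    return accepted, needs_review


blocks = [
    TextBlock("Invoice #", 0.98, (10, 10, 120, 30)),
    TextBlock("T0tal", 0.41, (10, 40, 70, 60)),
    TextBlock("Amount due", 0.87, (10, 70, 140, 90)),
]
accepted, needs_review = split_by_confidence(blocks, threshold=0.6)
print([b.text for b in accepted])
print([b.text for b in needs_review])
```

Raising the threshold shrinks the accepted set and grows the review queue, which is the accuracy-versus-recall trade-off described above.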

