Layered Architecture
The Unified OCR system is built on a layered architecture that separates concerns and enables flexible engine composition.
Layer 0 — Document Preparation
Handles file validation, PDF-to-image conversion, and image preprocessing before OCR processing begins.
- File validation: Checks file existence and supported formats (.png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp, .pdf)
- PDF conversion: Converts PDF pages to images at configurable DPI using convert_document
- Image preprocessing: Optional rotation correction, contrast enhancement, and noise reduction
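The file validation step can be illustrated with a minimal sketch. It uses only the standard library and the extension list given above; the names `validate_file` and `needs_pdf_conversion` are hypothetical helpers for illustration, not the library's actual API.

```python
from pathlib import Path

# Extensions taken from the supported-format list above.
SUPPORTED_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp", ".pdf",
}


def validate_file(path: str) -> Path:
    """Return the path as a Path if it exists and has a supported extension.

    Hypothetical helper: the real Layer 0 may report errors differently.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"File not found: {file_path}")
    suffix = file_path.suffix.lower()  # compare case-insensitively
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {suffix or '(none)'}")
    return file_path


def needs_pdf_conversion(file_path: Path) -> bool:
    """PDFs must be rendered to page images first; images pass through."""
    return file_path.suffix.lower() == ".pdf"
```

Lowercasing the suffix is a design choice in this sketch so that files such as `scan.PNG` are accepted; whether the real validator is case-insensitive is an assumption.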
Layer 1 — OCR Engines
Each OCR engine runs independently on prepared images and returns structured OCRResult objects.
Available engines:
- EasyOCREngine: Deep learning-based, 80+ languages
- RapidOCREngine: ONNX Runtime-based, lightweight and fast
- TesseractOCREngine: Google’s open-source engine, 100+ languages
- DeepSeekOCREngine: VLLM-based batch processing
- DeepSeekOllamaOCREngine: Ollama-based local processing
- PaddleOCREngine, PPStructureV3Engine, PPChatOCRv4Engine, PaddleOCRVLEngine: PaddlePaddle family
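Because every engine returns the same kind of result object, engines can be swapped without changing the calling code. The sketch below shows one way such a shared contract might look. `BaseEngine`, `TextLine`, `EchoEngine`, and the fields of this stand-in `OCRResult` are all assumptions made for illustration; the library's real classes may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextLine:
    text: str
    confidence: float  # assumed to lie in the range 0.0 to 1.0


@dataclass
class OCRResult:
    """Hypothetical stand-in for the library's OCRResult."""
    engine: str
    lines: List[TextLine] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Join recognized lines in reading order.
        return "\n".join(line.text for line in self.lines)


class BaseEngine(ABC):
    """Assumed common interface: one prepared image in, one OCRResult out."""
    name = "base"

    @abstractmethod
    def recognize(self, image: bytes) -> OCRResult:
        ...


class EchoEngine(BaseEngine):
    """Toy engine for demonstration: treats the input bytes as UTF-8 text."""
    name = "echo"

    def recognize(self, image: bytes) -> OCRResult:
        lines = [TextLine(t, 1.0) for t in image.decode("utf-8").splitlines() if t]
        return OCRResult(engine=self.name, lines=lines)
```

A real engine such as one of those listed above would replace the body of `recognize` with a call into its OCR backend while keeping the same return type.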
Orchestrator — OCR Class
The top-level OCR class orchestrates the full pipeline:
- Receives a file path
- Delegates to Layer 0 for document preparation
- Passes prepared images to the configured Layer 1 engine
- Aggregates results, calculates confidence scores, and tracks metrics
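The orchestration steps above can be sketched as a small class that wires a preparation function to an engine. This is a simplified illustration, not the actual OCR class: `PipelineSketch`, `PipelineOutput`, the callable signatures, and the choice of a plain mean as the confidence score are all assumptions.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Tuple

# One recognized line as (text, confidence in [0, 1]); a hypothetical shape.
Line = Tuple[str, float]


@dataclass
class PipelineOutput:
    text: str
    mean_confidence: float
    pages: int
    elapsed_seconds: float


class PipelineSketch:
    """Hypothetical orchestrator: prepare the document, run OCR, aggregate."""

    def __init__(
        self,
        prepare: Callable[[str], List[bytes]],
        engine: Callable[[bytes], List[Line]],
    ) -> None:
        self.prepare = prepare  # stands in for Layer 0
        self.engine = engine    # stands in for a Layer 1 engine

    def process(self, path: str) -> PipelineOutput:
        start = time.perf_counter()
        pages = self.prepare(path)            # file path -> prepared images
        lines: List[Line] = []
        for page in pages:
            lines.extend(self.engine(page))   # one engine call per image
        # Assumed aggregation: unweighted mean over all recognized lines.
        confidence = mean(c for _, c in lines) if lines else 0.0
        return PipelineOutput(
            text="\n".join(t for t, _ in lines),
            mean_confidence=confidence,
            pages=len(pages),
            elapsed_seconds=time.perf_counter() - start,
        )
```

Passing the preparation step and the engine in as callables keeps the orchestrator independent of any particular engine, which mirrors the separation of layers described above.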

