Exception Hierarchy

All OCR exceptions inherit from OCRError, allowing you to catch all OCR errors at once or handle specific exceptions individually.
OCRError (base)
├── OCRProviderError          # Engine/provider level errors
├── OCRFileNotFoundError      # File not found or not a file
├── OCRUnsupportedFormatError # Unsupported file format
├── OCRProcessingError        # Error during OCR processing
└── OCRTimeoutError           # Layer 1 timeout exceeded

Base: OCRError

Parent of all OCR exceptions. Carries 3 attributes:
Attribute       Type              Description
message         str               Error message
error_code      str | None        Machine-readable error code (e.g. "LAYER1_TIMEOUT")
original_error  Exception | None  Wrapped original exception (if any)
str() output format: [ERROR_CODE] message (Original: original error)
from upsonic.ocr import OCRError

# `ocr` is an OCR instance; see the OCRTimeoutError section for how to build one.
try:
    text = ocr.get_text("document.pdf")
except OCRError as e:
    print(e.message)        # "Layer 1 OCR timed out after 30.0s on page 2"
    print(e.error_code)     # "LAYER1_TIMEOUT"
    print(e.original_error) # None or original exception
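
The three attributes and the str() format described above can be illustrated with a small stand-in class. This is only a sketch, not Upsonic's implementation: `SketchOCRError` is a hypothetical name, and leaving out the code or the "(Original: ...)" part when the value is None is an assumption that the page does not confirm.

```python
from typing import Optional


class SketchOCRError(Exception):
    """Hypothetical stand-in mirroring the documented OCRError attributes."""

    def __init__(
        self,
        message: str,
        error_code: Optional[str] = None,
        original_error: Optional[Exception] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        self.error_code = error_code
        self.original_error = original_error

    def __str__(self) -> str:
        # Documented format: [ERROR_CODE] message (Original: original error)
        # Assumption: parts whose value is None are omitted.
        parts = []
        if self.error_code is not None:
            parts.append(f"[{self.error_code}]")
        parts.append(self.message)
        if self.original_error is not None:
            parts.append(f"(Original: {self.original_error})")
        return " ".join(parts)


err = SketchOCRError(
    "readtext call failed",
    error_code="EASYOCR_PROCESSING_FAILED",
    original_error=ValueError("bad image"),
)
print(str(err))  # [EASYOCR_PROCESSING_FAILED] readtext call failed (Original: bad image)
```

For logging, str(e) gives a one-line summary, while e.error_code is the better field to branch on because it does not depend on message wording.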

OCRProviderError

Raised when engine initialization fails or a required dependency is missing.
error_code                      When
UNSUPPORTED_LANGUAGE            Language not supported by the engine
READER_INIT_FAILED              EasyOCR reader creation failed
ENGINE_INIT_FAILED              RapidOCR engine initialization failed
TESSERACT_NOT_INSTALLED         Tesseract not installed on the system
VLLM_NOT_AVAILABLE              vLLM package not installed (DeepSeek)
UNSUPPORTED_MODEL_ARCHITECTURE  DeepSeek model architecture not supported by vLLM
MODEL_INIT_FAILED               DeepSeek model loading failed
CLIENT_INIT_FAILED              Ollama client connection failed
OLLAMA_NOT_AVAILABLE            ollama package not installed
from upsonic.ocr import OCRProviderError
from upsonic.ocr.layer_1.engines import EasyOCREngine

try:
    engine = EasyOCREngine(languages=['xyz'])
except OCRProviderError as e:
    # e.error_code == "UNSUPPORTED_LANGUAGE"
    print(e.message)
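
Since each OCRProviderError carries one of the codes listed above, a caller can turn the code into a suggested next step. The following is a minimal sketch under stated assumptions: `PROVIDER_HINTS` and `provider_hint` are hypothetical names, the hint wording is illustrative, and only a few of the documented codes are covered. The function reads `error_code` with getattr, so it accepts any exception object.

```python
from typing import Optional

# Keys are error codes from the table above; the hint text is illustrative.
PROVIDER_HINTS = {
    "UNSUPPORTED_LANGUAGE": "Pick a language the selected engine supports.",
    "TESSERACT_NOT_INSTALLED": "Install Tesseract on the system.",
    "VLLM_NOT_AVAILABLE": "Install the vLLM package.",
    "OLLAMA_NOT_AVAILABLE": "Install the ollama package.",
    "CLIENT_INIT_FAILED": "Check that the Ollama server is reachable.",
}


def provider_hint(exc: Exception) -> str:
    """Return a remediation hint for a provider error, keyed on its error_code."""
    code: Optional[str] = getattr(exc, "error_code", None)
    # Codes without a dedicated hint fall back to a generic message.
    return PROVIDER_HINTS.get(code, f"Engine initialization failed ({code}).")
```

Inside an `except OCRProviderError as e:` branch, `print(provider_hint(e))` would then show the hint next to the original message.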

OCRFileNotFoundError

Raised by Layer 0 (document_converter) when the file does not exist or the path does not point to a file.
error_code      When
FILE_NOT_FOUND  File does not exist
NOT_A_FILE      Path points to a directory
from upsonic.ocr import OCRFileNotFoundError

try:
    text = ocr.get_text("nonexistent_file.pdf")
except OCRFileNotFoundError as e:
    # e.error_code == "FILE_NOT_FOUND"
    print(e.message)

OCRUnsupportedFormatError

Raised by Layer 0 when the file has an unsupported format.
Supported formats: .png, .jpg, .jpeg, .bmp, .tiff, .tif, .gif, .webp, .pdf
error_code          When
UNSUPPORTED_FORMAT  File has an unsupported extension
from upsonic.ocr import OCRUnsupportedFormatError

try:
    text = ocr.get_text("document.docx")
except OCRUnsupportedFormatError as e:
    # e.error_code == "UNSUPPORTED_FORMAT"
    print(e.message)

OCRProcessingError

Raised when OCR processing fails, either inside an engine or during Layer 0 file conversion. Each engine uses its own error code.
error_code                         Engine           When
EASYOCR_PROCESSING_FAILED          EasyOCR          readtext call failed
RAPIDOCR_PROCESSING_FAILED         RapidOCR         OCR call failed
TESSERACT_PROCESSING_FAILED        Tesseract        image_to_data call failed
DEEPSEEK_PROCESSING_FAILED         DeepSeek         vLLM generate failed
DEEPSEEK_BATCH_PROCESSING_FAILED   DeepSeek         Batch processing failed
DEEPSEEK_OLLAMA_PROCESSING_FAILED  DeepSeek Ollama  Ollama streaming failed
PADDLE_PROCESSING_FAILED           PaddleOCR        predict call failed
PDF_CONVERSION_FAILED              Layer 0          PDF to image conversion failed
IMAGE_LOAD_FAILED                  Layer 0          Image could not be loaded
MISSING_DEPENDENCY                 Layer 0          PyMuPDF (fitz) not installed
from upsonic.ocr import OCRProcessingError

try:
    text = ocr.get_text("corrupted_image.png")
except OCRProcessingError as e:
    print(e.error_code)     # e.g. "EASYOCR_PROCESSING_FAILED", depending on the engine
    print(e.original_error) # Original exception
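
The codes above fall into two groups: engine-level codes and Layer 0 codes. One way to use that split is sketched below. The reasoning is an assumption rather than documented behavior: a Layer 0 failure concerns file loading or conversion, so switching to a different engine is unlikely to help, while an engine-level failure might be worth retrying with another engine. `LAYER0_CODES` and `is_layer0_failure` are hypothetical names.

```python
# Layer 0 codes copied from the table above.
LAYER0_CODES = frozenset(
    {"PDF_CONVERSION_FAILED", "IMAGE_LOAD_FAILED", "MISSING_DEPENDENCY"}
)


def is_layer0_failure(exc: Exception) -> bool:
    """True if the error code belongs to Layer 0 (file loading/conversion)."""
    # getattr keeps this usable with any exception, not only OCR errors.
    return getattr(exc, "error_code", None) in LAYER0_CODES
```

In an `except OCRProcessingError as e:` branch, a caller could check `is_layer0_failure(e)` first and only attempt a second engine when it returns False.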

OCRTimeoutError

Raised when layer_1_timeout is exceeded in the OCR orchestrator. The timeout is applied per page: if page 3 of a 5-page PDF exceeds it, the error is raised for that page only.
error_code      When
LAYER1_TIMEOUT  layer_1_timeout seconds exceeded
from upsonic.ocr import OCR, OCRTimeoutError
from upsonic.ocr.layer_1.engines import EasyOCREngine

engine = EasyOCREngine(languages=['en'])
ocr = OCR(layer_1_ocr_engine=engine, layer_1_timeout=30.0)

try:
    text = ocr.get_text("large_file.pdf")
except OCRTimeoutError as e:
    # e.error_code == "LAYER1_TIMEOUT"
    # e.message == "Layer 1 OCR timed out after 30.0s on page 3"
    print(e.message)
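
The example message above contains both the timeout and the page number. If the page number is needed programmatically, it can be parsed out of the message. This is a sketch that assumes the message always follows the wording shown in the example, which the page does not guarantee; `parse_timeout_message` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches the tail of: "Layer 1 OCR timed out after 30.0s on page 3"
_TIMEOUT_RE = re.compile(
    r"timed out after (?P<seconds>\d+(?:\.\d+)?)s on page (?P<page>\d+)"
)


def parse_timeout_message(message: str) -> Optional[Tuple[float, int]]:
    """Return (timeout_seconds, page_number), or None if the format differs."""
    match = _TIMEOUT_RE.search(message)
    if match is None:
        return None
    return float(match.group("seconds")), int(match.group("page"))


print(parse_timeout_message("Layer 1 OCR timed out after 30.0s on page 3"))  # (30.0, 3)
```

Returning None on a mismatch keeps the caller from crashing if the wording changes in a later release.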

Import

# All from one place
from upsonic.ocr import (
    OCRError,
    OCRProviderError,
    OCRFileNotFoundError,
    OCRUnsupportedFormatError,
    OCRProcessingError,
    OCRTimeoutError,
)

# Or directly from exceptions module
from upsonic.ocr.exceptions import OCRTimeoutError

Catch Pattern

Handle exceptions from most specific to most general:
try:
    text = ocr.get_text("document.pdf")
except OCRTimeoutError:
    print("Timeout - increase timeout or try a smaller file")
except OCRFileNotFoundError:
    print("File not found")
except OCRUnsupportedFormatError:
    print("This format is not supported")
except OCRProviderError:
    print("Engine issue - missing dependency or unsupported language")
except OCRProcessingError:
    print("OCR processing error")
except OCRError:
    print("Unknown OCR error")
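
Because the five subclasses are siblings, their relative order in the chain above does not change which branch runs; only OCRError has to come last. The same "most specific first" rule can also be written as a lookup over the exception's class hierarchy. This is a sketch with hypothetical names (`MESSAGES`, `describe`). It matches on class names, which is an illustrative shortcut that lets the block run without the library installed.

```python
from typing import Optional

# Messages copied from the catch pattern above, keyed by exception class name.
MESSAGES = {
    "OCRTimeoutError": "Timeout - increase timeout or try a smaller file",
    "OCRFileNotFoundError": "File not found",
    "OCRUnsupportedFormatError": "This format is not supported",
    "OCRProviderError": "Engine issue - missing dependency or unsupported language",
    "OCRProcessingError": "OCR processing error",
    "OCRError": "Unknown OCR error",
}


def describe(exc: BaseException) -> Optional[str]:
    """Return the message for the most specific matching class, else None."""
    # __mro__ lists the exception's own class first, then its parents,
    # so a subclass entry always wins over the OCRError fallback.
    for cls in type(exc).__mro__:
        if cls.__name__ in MESSAGES:
            return MESSAGES[cls.__name__]
    return None
```

With this helper, a single `except OCRError as e: print(describe(e))` covers every branch, at the cost of matching by name instead of by class identity.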