What is RapidOCR?

RapidOCR is a lightweight OCR engine built on ONNX Runtime. It is best suited to cases where inference speed and easy, lightweight deployment matter most.

Usage

from upsonic.ocr import OCR
from upsonic.ocr.rapidocr import RapidOCR

# Create OCR with RapidOCR
ocr = OCR(RapidOCR, languages=['en', 'ch'], confidence_threshold=0.5)

# Extract text from image
text = ocr.get_text('invoice.png')
print(text)

# Process PDF
result = ocr.process_file('document.pdf')
print(f"Extracted {len(result.text)} characters from {result.page_count} pages")

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| languages | List[str] | ['en'] | List of language codes (primarily 'en' and 'ch') |
| confidence_threshold | float | 0.0 | Minimum confidence for text blocks |
| rotation_fix | bool | False | Auto-detect and fix image rotation |
| enhance_contrast | bool | False | Enhance image contrast |
| remove_noise | bool | False | Apply noise reduction |
| pdf_dpi | int | 300 | DPI for PDF rendering |
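
To make two of these parameters more concrete, here is a minimal standalone sketch of what confidence_threshold and pdf_dpi plausibly control. It does not call Upsonic or RapidOCR: the TextBlock class and both helper functions are hypothetical names invented for illustration, and the inclusive comparison (>=) is an assumption about how the threshold is applied.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    # Hypothetical stand-in for one recognized text region (not an Upsonic type).
    text: str
    confidence: float


def filter_blocks(blocks, confidence_threshold=0.0):
    """Keep blocks whose confidence is at least the threshold (assumed inclusive)."""
    return [b for b in blocks if b.confidence >= confidence_threshold]


def rendered_size(width_in, height_in, pdf_dpi=300):
    """Pixel dimensions of a page of the given size (in inches) rasterized at pdf_dpi."""
    return round(width_in * pdf_dpi), round(height_in * pdf_dpi)


# Made-up sample data for illustration only.
blocks = [
    TextBlock("Invoice #1042", 0.97),
    TextBlock("T0tal", 0.41),
    TextBlock("Due: 2024-05-01", 0.88),
]

kept = filter_blocks(blocks, confidence_threshold=0.5)
print([b.text for b in kept])   # the 0.41 block is dropped
print(rendered_size(8.5, 11))   # US Letter page at the default 300 DPI
```

With the default threshold of 0.0 every block is kept, so raising it trades recall for cleaner output. A higher pdf_dpi gives the recognizer more pixels per character at the cost of larger images: a US Letter page is 2550 x 3300 pixels at 300 DPI.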

Supported Languages

RapidOCR supports English, Chinese (simplified and traditional), Japanese, and Korean, along with several other scripts, including Tamil, Telugu, Arabic, Cyrillic, and Devanagari.