What is Tesseract?
Google’s open-source OCR engine with 100+ language support. Best for traditional OCR with extensive language coverage.
Usage
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| languages | List[str] | ['eng'] | List of Tesseract language codes |
| tesseract_cmd | str | None | Path to tesseract executable |
| confidence_threshold | float | 0.0 | Minimum confidence for text blocks |
| rotation_fix | bool | False | Auto-detect and fix image rotation |
| enhance_contrast | bool | False | Enhance image contrast |
| remove_noise | bool | False | Apply noise reduction |
| preserve_formatting | bool | True | Preserve text layout and formatting |
| psm | int | 3 | Page segmentation mode (0-13) |
| oem | int | 3 | OCR Engine Mode (0-3) |
| custom_config | str | '' | Additional Tesseract configuration string |
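To make the role of `languages`, `psm`, `oem`, and `custom_config` more concrete, the sketch below shows one plausible way these values could be turned into a Tesseract command line. The helper `build_tesseract_args` is hypothetical and is not this library's API; it only relies on the standard Tesseract CLI flags `-l`, `--oem`, and `--psm`, and on the convention that multiple language codes are joined with `+`.

```python
import shlex
from typing import List, Optional


def build_tesseract_args(
    image_path: str,
    languages: Optional[List[str]] = None,
    psm: int = 3,
    oem: int = 3,
    custom_config: str = "",
    tesseract_cmd: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: assemble a Tesseract CLI call from the parameters above."""
    # Validate against the ranges given in the parameter table.
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be in 0-13, got {psm}")
    if not 0 <= oem <= 3:
        raise ValueError(f"oem must be in 0-3, got {oem}")

    args = [
        tesseract_cmd or "tesseract",  # fall back to whatever is on PATH
        image_path,
        "stdout",                      # write recognized text to standard output
        "-l", "+".join(languages or ["eng"]),
        "--oem", str(oem),
        "--psm", str(psm),
    ]
    # Extra options are appended after splitting them like a shell would.
    args.extend(shlex.split(custom_config))
    return args


print(build_tesseract_args("scan.png", languages=["eng", "deu"], psm=6))
# → ['tesseract', 'scan.png', 'stdout', '-l', 'eng+deu', '--oem', '3', '--psm', '6']
```

The resulting list can be passed to `subprocess.run` once Tesseract is installed. A `custom_config` value such as `-c tessedit_char_whitelist=0123456789` would be appended as two further arguments.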
Supported Languages
100+ languages including all major languages. Requires language packs to be installed separately.
Installation Note
Tesseract must be installed on the system:
- Ubuntu/Debian: `sudo apt-get install tesseract-ocr`
- macOS: `brew install tesseract`
- Windows: Download installer from GitHub
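After installing, it can help to confirm that the executable is reachable and that the needed language packs are present. The following is a minimal sketch using only the Python standard library. The helpers `resolve_tesseract` and `missing_languages` are hypothetical names, not part of this library, and the parsing assumes that `tesseract --list-langs` prints one header line followed by one language code per line.

```python
import shutil
import subprocess
from typing import List, Optional


def resolve_tesseract(tesseract_cmd: Optional[str] = None) -> Optional[str]:
    """Return the full path of the tesseract executable, or None if not found."""
    return shutil.which(tesseract_cmd or "tesseract")


def missing_languages(requested: List[str], list_langs_output: str) -> List[str]:
    """Compare requested codes against the text printed by `tesseract --list-langs`."""
    lines = list_langs_output.strip().splitlines()
    # Assumption: the first line is a header and each later line is one language code.
    installed = {line.strip() for line in lines[1:] if line.strip()}
    return [code for code in requested if code not in installed]


path = resolve_tesseract()
if path is None:
    print("tesseract not found; install it or pass tesseract_cmd")
else:
    result = subprocess.run([path, "--list-langs"], capture_output=True, text=True)
    # Some older releases print the list to stderr, so fall back to it.
    print(missing_languages(["eng"], result.stdout or result.stderr))
```

An empty list means every requested language is installed. Any codes that are returned correspond to language packs that still need to be installed.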

