# Tesseract

> Google's open-source OCR engine with 100+ language support

## What is Tesseract?

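Tesseract is an open-source OCR engine, developed for many years under Google's sponsorship, that supports more than 100 languages. It is best suited to traditional OCR on scanned or photographed text where broad language coverage matters.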

## Usage

```python
from upsonic.ocr import OCR
from upsonic.ocr.layer_1.engines import TesseractOCREngine
# Also available: from upsonic.ocr import TesseractOCREngine

# Create engine instance
engine = TesseractOCREngine(languages=['eng'], enhance_contrast=True)

# Create OCR orchestrator
ocr = OCR(layer_1_ocr_engine=engine)

# Extract text
text = ocr.get_text('receipt.jpg')
print(text)

# Custom Tesseract configuration
engine_custom = TesseractOCREngine(languages=['eng'], psm=3, oem=3)
ocr_custom = OCR(layer_1_ocr_engine=engine_custom)
result = ocr_custom.process_file('document.pdf')
print(f"Text: {result.text}")
```

## Parameters

| Parameter              | Type       | Default   | Description                               |
| ---------------------- | ---------- | --------- | ----------------------------------------- |
| `languages`            | List\[str] | `['eng']` | List of Tesseract language codes          |
| `tesseract_cmd`        | str        | `None`    | Path to tesseract executable              |
| `confidence_threshold` | float      | `0.0`     | Minimum confidence for text blocks        |
| `rotation_fix`         | bool       | `False`   | Auto-detect and fix image rotation        |
| `enhance_contrast`     | bool       | `False`   | Enhance image contrast                    |
| `remove_noise`         | bool       | `False`   | Apply noise reduction                     |
| `preserve_formatting`  | bool       | `True`    | Preserve text layout and formatting       |
| `psm`                  | int        | `3`       | Page segmentation mode (0-13)             |
| `oem`                  | int        | `3`       | OCR Engine Mode (0-3)                     |
| `custom_config`        | str        | `''`      | Additional Tesseract configuration string |
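
The `psm`, `oem`, and `custom_config` parameters correspond to options that the Tesseract command line accepts (`--psm`, `--oem`, and free-form extras such as `-c key=value`). The sketch below shows one way such values could be validated and combined into a single configuration string. `build_tesseract_config` is a hypothetical helper written for illustration; it is not part of Upsonic and does not claim to mirror how `TesseractOCREngine` assembles its arguments internally.

```python
def build_tesseract_config(psm: int = 3, oem: int = 3, custom_config: str = "") -> str:
    """Hypothetical helper: combine psm, oem and extra flags into one string."""
    # Ranges taken from the parameter table: psm is 0-13, oem is 0-3.
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    if not 0 <= oem <= 3:
        raise ValueError(f"oem must be between 0 and 3, got {oem}")

    parts = [f"--psm {psm}", f"--oem {oem}"]
    extra = custom_config.strip()
    if extra:
        # Extra flags are appended verbatim, after the two mode options.
        parts.append(extra)
    return " ".join(parts)


# psm=6 tells Tesseract to assume a single uniform block of text;
# the whitelist restricts recognition to digits.
print(build_tesseract_config(psm=6, custom_config="-c tessedit_char_whitelist=0123456789"))
# --psm 6 --oem 3 -c tessedit_char_whitelist=0123456789
```

Validating the ranges up front surfaces a bad `psm` or `oem` value as a clear Python error rather than a failure inside the Tesseract process.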

## Supported Languages

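Tesseract supports more than 100 languages, including all major ones. Each code passed in `languages` (for example `eng`) needs its language pack installed separately; on Ubuntu/Debian the packs are packaged as `tesseract-ocr-<code>`, such as `tesseract-ocr-deu` for German.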
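
Because language packs are installed separately, it can be useful to check which ones are present before creating the engine. The sketch below relies on the Tesseract CLI's `--list-langs` option and parses its output. The helper names (`parse_list_langs`, `installed_languages`, `missing_languages`) are hypothetical, and the parsing assumes the usual output shape of a header line followed by one language code per line.

```python
import shutil
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line, e.g. 'List of available languages in "..." (2):'
    return [ln for ln in lines if not ln.startswith("List of available languages")]


def installed_languages(tesseract_cmd: str = "tesseract") -> List[str]:
    """Return installed language codes, or [] if the executable is not found."""
    if shutil.which(tesseract_cmd) is None:
        return []
    proc = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True, text=True, check=False,
    )
    # Assumption: some older releases print the list to stderr, so fall back to it.
    return parse_list_langs(proc.stdout or proc.stderr)


def missing_languages(requested: List[str], available: List[str]) -> List[str]:
    """Return the requested codes that are not installed, in request order."""
    return [code for code in requested if code not in available]


sample = 'List of available languages in "/usr/share/tessdata/" (2):\neng\nosd\n'
print(missing_languages(["eng", "deu"], parse_list_langs(sample)))
# ['deu']
```

In practice you would pass `installed_languages()` as the `available` argument and install any pack the check reports as missing before passing those codes in `languages`.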

## Installation Note

Tesseract must be installed on the system:

* Ubuntu/Debian: `sudo apt-get install tesseract-ocr`
* macOS: `brew install tesseract`
* Windows: download and run an installer (the Tesseract project's GitHub documentation links to prebuilt Windows builds)
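
Once Tesseract is installed, a quick check that the executable can be found avoids confusing errors later. The sketch below uses only the Python standard library. `find_tesseract` is a hypothetical helper, and the idea of passing its result to `tesseract_cmd` is an assumption based on that parameter's description in the table above.

```python
import shutil
from typing import Optional


def find_tesseract(tesseract_cmd: Optional[str] = None) -> Optional[str]:
    """Return the resolved path to the tesseract executable, or None if not found.

    If tesseract_cmd is given (a command name or an explicit path), only that
    is checked; otherwise the PATH is searched for "tesseract".
    """
    return shutil.which(tesseract_cmd or "tesseract")


path = find_tesseract()
if path is None:
    print("Tesseract not found: install it or pass an explicit tesseract_cmd")
else:
    print(f"Tesseract found at {path}")
```

If the executable lives outside your `PATH` (common on Windows), calling `find_tesseract` with the explicit install location confirms the path is valid before it is handed to `TesseractOCREngine(tesseract_cmd=...)`.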
