What is PaddleOCR?
Comprehensive OCR with multiple specialized pipelines for advanced document understanding.

Usage
PaddleOCR (General OCR)
| Parameter | Type | Default | Description |
|---|---|---|---|
| lang | str | 'en' | Language code |
| ocr_version | str | 'PP-OCRv5' | OCR version ('PP-OCRv3', 'PP-OCRv4', or 'PP-OCRv5') |
| use_doc_orientation_classify | bool | None | Enable document orientation classification |
| use_doc_unwarping | bool | None | Enable document unwarping |
| use_textline_orientation | bool | None | Enable text line orientation detection |
| text_det_limit_side_len | int | None | Limit on detection input side length |
| text_rec_score_thresh | float | None | Text recognition score threshold |
| return_word_box | bool | None | Return word-level bounding boxes |

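
As a rough sketch of how the options above might be handed to the pipeline, the snippet below collects them into keyword arguments and omits any left at `None`, on the assumption that `None` means "keep the pipeline's built-in default". The helper `build_ocr_kwargs` is hypothetical and not part of PaddleOCR. The commented usage at the end assumes the `paddleocr` 3.x Python package and uses a placeholder image path.

```python
from typing import Any, Dict, Optional

# Versions listed in the parameter table above.
SUPPORTED_VERSIONS = ("PP-OCRv3", "PP-OCRv4", "PP-OCRv5")


def build_ocr_kwargs(
    lang: str = "en",
    ocr_version: str = "PP-OCRv5",
    use_doc_orientation_classify: Optional[bool] = None,
    use_doc_unwarping: Optional[bool] = None,
    use_textline_orientation: Optional[bool] = None,
    text_det_limit_side_len: Optional[int] = None,
    text_rec_score_thresh: Optional[float] = None,
    return_word_box: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: gather the table's options, skipping unset ones."""
    if ocr_version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported ocr_version: {ocr_version!r}")
    options = {
        "lang": lang,
        "ocr_version": ocr_version,
        "use_doc_orientation_classify": use_doc_orientation_classify,
        "use_doc_unwarping": use_doc_unwarping,
        "use_textline_orientation": use_textline_orientation,
        "text_det_limit_side_len": text_det_limit_side_len,
        "text_rec_score_thresh": text_rec_score_thresh,
        "return_word_box": return_word_box,
    }
    # Assumption: None means "not set", so it is not forwarded at all.
    return {name: value for name, value in options.items() if value is not None}


kwargs = build_ocr_kwargs(use_textline_orientation=True, text_rec_score_thresh=0.5)
print(kwargs)

# Assumed usage once the paddleocr package is installed:
# from paddleocr import PaddleOCR
# ocr = PaddleOCR(**kwargs)
# for res in ocr.predict("sample.png"):  # placeholder path
#     res.print()
```

Whether a given option belongs on the constructor or on the prediction call may vary between versions, so it is worth checking against the installed release.
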
PPStructureV3 (Document Structure)
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_table_recognition | bool | None | Enable table recognition |
| use_formula_recognition | bool | None | Enable formula recognition |
| use_seal_recognition | bool | None | Enable seal text recognition |
| use_chart_recognition | bool | None | Enable chart recognition |
| layout_threshold | float | None | Layout detection score threshold |
| lang | str | 'en' | Language code |

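
The boolean flags in this table are effectively tri-state: `True` enables a recognizer, `False` disables it, and `None` leaves the choice to the pipeline. The sketch below models that with a small dataclass. `StructureOptions` is a hypothetical name, and the `[0, 1]` range check on `layout_threshold` is an assumption based on it being described as a score threshold.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class StructureOptions:
    """Hypothetical container mirroring the PPStructureV3 table."""

    use_table_recognition: Optional[bool] = None
    use_formula_recognition: Optional[bool] = None
    use_seal_recognition: Optional[bool] = None
    use_chart_recognition: Optional[bool] = None
    layout_threshold: Optional[float] = None
    lang: str = "en"

    def __post_init__(self) -> None:
        # Assumption: the layout score threshold is a confidence in [0, 1].
        if self.layout_threshold is not None:
            if not 0.0 <= self.layout_threshold <= 1.0:
                raise ValueError("layout_threshold is expected to lie in [0, 1]")

    def explicit(self) -> Dict[str, Any]:
        """Return only the options the caller actually set (False is kept)."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Tables on, formulas explicitly off, everything else left to the pipeline.
opts = StructureOptions(
    use_table_recognition=True,
    use_formula_recognition=False,
    layout_threshold=0.5,
)
print(opts.explicit())
```

Keeping `False` distinct from `None` matters here: an explicit `False` turns a recognizer off, while `None` leaves the pipeline's default untouched.
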
PPChatOCRv4 (Chat-based OCR)
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_table_recognition | bool | None | Enable table recognition |
| use_seal_recognition | bool | None | Enable seal recognition |
| mllm_chat_bot_config | dict | None | Multimodal LLM configuration |
| retriever_config | dict | None | Retriever configuration for vector search |

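
The two `dict` parameters are the least self-explanatory entries in the table, so here is a minimal sketch of what such dictionaries might look like. Everything in it is an assumption: the key names (`module_name`, `model_name`, `base_url`, `api_type`, `api_key`) are modeled on the general shape of PaddleOCR's published examples and should be verified against the installed version, while the model names, URLs, environment variable, and the helper `make_service_config` are placeholders invented for illustration.

```python
import os
from typing import Dict


def make_service_config(
    module_name: str,
    model_name: str,
    base_url: str,
    api_type: str = "openai",
    api_key_env: str = "OCR_LLM_API_KEY",  # hypothetical variable name
) -> Dict[str, str]:
    """Hypothetical helper assembling one service configuration dict.

    The key names are assumptions; check them against the documentation
    of the PaddleOCR release in use.
    """
    return {
        "module_name": module_name,
        "model_name": model_name,
        "base_url": base_url,
        "api_type": api_type,
        # Read the secret from the environment instead of hard-coding it.
        "api_key": os.environ.get(api_key_env, ""),
    }


# Placeholder model names and local URLs, for illustration only.
mllm_chat_bot_config = make_service_config(
    "chat_bot", "example-mllm", "http://127.0.0.1:8080/"
)
retriever_config = make_service_config(
    "retriever", "example-embedding", "http://127.0.0.1:8081/"
)
print(sorted(mllm_chat_bot_config))
```

Building both dictionaries through one function keeps their shape consistent and keeps API keys out of source files.
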
PaddleOCRVL (Vision-Language)
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_layout_detection | bool | None | Enable layout detection |
| use_chart_recognition | bool | None | Enable chart recognition |
| format_block_content | bool | None | Format content as Markdown |
| vl_rec_backend | str | 'local' | VL recognition backend |
| temperature | float | None | Sampling temperature for VLM |
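
To give a feel for what `temperature` controls, the following is a generic illustration of how a sampling temperature is commonly applied to a model's output scores before a token is drawn. It is not PaddleOCR-VL code, and how a particular backend applies the value is not stated here. The function name and the example scores are made up.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive for sampling")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring candidate, which usually suits transcription-style tasks such as OCR. Higher values spread probability across alternatives and make output less repeatable.
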

