# PdfPlumber Loader

> Load PDF documents using pdfplumber, with structure-preserving table extraction

## Overview

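The PdfPlumber loader is built on the pdfplumber library and targets PDFs with structured content, especially tables and complex layouts. Detected tables are emitted in a configurable format (`text`, `markdown`, `csv`, or `grid`), so their rows and columns stay intact in the loaded document instead of being flattened into running text, as plain text extraction often does.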
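
To give a sense of what a table format such as `markdown` or `csv` produces, here is a minimal standard-library sketch that renders one extracted table both ways. It is an illustration, not Upsonic's implementation. It assumes each table arrives as a list of rows of cell strings, with `None` for empty cells, which is the shape pdfplumber's `Page.extract_tables()` returns. The helpers `table_to_markdown` and `table_to_csv` are hypothetical names.

```python
import csv
import io
from typing import Optional

# One table in the shape pdfplumber's Page.extract_tables() returns:
# a list of rows, each row a list of cell strings (None for empty cells).
Table = list[list[Optional[str]]]


def _clean_cell(value: Optional[str]) -> str:
    # Treat None as an empty cell and collapse newlines/runs of whitespace.
    return " ".join((value or "").split())


def table_to_markdown(table: Table) -> str:
    """Render the first row as the header and the remaining rows as the body."""
    if not table:
        return ""
    # Escape pipes so cell text cannot split a Markdown column.
    rows = [[_clean_cell(c).replace("|", "\\|") for c in row] for row in table]
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines.extend("| " + " | ".join(row) + " |" for row in body)
    return "\n".join(lines)


def table_to_csv(table: Table) -> str:
    """Render all rows as CSV; the csv module handles quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows([_clean_cell(c) for c in row] for row in table)
    return buf.getvalue()


if __name__ == "__main__":
    rows = [["Region", "Q1", "Q2"], ["North", "120", None], ["South\nEast", "95", "101"]]
    print(table_to_markdown(rows))
    print(table_to_csv(rows))
```

Collapsing whitespace inside each cell keeps a multi-line cell on a single row, which Markdown tables require. The real loader may normalize cells differently.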

**Loader Class:** `PdfPlumberLoader`

**Config Class:** `PdfPlumberLoaderConfig`

## Install

<Note>
  Install the PdfPlumber loader optional dependency group:

  ```bash theme={null}
  uv pip install "upsonic[pdfplumber-loader]"
  ```
</Note>

## Examples

```python theme={null}
from upsonic import Agent, Task, KnowledgeBase
from upsonic.loaders.pdfplumber import PdfPlumberLoader
from upsonic.loaders.config import PdfPlumberLoaderConfig
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.text_splitter.recursive import RecursiveChunker, RecursiveChunkingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Configure loader with table extraction
loader_config = PdfPlumberLoaderConfig(
    extraction_mode="hybrid",
    extract_tables=True,
    table_format="markdown"
)
loader = PdfPlumberLoader(loader_config)

# Setup KnowledgeBase
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
chunker = RecursiveChunker(RecursiveChunkingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="pdf_tables",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["report.pdf"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader],
    splitters=[chunker]
)

# Query with Agent
agent = Agent("anthropic/claude-sonnet-4-5")
task = Task("Extract all table data", context=[kb])
result = agent.do(task)
print(result)
```

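## Parameters

The **Source** column shows whether a parameter is inherited from the base loader configuration (`Base`) or specific to `PdfPlumberLoaderConfig` (`Specific`).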

| Parameter                  | Type                                        | Description                              | Default      | Source   |
| -------------------------- | ------------------------------------------- | ---------------------------------------- | ------------ | -------- |
| `encoding`                 | `str \| None`                               | File encoding (auto-detected if None)    | None         | Base     |
| `error_handling`           | `"ignore" \| "warn" \| "raise"`             | How to handle loading errors             | "warn"       | Base     |
| `include_metadata`         | `bool`                                      | Whether to include file metadata         | True         | Base     |
| `custom_metadata`          | `dict`                                      | Additional metadata to include           | {}           | Base     |
| `max_file_size`            | `int \| None`                               | Maximum file size in bytes               | None         | Base     |
| `skip_empty_content`       | `bool`                                      | Skip documents with empty content        | True         | Base     |
| `extraction_mode`          | `"hybrid" \| "text_only" \| "ocr_only"`     | Content extraction strategy              | "hybrid"     | Specific |
| `start_page`               | `int \| None`                               | First page to process (1-indexed)        | None         | Specific |
| `end_page`                 | `int \| None`                               | Last page to process (inclusive)         | None         | Specific |
| `clean_page_numbers`       | `bool`                                      | Remove page numbers from headers/footers | True         | Specific |
| `page_num_start_format`    | `str \| None`                               | Format string for page start markers     | None         | Specific |
| `page_num_end_format`      | `str \| None`                               | Format string for page end markers       | None         | Specific |
| `extra_whitespace_removal` | `bool`                                      | Normalize whitespace                     | True         | Specific |
| `pdf_password`             | `str \| None`                               | Password for encrypted PDFs              | None         | Specific |
| `extract_tables`           | `bool`                                      | Extract and include tables               | True         | Specific |
| `table_format`             | `"text" \| "markdown" \| "csv" \| "grid"`   | Format for extracted tables              | "markdown"   | Specific |
| `table_settings`           | `dict`                                      | Advanced table detection settings        | Default dict | Specific |
| `extract_images`           | `bool`                                      | Extract image information                | False        | Specific |
| `layout_mode`              | `"default" \| "layout" \| "simple"`         | Text extraction layout mode              | "layout"     | Specific |
| `use_text_flow`            | `bool`                                      | Use text flow analysis                   | True         | Specific |
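| `char_margin`              | `float`                                     | Max gap between characters on the same line, relative to character width | 3.0          | Specific |
| `line_margin`              | `float`                                     | Max gap between lines in the same text block, relative to line height    | 0.5          | Specific |
| `word_margin`              | `float`                                     | Gap between characters that starts a new word, relative to character width | 0.1        | Specific |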
| `extract_page_dimensions`  | `bool`                                      | Include page dimensions in metadata      | False        | Specific |
| `crop_box`                 | `tuple[float, float, float, float] \| None` | Crop box (x0, y0, x1, y1)                | None         | Specific |
| `extract_annotations`      | `bool`                                      | Extract annotations and hyperlinks       | False        | Specific |
| `keep_blank_chars`         | `bool`                                      | Preserve blank characters                | False        | Specific |
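
`start_page` is 1-indexed and `end_page` is inclusive, which differs from Python's 0-indexed, half-open slices. The following sketch shows one way to map those two parameters onto a list of pages. `select_pages` is a hypothetical helper, and raising `ValueError` on an invalid range is an assumption; the actual loader may instead follow its `error_handling` setting.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_pages(
    pages: Sequence[T],
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
) -> list[T]:
    """Return the pages from start_page through end_page (1-indexed, inclusive).

    None means "from the first page" or "through the last page".
    An end_page past the last page is clamped by the slice; a start_page
    past the last page yields an empty list.
    """
    first = 1 if start_page is None else start_page
    last = len(pages) if end_page is None else end_page
    # Reject ranges that can never be valid (hypothetical policy).
    if first < 1 or (end_page is not None and end_page < first):
        raise ValueError(f"invalid page range: start={start_page}, end={end_page}")
    # 1-indexed inclusive [first, last] == 0-indexed half-open [first - 1, last)
    return list(pages[first - 1 : last])


if __name__ == "__main__":
    pages = ["p1", "p2", "p3", "p4", "p5"]
    print(select_pages(pages, start_page=2, end_page=4))
```

With pdfplumber, the page list would be `pdf.pages` from `pdfplumber.open(...)`. Each `Page` also exposes a 1-indexed `page_number` attribute, which lines up with these parameters.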
