## Overview

The agent has three components:

- Agent — LLM-driven agent that orchestrates scraping and data extraction
- FirecrawlTools — Built-in Upsonic toolkit wrapping the Firecrawl API; only `scrape_url` is enabled to keep the tool surface minimal
- Task — Defines the target URL and the exact output format
## Project Structure
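A minimal layout for this example, built from the three files this guide references:

```
.
├── main.py            # agent, toolkit, and task definition
├── requirements.txt   # Python dependencies
└── .env               # FIRECRAWL_API_KEY and ANTHROPIC_API_KEY
```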
## Environment Variables
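Both keys are read from `.env` (the variable names are the ones used in the Security Notes below); the values here are placeholders to replace with your own keys:

```
FIRECRAWL_API_KEY=your-firecrawl-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
```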
## Installation
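A sketch of the install step, assuming pip and the package names `upsonic` and `firecrawl-py` (the Firecrawl Python SDK); verify the exact names for your environment:

```shell
pip install upsonic firecrawl-py
```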
## Complete Implementation
### main.py
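The original listing is not reproduced here, so the sketch below reconstructs it from the pieces this guide names: `Agent`, `Task`, `FirecrawlTools` with only `scrape_url` enabled, the `anthropic/claude-sonnet-4-6` model, and `agent.do(task)`. The import paths and the `FirecrawlTools` keyword argument are assumptions; check the Upsonic documentation before using this.

```python
from dotenv import load_dotenv            # reads FIRECRAWL_API_KEY / ANTHROPIC_API_KEY from .env
from upsonic import Agent, Task           # import path is an assumption
from upsonic.tools import FirecrawlTools  # import path is an assumption

load_dotenv()

# Only scrape_url is enabled, keeping the tool surface minimal.
tools = FirecrawlTools(enable_scrape=True)  # keyword name is an assumption

agent = Agent(model="anthropic/claude-sonnet-4-6")

task = Task(
    "Scrape http://books.toscrape.com and extract every product's "
    "name, price, and rating. Return the results as a Markdown table "
    "sorted by price, descending.",
    tools=[tools],
)

result = agent.do(task)
print(result)
```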
### requirements.txt
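A plausible `requirements.txt`, assuming the same two package names as the install step, plus `python-dotenv` for loading the keys from `.env`:

```
upsonic
firecrawl-py
python-dotenv
```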
## How It Works
| Step | What Happens |
|---|---|
| 1 | `agent.do(task)` sends the task prompt to the LLM |
| 2 | The LLM calls `scrape_url("http://books.toscrape.com")` via FirecrawlTools |
| 3 | Firecrawl fetches the page and returns clean Markdown |
| 4 | The LLM parses the Markdown, identifies every product block, and extracts name, price, and rating |
| 5 | Results are formatted as a sorted Markdown table and returned |
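Steps 4 and 5 happen inside the LLM, but the transformation they describe can be sketched in plain Python. The Markdown below is a hypothetical, simplified stand-in for what Firecrawl returns; real output is richer, but the parse-then-sort idea is the same:

```python
import re

# Hypothetical, simplified Markdown standing in for Firecrawl's output.
markdown = """
## A Light in the Attic
Price: 51.77 - Rating: Three

## Tipping the Velvet
Price: 53.74 - Rating: One
"""

def extract_products(md: str) -> list[dict]:
    """Find every product block and pull out name, price, and rating (step 4)."""
    pattern = re.compile(
        r"## (?P<name>.+)\nPrice: (?P<price>[\d.]+) - Rating: (?P<rating>\w+)"
    )
    return [m.groupdict() for m in pattern.finditer(md)]

def to_table(products: list[dict]) -> str:
    """Render a Markdown table sorted by price, descending (step 5)."""
    rows = sorted(products, key=lambda p: float(p["price"]), reverse=True)
    lines = ["| Name | Price | Rating |", "|---|---|---|"]
    lines += [f"| {p['name']} | {p['price']} | {p['rating']} |" for p in rows]
    return "\n".join(lines)

print(to_table(extract_products(markdown)))
```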
## Sample Output
## Extending the Example
### Crawl multiple pages
Switch from `scrape_url` to `crawl_website` to follow pagination automatically:
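A sketch of the change, assuming `FirecrawlTools` toggles crawling via a constructor flag (the exact keyword is an assumption; check the Upsonic documentation):

```python
# Assumption: crawl_website is enabled via a constructor flag.
tools = FirecrawlTools(enable_crawl=True)

task = Task(
    "Crawl http://books.toscrape.com, follow the pagination links, and "
    "extract name, price, and rating for every product on every page.",
    tools=[tools],
)
```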
### Structured JSON extraction
Use `extract_data` for schema-driven, LLM-powered extraction directly inside Firecrawl:
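One way this might look. The constructor flag, the call shape, and the schema fields here are illustrative assumptions, not the documented API:

```python
# Assumption: extract_data is enabled the same way scrape_url is.
tools = FirecrawlTools(enable_extract=True)

# Hypothetical JSON schema describing one product.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "string"},
        "rating": {"type": "string"},
    },
}

task = Task(
    "Extract every product from http://books.toscrape.com as JSON "
    "with the fields name, price, and rating.",
    tools=[tools],
)
```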
### Point at a different shop
Replace the URL in the task description with any publicly accessible store.

## Key Features
| Feature | Detail |
|---|---|
| Minimal tool surface | Only `scrape_url` is enabled — the agent cannot accidentally crawl, search, or batch-scrape |
| Clean Markdown input | Firecrawl strips boilerplate and returns structured Markdown, making product parsing straightforward |
| Model-agnostic | Swap `anthropic/claude-sonnet-4-6` for any Upsonic-supported provider |
| Extensible | Switch to `crawl_website` or `extract_data` for multi-page or schema-driven extraction |
## Security Notes
- The agent only has access to `scrape_url` — it cannot read local files, execute code, or access other systems.
- Only point the agent at publicly accessible URLs. Firecrawl respects `robots.txt` by default.
- Store `FIRECRAWL_API_KEY` and `ANTHROPIC_API_KEY` in `.env` — never hardcode keys in source files.

