This example shows how to build a product extraction agent using Upsonic's Agent with the built-in FirecrawlTools. Point it at a shopping page and it scrapes the page, extracts product details such as names, prices, and ratings or short descriptions, and returns them as a structured Markdown table.
The example targets books.toscrape.com, a public demo bookstore built for scraping practice, but the same pattern can be reused for other publicly accessible e-commerce sites.
Overview
The example has three components:
- Agent: LLM-driven agent that orchestrates scraping and data extraction
- FirecrawlTools: built-in Upsonic toolkit wrapping the Firecrawl API; only scrape_url is enabled to keep the tool surface minimal
- Task: defines the target URL and the exact output format
Project Structure
firecrawl_shopping_scraper/
├── main.py           # Entry point: Agent + FirecrawlTools + Task
├── requirements.txt  # Python dependencies
└── .env              # API keys (never commit this file)
Environment Variables
Get your free Firecrawl API key: sign up at firecrawl.dev, open your dashboard, and copy your key. No credit card is required to get started.
# Required: Firecrawl API key (https://firecrawl.dev)
FIRECRAWL_API_KEY=fc-your-key-here
# Required: LLM provider key (example uses Anthropic Claude)
ANTHROPIC_API_KEY=your-anthropic-key-here
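If a key is missing, the failure usually surfaces later as an authentication error from Firecrawl or the LLM provider. A small preflight check can fail earlier with a clearer message. The sketch below is a hypothetical helper (`missing_env_vars` is not part of Upsonic or python-dotenv); it only inspects a mapping such as `os.environ`, so call it after `load_dotenv()` has run, as main.py does.

```python
import os
from typing import Iterable, List, Mapping

# The two keys this example expects in .env (hypothetical constant for this sketch)
REQUIRED_KEYS = ("FIRECRAWL_API_KEY", "ANTHROPIC_API_KEY")


def missing_env_vars(
    required: Iterable[str] = REQUIRED_KEYS,
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the required keys that are absent or contain only whitespace."""
    return [key for key in required if not environ.get(key, "").strip()]
```

For example, `missing = missing_env_vars()` followed by `if missing: raise SystemExit(...)` placed right after `load_dotenv()` stops the script before any API call is made.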
Installation
# With uv (recommended)
uv venv && source .venv/bin/activate
uv pip install upsonic firecrawl-py python-dotenv anthropic
# With pip
python3 -m venv .venv && source .venv/bin/activate
pip install upsonic firecrawl-py python-dotenv anthropic
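To confirm the virtual environment has everything before running main.py, you can check that each package is importable. This is a minimal sketch, assuming the usual import names (firecrawl-py installs the `firecrawl` module and python-dotenv installs `dotenv`); `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List

# Import names differ from pip package names:
#   firecrawl-py -> firecrawl, python-dotenv -> dotenv
REQUIRED_MODULES = ("upsonic", "firecrawl", "dotenv", "anthropic")


def missing_modules(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found in the active environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

Running `print(missing_modules())` inside the activated environment should print `[]` once installation succeeded.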
Complete Implementation
main.py
import os
from dotenv import load_dotenv
from upsonic import Agent, Task
from upsonic.tools.custom_tools.firecrawl import FirecrawlTools
load_dotenv()
# ── 1. Configure FirecrawlTools: only scrape_url is needed ──────
firecrawl = FirecrawlTools(
    enable_scrape=True,
    enable_crawl=False,
    enable_map=False,
    enable_search=False,
    enable_batch_scrape=False,
    enable_extract=False,
    enable_crawl_management=False,
    enable_batch_management=False,
    enable_extract_management=False,
)
# ── 2. Define the extraction task ───────────────────────────────
task = Task(
    description="""
    Scrape the homepage of http://books.toscrape.com and extract ALL
    products visible on the page.

    For each product return:
    - Name (full book title)
    - Price (as shown, e.g. '£51.77')
    - Rating (word form, e.g. 'Three')

    Format the output as a Markdown table:
    | # | Book Title | Price | Rating |
    |---|-----------|-------|--------|

    Sort by price descending. Add a one-line summary at the top
    with the total number of products found and the price range.
    """
)
# ── 3. Create the agent ─────────────────────────────────────────
agent = Agent(
    model="anthropic/claude-sonnet-4-6",
    tools=[firecrawl],
)

# ── 4. Run ──────────────────────────────────────────────────────
result = agent.do(task)
print(result)
requirements.txt
upsonic
firecrawl-py
python-dotenv
anthropic
How It Works
| Step | What Happens |
|---|---|
| 1 | agent.do(task) sends the task prompt to the LLM |
| 2 | The LLM calls scrape_url("http://books.toscrape.com") via FirecrawlTools |
| 3 | Firecrawl fetches the page and returns clean Markdown |
| 4 | The LLM parses the Markdown, identifies every product block, and extracts name, price, and rating |
| 5 | Results are formatted as a sorted Markdown table and returned |
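Step 5 is performed by the LLM, but the target format is simple enough to reproduce deterministically, which is useful if you post-process extracted records yourself or want a reference to compare the agent's table against. This is a sketch under stated assumptions: products are dicts with hypothetical `title`, `price`, and `rating` keys, prices are strings like `'£51.77'` with `.` as the decimal separator, and `to_markdown_table` is not part of Upsonic.

```python
from decimal import Decimal
from typing import Dict, List


def parse_price(price: str) -> Decimal:
    """Turn a display price such as '£51.77' into Decimal('51.77') by dropping
    everything except digits and the decimal point."""
    return Decimal("".join(ch for ch in price if ch.isdigit() or ch == "."))


def to_markdown_table(products: List[Dict[str, str]]) -> str:
    """Sort products by price (highest first) and render a summary line plus a table."""
    rows = sorted(products, key=lambda p: parse_price(p["price"]), reverse=True)
    if not rows:
        return "Found 0 products"
    lines = [
        # After sorting, the last row is the cheapest and the first the most expensive
        f"Found {len(rows)} products · Price range: {rows[-1]['price']} – {rows[0]['price']}",
        "",
        "| # | Book Title | Price | Rating |",
        "|---|------------|-------|--------|",
    ]
    for i, p in enumerate(rows, start=1):
        lines.append(f"| {i} | {p['title']} | {p['price']} | {p['rating']} |")
    return "\n".join(lines)
```

Using `Decimal` rather than `float` keeps prices such as `52.15` exact when they are compared and sorted.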
Sample Output
Found 20 products · Price range: £10.00 – £59.69
| # | Book Title | Price | Rating |
|----|----------------------------------------------|--------|--------|
| 1 | Libertarianism for Beginners | £59.69 | Two |
| 2 | It's Only the Himalayas | £52.29 | Two |
| 3 | The Black Maria | £52.15 | One |
| 4 | Starving Hearts (Triangular Trade Trilogy…) | £13.99 | Two |
| 5 | ... | ... | ... |
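Because the table arrives as free-form LLM text, you may want to parse it and confirm that the requested ordering holds before using it downstream. A minimal sketch, assuming the layout of the sample output (an optional summary line, then a pipe table whose third column is the price); `parse_markdown_table` and `is_sorted_by_price_desc` are hypothetical helpers.

```python
from decimal import Decimal
from typing import List


def parse_markdown_table(text: str) -> List[List[str]]:
    """Return the data rows of the pipe-table lines in `text`, one list of cells per row."""
    rows: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the summary line and any other prose
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # skip the |---|---| separator row
        rows.append(cells)
    return rows[1:]  # drop the header row


def price_value(cell: str) -> Decimal:
    """'£59.69' -> Decimal('59.69'); assumes '.' is the decimal separator."""
    return Decimal("".join(ch for ch in cell if ch.isdigit() or ch == "."))


def is_sorted_by_price_desc(rows: List[List[str]], price_col: int = 2) -> bool:
    """Check the price column never increases, ignoring placeholder rows such as '...'."""
    prices = [price_value(r[price_col]) for r in rows if any(ch.isdigit() for ch in r[price_col])]
    return all(a >= b for a, b in zip(prices, prices[1:]))
```

Passing `str(result)` from main.py to `parse_markdown_table` gives a list of rows you can check or load into other tools.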
Extending the Example
Crawl multiple pages
Switch from scrape_url to crawl_website to follow pagination automatically:
firecrawl = FirecrawlTools(
    enable_scrape=False,
    enable_crawl=True,
    enable_crawl_management=True,
)

task = Task(
    description="""
    Crawl http://books.toscrape.com (up to 5 pages) and extract every
    product: name, price, and rating. Return a single Markdown table
    sorted by price descending.
    """
)
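A crawl produces one result per page, so products from several pages must be merged into a single table, and a product can appear on more than one crawled page. If you handle that merge yourself, one way to sketch it is shown below. It assumes each page has already been reduced to a list of product dicts with hypothetical `title` and `price` keys, and it deduplicates on the title, which is itself an assumption: some shops list distinct products under the same name.

```python
from decimal import Decimal
from typing import Dict, Iterable, List

Product = Dict[str, str]


def merge_pages(pages: Iterable[List[Product]]) -> List[Product]:
    """Merge per-page product lists, keep the first occurrence of each title,
    and sort the result by price, highest first."""
    seen: Dict[str, Product] = {}
    for page in pages:
        for product in page:
            seen.setdefault(product["title"], product)  # later duplicates are ignored
    return sorted(
        seen.values(),
        key=lambda p: Decimal("".join(ch for ch in p["price"] if ch.isdigit() or ch == ".")),
        reverse=True,
    )
```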
Schema-driven extraction
Use extract_data for schema-driven, LLM-powered extraction directly inside Firecrawl:
firecrawl = FirecrawlTools(
    enable_scrape=False,
    enable_extract=True,
)

task = Task(
    description="""
    Use extract_data on http://books.toscrape.com/* with this JSON schema:
    {
      "products": [
        {"name": "string", "price": "string", "rating": "string"}
      ]
    }
    Return the raw structured result.
    """
)
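Since the structured result is supposed to follow the schema in the prompt, a light shape check can catch missing or mistyped fields before you rely on the data. The sketch below uses only the standard library; `validate_products_payload` is hypothetical, and it assumes you have the result as a JSON string whose top-level object holds a `products` list. The exact shape FirecrawlTools hands back may differ.

```python
import json
from typing import Any, Dict, List

REQUIRED_FIELDS = ("name", "price", "rating")


def validate_products_payload(raw: str) -> List[Dict[str, str]]:
    """Decode a JSON string shaped like {"products": [{name, price, rating}, ...]}
    and return the product list. Raises ValueError on any shape mismatch
    (json.JSONDecodeError, raised for invalid JSON, is a subclass of ValueError)."""
    data: Any = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("products"), list):
        raise ValueError("expected an object with a 'products' list")
    for index, item in enumerate(data["products"]):
        if not isinstance(item, dict):
            raise ValueError(f"product {index} is not an object")
        missing = [f for f in REQUIRED_FIELDS if not isinstance(item.get(f), str)]
        if missing:
            raise ValueError(f"product {index} is missing string fields: {missing}")
    return data["products"]
```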
Point at a different shop
Replace the URL in the task description with any publicly accessible store:
task = Task(
    description="""
    Scrape https://your-target-shop.com and extract all visible products.
    For each product return name, price, and a short description (1-2 sentences).
    Format as a Markdown table sorted by price descending.
    """
)
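When you target several shops, building the description from a URL and a field list avoids editing the prompt by hand each time. A minimal sketch; `build_task_description` is a hypothetical helper, and its return value is the string you would pass as `Task(description=...)`.

```python
from typing import Sequence


def build_task_description(url: str, fields: Sequence[str]) -> str:
    """Return a scraping prompt for `url` that asks for the given product fields."""
    field_list = ", ".join(fields)
    return (
        f"Scrape {url} and extract all visible products.\n"
        f"For each product return {field_list}.\n"
        "Format as a Markdown table sorted by price descending."
    )
```

For example, `build_task_description("https://your-target-shop.com", ["name", "price", "a short description (1-2 sentences)"])` reproduces the prompt above.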
Key Features
| Feature | Detail |
|---|---|
| Minimal tool surface | Only scrape_url is enabled, so the agent cannot accidentally crawl, search, or batch-scrape |
| Clean Markdown input | Firecrawl strips boilerplate and returns structured Markdown, making product parsing straightforward |
| Model-agnostic | Swap anthropic/claude-sonnet-4-6 for any Upsonic-supported provider |
| Extensible | Switch to crawl_website or extract_data for multi-page or schema-driven extraction |
Security Notes
- The agent only has access to scrape_url: it cannot read local files, execute code, or access other systems.
- Only point the agent at publicly accessible URLs. Firecrawl respects robots.txt by default.
- Store FIRECRAWL_API_KEY and ANTHROPIC_API_KEY in .env; never hardcode keys in source files.
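To enforce the "publicly accessible URLs only" rule in code instead of by convention, you could screen URLs before they go into a task prompt. This is a stdlib-only heuristic sketch: `is_public_http_url` is hypothetical, and it inspects only the URL text. It rejects non-http(s) schemes, `localhost`, and IP literals in private, loopback, link-local, reserved, or unspecified ranges. It does not resolve hostnames, so a public-looking name that resolves to an internal address would still pass.

```python
import ipaddress
from urllib.parse import urlparse


def is_public_http_url(url: str) -> bool:
    """Heuristic: True if `url` is http(s) and does not name a local or private host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname  # lower-cased, without port or IPv6 brackets
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname rather than an IP literal; not resolved here
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_unspecified
    )
```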
Repository
View the full example: Firecrawl Shopping Scraper