Metadata-Version: 2.4
Name: llama-index-tools-scraperapi
Version: 0.1.0
Summary: llama-index tools to use ScraperAPI web scraping
Project-URL: Homepage, https://www.scraperapi.com
Project-URL: Documentation, https://docs.scraperapi.com
Project-URL: Repository, https://github.com/scraperapi/llama-index-tools-scraperapi
Project-URL: Bug Tracker, https://github.com/scraperapi/llama-index-tools-scraperapi/issues
Author: ScraperAPI
License-Expression: MIT
License-File: LICENSE
Keywords: ai,llama-index,scraperapi,scraping,tools,web-scraping
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: <4.0,>=3.10
Requires-Dist: llama-index-core<0.15,>=0.13.0
Requires-Dist: requests<3,>=2.28.0
Description-Content-Type: text/markdown

# ScraperAPI LlamaIndex Tools Integration

This integration connects LlamaIndex agents to [ScraperAPI](https://www.scraperapi.com), a web scraping API that handles proxies, browsers, and CAPTCHAs. With it, an agent can scrape arbitrary web pages and retrieve structured data from Amazon, Google, eBay, Walmart, and Redfin.

## Installation

```bash
pip install llama-index-tools-scraperapi
```

## Usage

```python
import asyncio
import os
from llama_index.tools.scraperapi import ScraperAPIToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

async def main():
    scraper_tool = ScraperAPIToolSpec(
        api_key=os.environ["SCRAPERAPI_API_KEY"],
    )
    agent = FunctionAgent(
        tools=scraper_tool.to_tool_list(),
        llm=OpenAI(model="gpt-4.1"),
    )

    response = await agent.run(
        "Scrape https://example.com and summarize the content"
    )
    print(response)

asyncio.run(main())
```

### Scrape a Web Page

```python
import os
from llama_index.tools.scraperapi import ScraperAPIToolSpec

tool = ScraperAPIToolSpec(api_key=os.environ["SCRAPERAPI_API_KEY"])

# Returns markdown content by default
content = tool.scrape("https://example.com")
print(content)

# Get plain text instead
content = tool.scrape("https://example.com", output_format="text")

# Enable JS rendering for dynamic pages
content = tool.scrape("https://example.com", render=True)
```

### Amazon

```python
# Product details by ASIN
product = tool.amazon_product(asin="B07FTKQ97Q")

# Search products
results = tool.amazon_search(query="wireless headphones")

# All seller offers for a product
offers = tool.amazon_offers(asin="B07FTKQ97Q")
```

### Google

```python
# Web search (structured SERP)
results = tool.google_search(query="Python web scraping tutorial")

# Shopping results
shopping = tool.google_shopping(query="laptop")

# News articles
news = tool.google_news(query="AI", tbs="w")  # past week

# Maps / places search
places = tool.google_maps_search(query="pizza", latitude=40.7128, longitude=-74.0060)

# Job listings
jobs = tool.google_jobs(query="python developer", gl="us")
```

### eBay

```python
# Product details
product = tool.ebay_product(product_id="166619046796")

# Search with filters
results = tool.ebay_search(query="vintage watch", sort_by="price_lowest", condition="used")
```

### Walmart

```python
# Product details
product = tool.walmart_product(product_id="5253396052")

# Search
results = tool.walmart_search(query="laptop")

# Browse category
items = tool.walmart_category(category="3944_1089430_37807")

# Product reviews
reviews = tool.walmart_reviews(product_id="5253396052", sort="helpful")
```

### Redfin

```python
# Search listings
listings = tool.redfin_search(url="https://www.redfin.com/city/30749/CA/San-Francisco")

# Agent details
agent = tool.redfin_agent(url="https://www.redfin.com/real-estate-agents/agent-name")

# For-sale listing
listing = tool.redfin_forsale(url="https://www.redfin.com/CA/San-Francisco/123-Main-St")

# For-rent listing
rental = tool.redfin_forrent(url="https://www.redfin.com/CA/San-Francisco/456-Oak-Ave")
```

### Geo-targeted Scraping

```python
tool = ScraperAPIToolSpec(
    api_key=os.environ["SCRAPERAPI_API_KEY"],
    country_code="uk",
)

# All requests will use UK proxies by default
content = tool.scrape("https://example.co.uk")

# Override per request
content = tool.scrape("https://example.de", country_code="de")
```

## Available Tools

**Scraping:**
- `scrape`: Scrape any web page and return content as markdown, text, or JSON.

**Amazon (Structured Data):**
- `amazon_product`: Get product details by ASIN.
- `amazon_search`: Search Amazon products.
- `amazon_offers`: Get all seller offers for a product.

**Google (Structured Data):**
- `google_search`: Google SERP search results.
- `google_shopping`: Google Shopping product results.
- `google_news`: Google News articles.
- `google_maps_search`: Google Maps places search.
- `google_jobs`: Google Jobs listings.

**eBay (Structured Data):**
- `ebay_product`: Get product details by product ID.
- `ebay_search`: Search eBay listings.

**Redfin (Structured Data):**
- `redfin_search`: Search Redfin listings.
- `redfin_agent`: Get agent profile details.
- `redfin_forsale`: Get for-sale listing details.
- `redfin_forrent`: Get for-rent listing details.

**Walmart (Structured Data):**
- `walmart_product`: Get product details by product ID.
- `walmart_search`: Search Walmart products.
- `walmart_category`: Browse a Walmart category.
- `walmart_reviews`: Get product reviews.

## Error Handling

All API errors raise `ScraperAPIError`, so you can handle them specifically:

```python
import os
from llama_index.tools.scraperapi import ScraperAPIToolSpec, ScraperAPIError

tool = ScraperAPIToolSpec(api_key=os.environ["SCRAPERAPI_API_KEY"])

try:
    result = tool.scrape("https://example.com")
except ScraperAPIError as e:
    print(f"Scraping failed: {e}")
```
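
Because every API failure surfaces as a single exception type, a generic retry wrapper can sit around any tool call. The following is a minimal sketch, not part of this package: `call_with_retries` and its parameters are hypothetical names, and it assumes the failure is transient. Retrying will not help with errors such as an invalid API key.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff.

    Hypothetical helper for illustration; it is not provided by the package.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Intended usage with this package (assumes `tool` and `ScraperAPIError`
# from the example above; not executed here):
# content = call_with_retries(
#     lambda: tool.scrape("https://example.com"),
#     retry_on=(ScraperAPIError,),
# )
```

The `sleep` argument is injectable so the backoff schedule can be tested without real delays.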

## Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | ScraperAPI key |
| `render` | `bool` | `False` | Enable JS rendering by default |
| `country_code` | `str` | `None` | Default geo-targeting country code |
| `device_type` | `str` | `None` | `"desktop"` or `"mobile"` |
| `timeout` | `int` | `70` | Request timeout in seconds |
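
One way to keep these settings out of source code is to read them from the environment and pass them to the constructor as keyword arguments. The sketch below is an illustration only: `scraperapi_kwargs_from_env` is a hypothetical helper, and every variable name other than `SCRAPERAPI_API_KEY` is an assumption, not something the package reads itself.

```python
import os
from typing import Any, Dict, Mapping, Optional


def scraperapi_kwargs_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build ScraperAPIToolSpec keyword arguments from environment variables.

    Hypothetical helper; only SCRAPERAPI_API_KEY appears in the README above,
    the other variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, Any] = {"api_key": env["SCRAPERAPI_API_KEY"]}  # required

    # Optional settings are only set when present and non-empty, so the
    # constructor defaults from the table apply otherwise.
    if env.get("SCRAPERAPI_RENDER"):
        kwargs["render"] = env["SCRAPERAPI_RENDER"].strip().lower() in {"1", "true", "yes"}
    if env.get("SCRAPERAPI_COUNTRY_CODE"):
        kwargs["country_code"] = env["SCRAPERAPI_COUNTRY_CODE"].strip().lower()
    if env.get("SCRAPERAPI_DEVICE_TYPE"):
        device = env["SCRAPERAPI_DEVICE_TYPE"].strip().lower()
        if device not in {"desktop", "mobile"}:
            raise ValueError(f"device_type must be 'desktop' or 'mobile', got {device!r}")
        kwargs["device_type"] = device
    if env.get("SCRAPERAPI_TIMEOUT"):
        kwargs["timeout"] = int(env["SCRAPERAPI_TIMEOUT"])  # seconds
    return kwargs


# Intended usage (requires the package to be installed; not executed here):
# from llama_index.tools.scraperapi import ScraperAPIToolSpec
# tool = ScraperAPIToolSpec(**scraperapi_kwargs_from_env())
```

Validating `device_type` up front gives a clear error at startup instead of a failed request later.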
