Metadata-Version: 2.4
Name: langchain-mrscraper
Version: 0.2.3
Summary: LangChain tools for the MrScraper web-scraping API
Project-URL: Homepage, https://mrscraper.com
Project-URL: Documentation, https://docs.mrscraper.com
Project-URL: Repository, https://github.com/mrscraper/langchain-mrscraper
Author: Riandra Diva Auzan, R&D Team MrScraper
License: MIT
License-File: LICENSE
Keywords: langchain,llm,mrscraper,scraping,tools,web scraping
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: httpx>=0.27
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: mrscraper-sdk>=0.1.2
Provides-Extra: dev
Requires-Dist: langchain-tests>=0.3.0; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: test
Requires-Dist: langchain-tests>=0.3.0; extra == 'test'
Requires-Dist: pytest-asyncio>=0.24; extra == 'test'
Requires-Dist: pytest>=8; extra == 'test'
Description-Content-Type: text/markdown

# langchain-mrscraper

LangChain integration package for the [MrScraper SDK](https://pypi.org/project/mrscraper-sdk/).

This package exposes MrScraper capabilities as LangChain tools so agents can:

- Fetch rendered HTML from protected websites
- Scrape Google SERP (search results) synchronously
- Create AI scrapers from natural-language prompts
- Rerun AI/manual scrapers (single and bulk)
- List and fetch scraping results

## Installation

```bash
pip install -U langchain-mrscraper
```

or:

```bash
uv add langchain-mrscraper
```

`mrscraper-sdk` is installed automatically as a dependency, so you do not need to install it separately.

## Quick start

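```python
import os
from langchain_mrscraper import MrScraperToolkit

# Shown inline for brevity; in real projects, set the token in your shell
# or a secrets manager rather than in source code.
os.environ["MRSCRAPER_API_TOKEN"] = "your-token"

# With no explicit token argument, the toolkit relies on the environment variable.
tools = MrScraperToolkit().get_tools()
```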

## Use with an agent

This example also needs `langgraph` and `langchain-openai`, which are not installed as dependencies of this package.

```python
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_mrscraper import MrScraperToolkit

tools = MrScraperToolkit(token="your-token").get_tools()
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools)
```

## Available tools

- `mrscraper_google_serp`
- `mrscraper_fetch_html`
- `mrscraper_create_scraper`
- `mrscraper_rerun_ai_scraper`
- `mrscraper_bulk_rerun_ai_scraper`
- `mrscraper_rerun_manual_scraper`
- `mrscraper_bulk_rerun_manual_scraper`
- `mrscraper_get_all_results`
- `mrscraper_get_result_by_id`
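
If an agent should only see some of these tools, one option is to filter the list returned by `get_tools()` by name. The sketch below is illustrative and not part of the package: `select_tools` and `RESULT_TOOL_NAMES` are hypothetical names, and the helper relies only on each tool exposing a `name` attribute, as LangChain tools do. Stand-in objects are used so the snippet runs without an API token.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List

# Illustrative grouping (an assumption, not defined by the package):
# the two tools that only list or retrieve existing results.
RESULT_TOOL_NAMES = ("mrscraper_get_all_results", "mrscraper_get_result_by_id")


def select_tools(tools: Iterable[Any], allowed_names: Iterable[str]) -> List[Any]:
    """Return the tools whose .name is in allowed_names, preserving input order."""
    allowed = set(allowed_names)
    selected = [tool for tool in tools if tool.name in allowed]
    # Fail loudly on typos instead of silently handing the agent fewer tools.
    missing = allowed - {tool.name for tool in selected}
    if missing:
        raise ValueError(f"Unknown tool names: {sorted(missing)}")
    return selected


# Stand-ins for real tools; with the package installed you would pass
# MrScraperToolkit().get_tools() instead.
demo_tools = [
    SimpleNamespace(name="mrscraper_fetch_html"),
    SimpleNamespace(name="mrscraper_get_all_results"),
    SimpleNamespace(name="mrscraper_get_result_by_id"),
]
result_tools = select_tools(demo_tools, RESULT_TOOL_NAMES)
```

Raising on unknown names is a design choice for this sketch: a misspelled tool name surfaces immediately instead of leaving the agent with a silently reduced tool set.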

## API styles

You can create the tools and supply the API token in any of these ways:

- `MrScraperToolkit(...).get_tools()` (recommended)
- the `load_mrscraper_tools(...)` convenience function
- per-tool constructors with `token="..."`
- the `MRSCRAPER_API_TOKEN` environment variable, in place of an explicit `token` argument
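
Because the token can come from an argument or from the environment, it can help to check for it before building any tools. The following is a minimal sketch of such a check, not the package's own logic: `resolve_token` is a hypothetical helper, and the precedence it applies (explicit value first, then `MRSCRAPER_API_TOKEN`) is an assumption made for illustration.

```python
import os
from typing import Mapping, Optional

TOKEN_ENV_VAR = "MRSCRAPER_API_TOKEN"


def resolve_token(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return a non-empty token, preferring the explicit value over the environment.

    Hypothetical helper: the precedence here is an assumption, not confirmed
    behavior of langchain-mrscraper.
    """
    token = (explicit or env.get(TOKEN_ENV_VAR, "")).strip()
    if not token:
        raise RuntimeError(
            f"No MrScraper token found: pass token=... or set {TOKEN_ENV_VAR}."
        )
    return token
```

The resolved value could then be passed along as `MrScraperToolkit(token=resolve_token())`, so a missing token fails at startup and not in the middle of an agent run.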

## Tools vs. loaders

This integration is intentionally tools-first. MrScraper endpoints are action-oriented
(fetch, create, rerun, list, retrieve) and are best represented as `BaseTool` instances that
agents can call explicitly.

A document loader abstraction is usually better when the primary job is deterministic
"URL -> documents" ingestion into vector stores. MrScraper can support that in a
separate package later, but this package should remain focused on agent tools.

## Compliance & Legal Risk

> **WARNING**
>
> Scraping login-protected pages carries serious legal and compliance risks. Many websites explicitly prohibit automated access in their Terms of Service, and bypassing authentication to scrape content may expose you to legal action, including lawsuits, account termination, and financial penalties. By proceeding to scrape login-protected pages, you confirm that you have read and understood the target website's Terms of Service, and you accept full legal, financial, and ethical responsibility for your actions.