Metadata-Version: 2.4
Name: langchain-zenrows-adaptive
Version: 0.2.1
Summary: A LangChain integration tool that provides reliable web scraping capabilities at any scale using ZenRows' Universal Scraper API, extended with Adaptive Stealth Mode
Author: LohicM
License-Expression: MIT
Project-URL: Homepage, https://github.com/LohicM/langchain-zenrows
Project-URL: Repository, https://github.com/LohicM/langchain-zenrows
Project-URL: Documentation, https://github.com/LohicM/langchain-zenrows
Keywords: langchain,zenrows,web scraping,adaptive stealth,llm,ai,ml,data extraction
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: requests>=2.31.0
Requires-Dist: pydantic>=2.0
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Requires-Dist: pytest-mock>=3.10.0; extra == "test"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "test"
Requires-Dist: pytest-cov>=4.0; extra == "test"
Requires-Dist: langchain-tests>=0.3.0; extra == "test"
Provides-Extra: dev
Requires-Dist: ruff>=0.1.5; extra == "dev"
Dynamic: license-file

# langchain-zenrows

> **Note:** This is a fork of the [original `langchain-zenrows` package](https://github.com/ZenRows-Hub/langchain-zenrows), extended to add support for **Adaptive Stealth Mode**.

The langchain-zenrows integration tool enables LangChain agents to scrape and access web content at any scale using ZenRows' enterprise-grade infrastructure.

Whether you need to scrape JavaScript-heavy single-page applications, bypass anti-bot systems, access geo-restricted content, or extract structured data at scale, this integration provides the tools and reliability needed for modern AI applications.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [API Reference](#api-reference)
- [Features](#features)
- [License](#license)

## Installation

```console
pip install langchain-zenrows-adaptive
```

To use the ZenRows Universal Scraper with LangChain, you'll need a ZenRows API key. You can sign up for free at [ZenRows](https://app.zenrows.com/register?prod=universal_scraper).

> For more comprehensive examples and use cases, see the `examples/` folder.

### Basic Usage

```python
import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

# Initialize the tool
scraper = ZenRowsUniversalScraper()

# Scrape a simple webpage
result = scraper.invoke({"url": "https://httpbin.io/html"})
print(result)
```

### Advanced Usage with Parameters

```python
import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

scraper = ZenRowsUniversalScraper()

# Scrape with JavaScript rendering and premium proxies
result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/ecommerce/",
    "js_render": True,
    "premium_proxy": True,
    "proxy_country": "us",
    "response_type": "markdown",
    "wait": 2000  # Wait 2 seconds after page load
})

print(result)
```

See the [API Reference](#api-reference) section below for more available parameters and customizing scraping requests.

### Adaptive Stealth Mode

Let ZenRows automatically pick the cheapest configuration that succeeds for each request — escalating to JavaScript rendering or Premium Proxies only when needed. You are billed only for what actually works.

```python
import os
from langchain_zenrows import ZenRowsUniversalScraper

os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

scraper = ZenRowsUniversalScraper()

# ZenRows will automatically determine the best scraping strategy
result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/ecommerce/",
    "mode": "auto",
    "response_type": "markdown",
})

print(result)
```

> **Note:** When `mode` is set to `"auto"`, the `js_render` and `premium_proxy` parameters are managed by ZenRows automatically and should not be set manually. You can still combine `mode="auto"` with `proxy_country`, `js_instructions`, and `custom_headers`.

### Using with LangChain Agents

```python
from langchain_zenrows import ZenRowsUniversalScraper
from langchain_openai import ChatOpenAI  # or your preferred LLM
from langgraph.prebuilt import create_react_agent
import os

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"
os.environ["OPENAI_API_KEY"] = "<YOUR_OPEN_AI_API_KEY>"


# Initialize components
llm = ChatOpenAI(model="gpt-4o-mini")
zenrows_tool = ZenRowsUniversalScraper()

# Create agent
agent = create_react_agent(llm, [zenrows_tool])

# Use the agent
result = agent.invoke(
    {
        "messages": "Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time."
    }
)

print("Agent Response:")
for message in result["messages"]:
    print(f"{message.content}")
```

### CSS Extraction

Extract specific data using CSS selectors:

```python
import json
import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

scraper = ZenRowsUniversalScraper()

# Extract specific elements
css_selector = json.dumps({
    "title": "h1",
    "paragraphs": "p"
})

result = scraper.invoke({
    "url": "https://httpbin.io/html",
    "css_extractor": css_selector
})
```

### Premium Proxy with Geo-targeting

Access geo-restricted content:

```python
import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

scraper = ZenRowsUniversalScraper()

# Check your IP location
result = scraper.invoke({
    "url": "https://httpbin.io/ip",
    "premium_proxy": True,
    "proxy_country": "us"
})
print(result)  # Shows the US IP being used
```

## API Reference

### ZenRowsUniversalScraper

Main tool class for web scraping with ZenRows.

**Parameters:**

- `zenrows_api_key` (str, optional): Your ZenRows API key. If not provided, looks for `ZENROWS_API_KEY` environment variable.

**Input Schema:**

For complete parameter documentation and details, see the [official ZenRows API Reference](https://docs.zenrows.com/universal-scraper-api/api-reference#parameter-overview).


| Parameter              | Type | Description                                                                                                                                                                            |
| ---------------------- | ---- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `url`                  | str  | **Required.** The URL to scrape                                                                                                                                                        |
| `mode`                 | str  | Set to `"auto"` to enable **Adaptive Stealth Mode**. ZenRows automatically selects the optimal configuration, escalating only when needed. Billed only for the configuration that succeeds. Compatible with `proxy_country`, `js_instructions`, and `custom_headers` |
| `js_render`            | bool | Enable JavaScript rendering with a headless browser. Essential for modern web apps, SPAs, and sites with dynamic content (default: False)                                              |
| `js_instructions`      | str  | Execute custom JavaScript on the page to interact with elements, scroll, click buttons, or manipulate content                                                                          |
| `premium_proxy`        | bool | Use residential IPs to bypass anti-bot protection. Essential for accessing protected sites (default: False)                                                                            |
| `proxy_country`        | str  | Set the country of the IP used for the request. Use for accessing geo-restricted content. Two-letter country code                                                                      |
| `session_id`           | int  | Maintain the same IP for multiple requests for up to 10 minutes. Essential for multi-step processes                                                                                    |
| `custom_headers`       | dict | Include custom headers in your request to mimic browser behavior                                                                                                                       |
| `wait_for`             | str  | Wait for a specific CSS Selector to appear in the DOM before returning content                                                                                                         |
| `wait`                 | int  | Wait a fixed amount of milliseconds after page load                                                                                                                                    |
| `block_resources`      | str  | Block specific resources (images, fonts, etc.) from loading to speed up scraping                                                                                                       |
| `response_type`        | str  | Convert HTML to other formats. Options: "markdown", "plaintext", "pdf"                                                                                                                 |
| `css_extractor`        | str  | Extract specific elements using CSS selectors (JSON format)                                                                                                                            |
| `autoparse`            | bool | Automatically extract structured data from HTML (default: False)                                                                                                                       |
| `screenshot`           | str  | Capture an above-the-fold screenshot of the page (default: "false")                                                                                                                    |
| `screenshot_fullpage`  | str  | Capture a full-page screenshot (default: "false")                                                                                                                                      |
| `screenshot_selector`  | str  | Capture a screenshot of a specific element using CSS Selector                                                                                                                          |
| `screenshot_format`    | str  | Choose between "png" (default) and "jpeg" formats for screenshots                                                                                                                      |
| `screenshot_quality`   | int  | For JPEG format, set quality from 1 to 100. Lower values reduce file size but decrease quality                                                                                         |
| `original_status`      | bool | Return the original HTTP status code from the target page (default: False)                                                                                                             |
| `allowed_status_codes` | str  | Returns the content even if the target page fails with specified status codes. Useful for debugging or when you need content from error pages                                          |
| `json_response`        | bool | Capture network requests in JSON format, including XHR or Fetch data. Ideal for intercepting API calls made by the web page (default: False)                                           |
| `outputs`              | str  | Specify which data types to extract from the scraped HTML. Accepted values: emails, phone_numbers, headings, images, audios, videos, links, menus, hashtags, metadata, tables, favicon |


## Features

- **Adaptive Stealth Mode**: Automatically selects the optimal scraping configuration, billing only for what succeeds
- **JavaScript Rendering**: Scrape modern SPAs and dynamic content
- **Anti-Bot Bypass**: Bypass sophisticated bot detection systems
- **Geo-Targeting**: Access region-specific content with 190+ countries
- **Multiple Output Formats**: HTML, Markdown, Plaintext, PDF, Screenshots
- **CSS Extraction**: Target specific data with CSS selectors
- **Structured Data Extraction**: Automatically extract emails, phone numbers, links, and other data types
- **Session Management**: Maintain consistent sessions across requests
- **Wait Conditions**: Smart waiting for dynamic content
- **Premium Proxies**: 55M+ residential IPs for maximum success rates

## License

`langchain-zenrows` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.

## Support

- [ZenRows Documentation](https://docs.zenrows.com/)
- [LangChain Documentation](https://python.langchain.com/)

