Metadata-Version: 2.4
Name: scrapurrr
Version: 0.4.0
Summary: Agentic Web Scraper
Author-email: "Klyne Chrysler C. Dotarot" <klyne@inventivlabs.io>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: click>=8.0
Requires-Dist: cloakbrowser>=0.3
Requires-Dist: httpx>=0.27
Requires-Dist: litellm>=1.0
Requires-Dist: markdownify>=0.13
Requires-Dist: patchright>=1.50
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: coverage>=7.0; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-httpserver>=1.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: types-beautifulsoup4>=4.12; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <img src="public/COVER.png" alt="Scrapurrr" width="600" />
</p>

<p align="center">
  <strong>Agentic web scraping framework with a built-in chat CLI.</strong>
</p>

<p align="center">
  <img src="https://img.shields.io/badge/python-3.11%2B-blue" alt="Python" />
  <img src="https://img.shields.io/badge/license-MIT-green" alt="License" />
  <img src="https://img.shields.io/badge/version-0.4.0-orange" alt="Version" />
</p>

## What is Scrapurrr?

Scrapurrr is a Python framework for building agentic web scraping and automation apps, with a ready-to-use interactive CLI called `chatpurrr`.

**As a framework**, it lets you define a Pydantic schema, point it at a URL, and get back typed data. Browser rendering, anti-detection, pagination, and LLM-powered extraction are handled automatically.

**As a CLI tool**, you run `chatpurrr` and talk to it in natural language to navigate pages, inspect elements, and extract data, all from your terminal.

Built on PatchRight and CloakBrowser for undetected browsing. Supports 100+ LLM providers via LiteLLM.

## Install

```bash
pip install scrapurrr
```

## Get Started

**Use the chat CLI:**

```bash
chatpurrr
```

**Use the framework:**

```python
from scrapurrr import Scrapurrr
```
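
The schema half of that workflow can be sketched with Pydantic alone. The example below is a minimal illustration, not Scrapurrr's API: the `Product` model, its fields, and the `raw` dictionary are hypothetical stand-ins for a schema you might define and for the data an extraction step might return. See [Framework](docs/FRAMEWORK.md) for the actual library calls.

```python
from pydantic import BaseModel, ValidationError


# Hypothetical schema: the fields you want back from a product page.
class Product(BaseModel):
    name: str
    price: float
    in_stock: bool = True  # default used when the field is absent


# Stand-in for extracted data (assumption: extraction yields a dict-like record).
raw = {"name": "Cat tree", "price": "49.90"}

# Pydantic validates the record and coerces "49.90" to a float.
product = Product.model_validate(raw)
print(product)

# A record missing a required field is rejected instead of passing silently.
try:
    Product.model_validate({"name": "Cat tree"})
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} validation error(s)")
```

Typed models like this are what make the returned data safe to use downstream: a malformed record fails at validation time rather than later in your pipeline.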

## Documentation

- [Chatpurrr](docs/CHATPURRR.md) - Interactive CLI usage, slash commands, setup
- [Framework](docs/FRAMEWORK.md) - Library API, extraction, agent mode, element inspection, configuration

## Usage Policy

Scrapurrr is intended for legitimate use cases: data collection from public sources, authorized testing, research, and personal automation. Users are responsible for complying with the terms of service of any website they interact with. Respect `robots.txt`, rate limits, and applicable laws.
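
As one way to act on the `robots.txt` guidance, the sketch below uses Python's standard-library `urllib.robotparser` to check whether a URL may be fetched before scraping it. It is independent of Scrapurrr, and the sample rules, the `my-bot` user agent, and the `example.com` URLs are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Illustrative rules; a real site serves its own file at /robots.txt.
SAMPLE = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = build_parser(SAMPLE)

# Check each URL against the rules before requesting it.
allowed = parser.can_fetch("my-bot", "https://example.com/products")
blocked = parser.can_fetch("my-bot", "https://example.com/private/report")

# Seconds to wait between requests, or None if the site does not specify one.
delay = parser.crawl_delay("my-bot")

print(allowed, blocked, delay)
```

For a live site, `RobotFileParser.set_url()` followed by `read()` fetches and parses the file in one step.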

## License

MIT. See [LICENSE](LICENSE).
