Metadata-Version: 2.4
Name: crawlix
Version: 0.2.0
Summary: One API. Any backend. Full browser automation to lightweight scraping.
Project-URL: Homepage, https://github.com/keyreyla/crawlix
Project-URL: Documentation, https://keyreyla.github.io/crawlix
Project-URL: Source, https://github.com/keyreyla/crawlix
Project-URL: Issues, https://github.com/keyreyla/crawlix/issues
Project-URL: PyPI, https://pypi.org/project/crawlix
Author: keylordelrey
License: MIT
License-File: LICENSE
Keywords: browser-automation,playwright,selenium,web-scraping
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: requests>=2.28
Provides-Extra: async
Requires-Dist: httpx>=0.24; extra == 'async'
Provides-Extra: dev
Requires-Dist: hatchling>=1.21; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: responses>=0.25; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: full
Requires-Dist: httpx>=0.24; extra == 'full'
Requires-Dist: playwright>=1.40; extra == 'full'
Requires-Dist: selenium>=4.15; extra == 'full'
Provides-Extra: mobile
Requires-Dist: httpx>=0.24; extra == 'mobile'
Requires-Dist: selenium>=4.15; extra == 'mobile'
Provides-Extra: playwright
Requires-Dist: playwright>=1.40; extra == 'playwright'
Provides-Extra: selenium
Requires-Dist: selenium>=4.15; extra == 'selenium'
Provides-Extra: termux
Requires-Dist: httpx>=0.24; extra == 'termux'
Requires-Dist: selenium>=4.15; extra == 'termux'
Description-Content-Type: text/markdown

# crawlix

> One API. Any backend. Full browser automation to lightweight scraping.

[![PyPI](https://img.shields.io/pypi/v/crawlix)](https://pypi.org/project/crawlix/)
[![Python](https://img.shields.io/pypi/pyversions/crawlix)](https://pypi.org/project/crawlix/)
[![License](https://img.shields.io/pypi/l/crawlix)](LICENSE)
[![Docs](https://img.shields.io/badge/docs-github--pages-purple)](https://keyreyla.github.io/crawlix)

**crawlix** is a Python browser automation and web scraping library with a unified API across multiple backends. Write your code once, then switch between lightweight HTTP scraping and full browser automation by changing only the `backend=` argument.

```python
from crawlix import Browser

# Zero-setup scraping
with Browser() as b:
    page = b.open("https://example.com")
    print(page.find("h1").text)

# Full browser automation — same API
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.type("#email", "user@example.com")
    page.click("[type=submit]")
    page.wait_for(".dashboard")
    page.screenshot("result.png")
```

---

## Install

```bash
pip install crawlix                    # core: requests + BeautifulSoup
pip install "crawlix[playwright]"      # full browser via Playwright
pip install "crawlix[selenium]"        # full browser via Selenium
pip install "crawlix[async]"           # async support via httpx
pip install "crawlix[full]"            # everything above
pip install "crawlix[termux]"          # Termux/Android: Selenium + httpx, no Playwright
```

> [!TIP]
> After installing a browser backend, run `crawlix setup all` to automatically download browsers and drivers.

### CLI Tools

```bash
crawlix setup playwright   # Install Playwright + Chromium browser
crawlix setup selenium     # Install Selenium (driver auto-managed)
crawlix setup all          # Install everything
crawlix doctor             # Check system & diagnose issues
```
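
`crawlix doctor` is the tool for diagnosing an install. For a quick check from Python of which relevant packages are present, a stdlib-only sketch is enough. This is not how `crawlix doctor` works internally. The package list is an assumption drawn from the dependencies and extras in the package metadata.

```python
from importlib import metadata

# Assumed list: the two core dependencies plus the optional backend packages.
PACKAGES = ("requests", "beautifulsoup4", "httpx", "playwright", "selenium")


def report_versions(packages=PACKAGES):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name:15} {version or 'not installed'}")
```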

---

## Why crawlix?

| Problem | Solution |
|---------|----------|
| Rewriting code when switching from HTTP to browser scraping | **Same API**: change `backend=`, not your code |
| Heavy dependencies for small tasks | **Lightweight core**: only requests + bs4 are required; browser backends are optional extras |
| Bot detection blocking your scrapers | **Stealth by default**: realistic headers, UA rotation |
| Remembering which backend does what | **Auto-detect**: picks the best available backend |
| Confusing error messages | **Helpful errors**: `BackendError` tells you exactly what to install |

```python
from crawlix import Browser

# Auto-detect picks the best backend installed on your system
# Priority: playwright > selenium > requests+bs4
with Browser() as b:
    print(b.backend_name)  # e.g. "playwright" if installed, else "selenium" or "requests"
```
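
The selection rule is a first-match scan over a priority list. The sketch below is a minimal version of that idea. It assumes detection only checks whether each backend's modules are importable; crawlix's real logic may differ. The `pick_backend` name is hypothetical.

```python
from importlib.util import find_spec

# Priority order from the comment above; the module names checked per backend
# are an assumption for this sketch.
PRIORITY = (
    ("playwright", ("playwright",)),
    ("selenium", ("selenium",)),
    ("requests", ("requests", "bs4")),
)


def pick_backend(is_installed=lambda mod: find_spec(mod) is not None):
    """Return the first backend whose required modules are all importable."""
    for backend, modules in PRIORITY:
        if all(is_installed(m) for m in modules):
            return backend
    raise RuntimeError("no backend available; install crawlix's core dependencies")
```

Passing `is_installed` as a parameter keeps the priority logic testable without touching the real environment.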

---

## Quick Start

### Scrape a page

```python
from crawlix import Browser

with Browser() as b:
    page = b.open("https://news.ycombinator.com")
    for item in page.find_all(".titleline > a"):
        print(item.text, item.attr("href"))
```

### Extract data from APIs

```python
from crawlix import get, fetch

data = get("https://api.github.com/users/keyreyla").json()  # parsed JSON body
html = fetch("https://example.com")                          # page HTML
```

### Automate a login flow

```python
from crawlix import Browser

with Browser(backend="playwright") as b:
    page = b.open("https://github.com/login")
    page.type("#login_field", "username")
    page.type("#password", "password")
    page.click("[type=submit]")
    page.wait_for(".dashboard-sidebar")
    print("Logged in:", page.url)
```

### Async usage

```python
import asyncio
from crawlix.async_api import AsyncBrowser, aget

async def main():
    async with AsyncBrowser() as b:
        page = await b.open("https://example.com")
        print(page.title)

    page = await aget("https://api.github.com/users/keyreyla")
    print(page.url)

asyncio.run(main())
```
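
To fetch many URLs concurrently without opening an unbounded number of connections, a common asyncio pattern is `asyncio.gather` combined with an `asyncio.Semaphore`. The `gather_limited` helper below is a hypothetical sketch and is not part of crawlix. It accepts any coroutine function, so `aget` should work as `fetch`, assuming it takes a single URL argument as in the example above.

```python
import asyncio


async def gather_limited(fetch, urls, limit=5):
    """Run fetch(url) for every URL, at most `limit` at a time; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def one(url):
        async with semaphore:  # waits while `limit` fetches are already in flight
            return await fetch(url)

    return await asyncio.gather(*(one(u) for u in urls))
```

Inside `main()`, usage would be `pages = await gather_limited(aget, urls, limit=5)`.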

---

## API at a Glance

### Browser

```python
b = Browser(
    backend="auto",   # "playwright", "selenium", "requests", "httpx"
    headless=True,
    stealth=True,
    timeout=30,
    proxy=None,       # "http://user:pass@host:port"
    locale="en-US",
    user_agent=None,
)

b.open(url)          # -> Page
b.new_page()         # -> Page
b.close()
b.backend_name       # -> str
b.supports_js        # -> bool
```

### Page

Interaction methods (`click`, `type`, `wait_for`) return the page itself, so calls can be chained:

```python
page.find(selector)           # -> Element | None
page.find_all(selector)       # -> list[Element]
page.click(selector)          # -> Page
page.type(selector, text)     # -> Page
page.wait_for(selector)       # -> Page
page.screenshot(path=None)    # -> bytes
page.html                     # -> str
page.text                     # -> str
page.json()                   # -> dict
page.links()                  # -> list[str]
page.tables()                 # -> list[list[list[str]]]
page.evaluate("document.title")  # -> any
```
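
Chaining works because each method returns the object it was called on. The next call in the chain therefore runs on the same page. `RecordingPage` below is a hypothetical stand-in that records actions instead of driving a browser. It shows the pattern only and is not crawlix's `Page`.

```python
class RecordingPage:
    """Hypothetical stand-in page that records actions instead of performing them."""

    def __init__(self):
        self.actions = []

    def type(self, selector, text):
        self.actions.append(("type", selector, text))
        return self  # returning the same object is what enables chaining

    def click(self, selector):
        self.actions.append(("click", selector))
        return self

    def wait_for(self, selector):
        self.actions.append(("wait_for", selector))
        return self


page = RecordingPage()
page.type("#email", "user@example.com").click("[type=submit]").wait_for(".dashboard")
print(page.actions)
```

Given the `-> Page` return types listed above, the crawlix equivalent would be `b.open(url).type("#email", "user@example.com").click("[type=submit]")`.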

### Element

```python
el.text               # -> str
el.attr(name)         # -> str
el.attrs              # -> dict
el.find(selector)     # -> Element | None
el.click()            # -> Element
el.is_visible()       # -> bool
el.bounding_box()     # -> dict
el = page.find(selector)
if el:                # find() returns None on no match; an Element is always truthy
    ...
```
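
The "always truthy" rule matters because Python treats an object that defines `__len__` as falsy when its length is zero. A wrapper that reports its child count as its length would make an empty `<br>` or `<input>` fail `if el:`. Defining `__bool__` avoids this, so `if el:` only distinguishes a match from `None`. The classes below are hypothetical and only illustrate the Python rule; they are not crawlix's `Element`.

```python
class CountingElement:
    """Hypothetical element wrapper whose len() is its number of children."""

    def __init__(self, children=()):
        self.children = list(children)

    def __len__(self):
        return len(self.children)


class AlwaysTruthyElement(CountingElement):
    def __bool__(self):
        return True  # a present element is truthy even with no children


print(bool(CountingElement()))      # False: __len__() == 0 makes it falsy
print(bool(AlwaysTruthyElement()))  # True: __bool__ takes precedence over __len__
```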

---

## Backend Feature Matrix

| Feature | requests | playwright | selenium | httpx |
|---------|:--------:|:----------:|:--------:|:-----:|
| JS execution | | ✅ | ✅ | |
| Click/type/hover | | ✅ | ✅ | |
| Screenshot/PDF | | ✅ | ✅ | |
| Network intercept | | ✅ | | |
| Async | | ✅ | | ✅ |
| Wait/retry | | ✅ | ✅ | |
| File upload | | ✅ | ✅ | |
| Proxy | ✅ | ✅ | ✅ | ✅ |
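
To choose a backend programmatically, you can encode the matrix as sets and query it with a subset test. The short feature labels below are invented for this sketch and are not crawlix identifiers.

```python
# The table above, one set of supported features per backend.
# Feature labels are hypothetical shorthand for the table rows.
FEATURES = {
    "requests": {"proxy"},
    "playwright": {"js", "interact", "screenshot", "intercept", "async",
                   "wait", "upload", "proxy"},
    "selenium": {"js", "interact", "screenshot", "wait", "upload", "proxy"},
    "httpx": {"async", "proxy"},
}


def backends_supporting(*features):
    """Return, sorted by name, the backends that support every requested feature."""
    wanted = set(features)
    return sorted(name for name, caps in FEATURES.items() if wanted <= caps)


print(backends_supporting("js", "async"))  # ['playwright']
```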

---

## Examples

<details>
<summary><strong>Proxy</strong></summary>

```python
with Browser(proxy="http://user:pass@proxy:8080") as b:
    page = b.open("https://ipinfo.io/json")
    print(page.json()["ip"])
```
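
The `proxy` value is an ordinary URL, so `urllib.parse` can split out its credentials and address. The sketch below also shows how such a string maps onto the `proxies` dict that `requests` accepts. Whether crawlix forwards it this way internally is an assumption, and `proxy_settings` is a hypothetical helper. Characters such as `@` or `:` inside the password must be percent-encoded for the URL to parse correctly.

```python
from urllib.parse import urlsplit


def proxy_settings(proxy_url):
    """Hypothetical helper: split a proxy URL into a requests-style mapping plus its parts."""
    parts = urlsplit(proxy_url)
    return {
        # requests maps the target URL's scheme to the proxy to use for it
        "proxies": {"http": proxy_url, "https": proxy_url},
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username,
        "password": parts.password,
    }
```
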
</details>

<details>
<summary><strong>Table extraction</strong></summary>

```python
with Browser() as b:
    page = b.open("https://en.wikipedia.org/wiki/Python_(programming_language)")
    for row in page.tables()[0]:
        print(row)
```
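
The return type `list[list[list[str]]]` reads as tables, then rows, then cell strings. The stdlib-only sketch below builds the same shape from raw HTML and collapses whitespace inside cells. It is not crawlix's implementation. It ignores `colspan`/`rowspan` and does not handle nested tables, but it tolerates omitted `</td>` and `</tr>` tags.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list[list[list[str]]]
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()  # tolerate a missing </td>
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()


def tables(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```
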
</details>

<details>
<summary><strong>File upload</strong></summary>

```python
with Browser(backend="playwright") as b:
    page = b.open("https://example.com/upload")
    page.upload("#file-input", "/path/to/file.pdf")
    page.click("#submit")
    page.wait_for(".success")
```
</details>

<details>
<summary><strong>Network intercept</strong></summary>

```python
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.intercept("**/api/**", lambda req: print(req.url))
```
</details>

---

## Exceptions

```python
from crawlix.exceptions import (
    CrawlixError,       # base — catch-all
    BackendError,       # backend unavailable or op not supported
    TimeoutError,       # wait exceeded timeout; shadows the built-in TimeoutError
    NavigationError,    # page failed to load
    SelectorError,      # invalid selector or element not found
    NetworkError,       # connection error
    JavaScriptError,    # JS evaluation failed
)
```

> [!NOTE]
> `BackendError` always includes an install hint. For example, calling `screenshot()` on the `requests` backend raises: `BackendError: screenshot() requires a browser backend. Install: pip install crawlix[playwright]`
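
Because the exceptions are split by cause, a caller can retry only the transient failures and let the rest fail immediately. The generic retry helper below uses exponential backoff. It is hypothetical and is not shipped with crawlix.

```python
import time


def with_retries(action, retry_on, attempts=3, delay=0.5, backoff=2.0):
    """Call action(); if it raises one of `retry_on`, wait and try again.

    The last failure is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= backoff  # wait longer before each subsequent attempt
```

With crawlix, a call might look like `with_retries(lambda: b.open(url), (NetworkError, TimeoutError))`, using names imported from `crawlix.exceptions`. A `BackendError` is not worth retrying, since a missing backend will still be missing on the next attempt.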

---

## Example Scripts

Ready-to-run scripts in [`examples/`](examples/):

| Example | Platform | Run command |
|---------|----------|-------------|
| [`scrape_news.py`](examples/scrape_news.py) | PC + Android | `python examples/scrape_news.py` |
| [`android_scraper.py`](examples/android_scraper.py) | Android (Termux) | `pip install crawlix[termux] && python examples/android_scraper.py` |
| [`browser_login.py`](examples/browser_login.py) | PC | `pip install crawlix[playwright] && playwright install chromium && python examples/browser_login.py` |
| [`async_scraper.py`](examples/async_scraper.py) | PC + Android | `pip install crawlix[async] && python examples/async_scraper.py` |

---

## Development

```bash
git clone https://github.com/keyreyla/crawlix.git
cd crawlix
python -m venv .venv && source .venv/bin/activate
pip install -e ".[full,dev]"
pytest
```
