Metadata-Version: 2.4
Name: crawlix
Version: 0.3.0
Summary: One API. Any backend. Full browser automation to lightweight scraping.
Project-URL: Homepage, https://github.com/keyreyla/crawlix
Project-URL: Documentation, https://keyreyla.github.io/crawlix
Project-URL: Source, https://github.com/keyreyla/crawlix
Project-URL: Issues, https://github.com/keyreyla/crawlix/issues
Project-URL: PyPI, https://pypi.org/project/crawlix
Author: keylordelrey
License: MIT
License-File: LICENSE
Keywords: browser-automation,playwright,selenium,web-scraping
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: requests>=2.28
Provides-Extra: async
Requires-Dist: httpx>=0.24; extra == 'async'
Provides-Extra: dev
Requires-Dist: hatchling>=1.21; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: responses>=0.25; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0; extra == 'dev'
Provides-Extra: full
Requires-Dist: httpx>=0.24; extra == 'full'
Requires-Dist: playwright>=1.40; extra == 'full'
Requires-Dist: selenium>=4.15; extra == 'full'
Provides-Extra: mobile
Requires-Dist: httpx>=0.24; extra == 'mobile'
Requires-Dist: selenium>=4.15; extra == 'mobile'
Provides-Extra: playwright
Requires-Dist: playwright>=1.40; extra == 'playwright'
Provides-Extra: selenium
Requires-Dist: selenium>=4.15; extra == 'selenium'
Provides-Extra: termux
Requires-Dist: httpx>=0.24; extra == 'termux'
Requires-Dist: selenium>=4.15; extra == 'termux'
Description-Content-Type: text/markdown

[![PyPI](https://img.shields.io/pypi/v/crawlix?style=flat-square&logo=python&logoColor=white)](https://pypi.org/project/crawlix/)
[![Python](https://img.shields.io/pypi/pyversions/crawlix?style=flat-square)](https://pypi.org/project/crawlix/)
[![Build](https://img.shields.io/github/actions/workflow/status/keyreyla/crawlix/test.yml?style=flat-square)](https://github.com/keyreyla/crawlix/actions)
[![Docs](https://img.shields.io/badge/docs-github--pages-8B5CF6?style=flat-square)](https://keyreyla.github.io/crawlix)
[![License](https://img.shields.io/pypi/l/crawlix?style=flat-square)](LICENSE)

# crawlix

> One API. Any backend. Full browser automation to lightweight scraping.

crawlix is a Python browser automation and web scraping library with a unified API across multiple backends. Write your code once, then switch between lightweight HTTP scraping and full browser automation by changing only the `backend=` argument.

```python
from crawlix import Browser

# Zero-setup scraping — auto-detects best backend
with Browser() as b:
    page = b.open("https://example.com")
    print(page.find("h1").text)

# Full browser automation — same exact API
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.type("#email", "user@example.com")
    page.click("[type=submit]")
    page.wait_for(".dashboard")
    page.screenshot("result.png")
```

---

## Install

```bash
pip install crawlix                    # core (requests + BeautifulSoup)
pip install crawlix[playwright]        # full browser via Playwright
pip install crawlix[selenium]          # full browser via Selenium
pip install crawlix[async]             # async support via httpx
pip install crawlix[full]              # everything above
pip install crawlix[termux]            # Termux/Android (no Playwright)
```

> [!TIP]
> After installing a browser backend, run `crawlix setup all` to automatically download browsers and drivers.

```bash
crawlix setup playwright   # install Playwright + Chromium
crawlix setup selenium     # install Selenium (drivers auto-managed)
crawlix setup all          # install everything
crawlix doctor             # check system and diagnose issues
```

---

## Features

| Feature | Details |
|---|---|
| **Unified API** | Same code for HTTP scraping and browser automation. Change `backend=`, not your code. |
| **Auto-detect** | Picks the best available backend: playwright → selenium → httpx → requests. No config needed. |
| **Minimal core deps** | Core depends only on `requests` + `beautifulsoup4`. Backends are optional extras. |
| **Stealth by default** | Realistic headers, user-agent rotation, no bot fingerprinting out of the box. |
| **Context manager** | Resources are cleaned up automatically. Works with both `with` and `async with`. |
| **Helpful errors** | `BackendError` tells you exactly what to install. No silent failures, no traceback soup. |
| **CLI tools** | `crawlix setup` installs backends. `crawlix doctor` diagnoses your environment. |
| **Termux ready** | Works on Android via Termux. Use `pip install crawlix[termux]`. |

---

## Quick start

```python
from crawlix import Browser

# Detect backends lazily
with Browser() as b:
    page = b.open("https://news.ycombinator.com")
    for item in page.find_all(".titleline > a"):
        print(item.text, item.attr("href"))
```

```python
from crawlix import get, fetch

data = get("https://api.github.com/users/keyreyla").json()
html = fetch("https://example.com")
```

```python
import asyncio
from crawlix.async_api import AsyncBrowser

async def main():
    async with AsyncBrowser() as b:
        page = await b.open("https://example.com")
        print(page.title)

asyncio.run(main())
```

> [!NOTE]
> See more examples in the [`examples/`](examples/) directory, including browser login flows, table extraction, file uploads, and Android scraping scripts.

---

## API at a glance

### Browser

```python
Browser(
    backend="auto",   # "playwright" | "selenium" | "requests" | "httpx"
    headless=True,
    stealth=True,
    timeout=30,
    proxy=None,       # "http://user:pass@host:port"
    locale="en-US",
)

b.open(url)          # -> Page
b.new_page()         # -> Page
b.close()
b.backend_name       # -> str
b.supports_js        # -> bool
```
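
The `backend="auto"` default is described above as trying playwright → selenium → httpx → requests. As a rough illustration of that kind of fallback (not crawlix's actual implementation), the sketch below probes for importable modules with the standard library's `importlib.util.find_spec`. The names `BACKEND_MODULES`, `detect_backend`, and the `is_installed` hook are hypothetical and exist only for this example.

```python
from collections.abc import Callable
from importlib.util import find_spec

# Preference order taken from the Features table. Mapping each backend name
# to an importable module of the same name is an assumption of this sketch.
BACKEND_MODULES = [
    ("playwright", "playwright"),
    ("selenium", "selenium"),
    ("httpx", "httpx"),
    ("requests", "requests"),
]


def _module_available(module: str) -> bool:
    """Report whether a top-level module can be imported, without importing it."""
    return find_spec(module) is not None


def detect_backend(is_installed: Callable[[str], bool] = _module_available) -> str:
    """Return the first backend, in preference order, whose module is available.

    `is_installed` is a hypothetical hook so the logic can be exercised
    without installing any of the optional extras.
    """
    for backend, module in BACKEND_MODULES:
        if is_installed(module):
            return backend
    raise RuntimeError("no supported backend is installed")


# Simulate an environment where only the HTTP clients are present.
print(detect_backend(lambda module: module in {"httpx", "requests"}))
```

In the simulated environment the function skips both browser backends and settles on `httpx`. To see which backend crawlix itself picked, check `b.backend_name`.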

### Page

Interaction methods (`click`, `type`, `wait_for`) return the page itself, so calls can be chained:

```python
page.find(selector)           # -> Element | None
page.find_all(selector)       # -> list[Element]
page.click(selector)          # -> Page (chainable)
page.type(selector, text)     # -> Page (chainable)
page.wait_for(selector)       # -> Page (chainable)
page.screenshot(path=None)    # -> bytes
page.html                     # -> str
page.text                     # -> str
page.json()                   # -> dict
page.links()                  # -> list[str]
page.tables()                 # -> list[list[list[str]]]
page.evaluate("document.title")  # -> any (browser backends)
```
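
`page.tables()` returns nested lists: one list per table, one list per row, one string per cell. A small helper can turn one of those tables into records. This is a sketch that assumes the first row of the table holds the column headers, which the API listing above does not guarantee. `table_to_records` and the sample data are made up for illustration.

```python
def table_to_records(table: list[list[str]]) -> list[dict[str, str]]:
    """Treat the first row as the header and pair every later row with it."""
    if not table:
        return []
    header, *rows = table
    # zip() stops at the shorter sequence, so ragged rows lose trailing cells.
    return [dict(zip(header, row)) for row in rows]


# Stand-in for a single entry of `page.tables()`; the values are invented.
sample = [
    ["name", "stars"],
    ["alpha", "10"],
    ["beta", "7"],
]
records = table_to_records(sample)
print(records)
```

With a real page this would be called as `table_to_records(page.tables()[0])`, after checking that the page contains at least one table.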

### Element

```python
el.text               # -> str
el.attr(name)         # -> str
el.attrs              # -> dict
el.find(selector)     # -> Element | None
el.click()            # -> Element (chainable)
el.is_visible()       # -> bool
el.bounding_box()     # -> dict
if el:                # an Element is always truthy; find() returns None when nothing matches
    ...
```

### Exceptions

```python
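# Note: importing TimeoutError this way shadows Python's built-in TimeoutError in the importing module.
from crawlix.exceptions import CrawlixError, BackendError, TimeoutError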
```

---

## Backend feature matrix

| Feature | requests | httpx | selenium | playwright |
|---|---|---|---|---|
| JS execution | | | yes | yes |
| Click / type / hover | | | yes | yes |
| Screenshot / PDF | | | yes | yes |
| Network intercept | | | | yes |
| Async support | | yes | | yes |
| Wait / retry | | | yes | yes |
| File upload | | | yes | yes |
| Proxy support | yes | yes | yes | yes |

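The matrix above can also be expressed as data, which makes it easy to ask which backends cover a given set of needs. This is a minimal sketch rather than part of crawlix: `SUPPORT`, `LIGHTEST_FIRST`, and `backends_for` are hypothetical names, and ordering the result from lightest to heaviest is an assumption. Asking for async support plus JS execution, for example, leaves only `playwright`.

```python
# Each feature maps to the set of backends marked "yes" in the matrix.
SUPPORT = {
    "JS execution": {"selenium", "playwright"},
    "Click / type / hover": {"selenium", "playwright"},
    "Screenshot / PDF": {"selenium", "playwright"},
    "Network intercept": {"playwright"},
    "Async support": {"httpx", "playwright"},
    "Wait / retry": {"selenium", "playwright"},
    "File upload": {"selenium", "playwright"},
    "Proxy support": {"requests", "httpx", "selenium", "playwright"},
}

# Assumed ordering for this sketch: plain HTTP clients before full browsers.
LIGHTEST_FIRST = ["requests", "httpx", "selenium", "playwright"]


def backends_for(*features: str) -> list[str]:
    """Return the backends that support every requested feature, lightest first."""
    candidates = set(LIGHTEST_FIRST)
    for feature in features:
        candidates &= SUPPORT[feature]  # raises KeyError for unknown features
    return [name for name in LIGHTEST_FIRST if name in candidates]


print(backends_for("Async support", "JS execution"))
print(backends_for("Proxy support"))
```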
---

## Get help

- **Docs**: [keyreyla.github.io/crawlix](https://keyreyla.github.io/crawlix)
- **Issues**: [github.com/keyreyla/crawlix/issues](https://github.com/keyreyla/crawlix/issues)
- **PyPI**: [pypi.org/project/crawlix](https://pypi.org/project/crawlix)
