Metadata-Version: 2.4
Name: crawlix
Version: 0.1.0
Summary: One API. Any backend. Full browser automation to lightweight scraping.
Project-URL: Homepage, https://github.com/keylordelrey/crawlix
Project-URL: Source, https://github.com/keylordelrey/crawlix
Project-URL: Issues, https://github.com/keylordelrey/crawlix/issues
Author: keylordelrey
License: MIT
License-File: LICENSE
Keywords: browser-automation,playwright,selenium,web-scraping
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: requests>=2.28
Provides-Extra: async
Requires-Dist: httpx>=0.24; extra == 'async'
Provides-Extra: full
Requires-Dist: httpx>=0.24; extra == 'full'
Requires-Dist: playwright>=1.40; extra == 'full'
Requires-Dist: selenium>=4.15; extra == 'full'
Provides-Extra: playwright
Requires-Dist: playwright>=1.40; extra == 'playwright'
Provides-Extra: selenium
Requires-Dist: selenium>=4.15; extra == 'selenium'
Description-Content-Type: text/markdown

# crawlix

> One API. Any backend. Full browser automation to lightweight scraping.

- **PyPI**: `pip install crawlix`
- **Author**: keylordelrey
- **License**: MIT
- **Python**: 3.10+

---

## What crawlix is

crawlix is a Python browser automation and web scraping library with a unified API across multiple backends. The same code works whether you are doing simple HTTP scraping or full Playwright-powered browser automation — you switch backends, not code.

```python
from crawlix import Browser

# Default: the backend is auto-detected; static scraping needs no browser
with Browser() as b:
    page = b.open("https://example.com")
    print(page.find("h1").text)

# Same API, now driving a real browser through Playwright
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.click("#login")
    page.type("#email", "user@example.com")
    page.submit("form")
    page.wait_for(".dashboard")  # block until the selector appears
    page.screenshot("result.png")
```

---

## Install

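```bash
# Quote the extras so shells such as zsh do not expand the brackets
pip install crawlix                 # core: requests + beautifulsoup4
pip install "crawlix[playwright]"   # adds playwright
pip install "crawlix[selenium]"     # adds selenium
pip install "crawlix[async]"        # adds httpx
pip install "crawlix[full]"         # playwright + selenium + httpx
```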

---

## Core Design Rules

1. **Same API across all backends**: switching backends never requires rewriting user code
2. **Auto-detect the best available backend**: no configuration needed, crawlix picks the highest-priority backend that is installed
3. **Minimal hard dependencies**: the core needs only `requests` and `beautifulsoup4`; every browser backend is an optional extra
4. **Fail with helpful errors**: `BackendError` tells you exactly what to install
5. **Context managers everywhere**: resources are always released on exit
6. **Stealth on by default**: realistic headers, UA rotation, no bot fingerprint
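
Rule 4 can be pictured as an error whose message carries the install command. The sketch below is an assumption about how such a message might be built, not crawlix's actual code: the `BackendError` class is a local stand-in for the real one, and `INSTALL_HINTS` and `missing_backend_error` are hypothetical names. Only the extra names are taken from the Install section.

```python
# Hypothetical mapping from backend name to the pip command that provides it.
INSTALL_HINTS = {
    "playwright": 'pip install "crawlix[playwright]"',
    "selenium": 'pip install "crawlix[selenium]"',
}


class BackendError(Exception):
    """Local stand-in for crawlix.exceptions.BackendError (real signature unknown)."""


def missing_backend_error(backend: str) -> BackendError:
    """Build an error that tells the user how to install the missing backend."""
    hint = INSTALL_HINTS.get(backend)
    if hint is None:
        return BackendError(f"Unknown backend {backend!r}")
    return BackendError(f"Backend {backend!r} is not installed. Try: {hint}")
```

A caller would `raise missing_backend_error("selenium")` at the point where the backend import fails, so the fix appears directly in the traceback.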

---

## Backend Priority

```
playwright > selenium > requests+bs4 (core)
```

Override anytime:
```python
Browser(backend="playwright")
Browser(backend="requests")
Browser(backend="selenium")
```
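
The priority order above could be resolved at runtime by probing for each backend's package. The following is a minimal sketch under that assumption, not the library's implementation: `BACKEND_MODULES`, `is_importable`, and `detect_backend` are hypothetical names, and the check uses only the standard library's `importlib.util.find_spec`.

```python
from importlib.util import find_spec

# Highest priority first, mirroring: playwright > selenium > requests+bs4.
# Each entry is (backend name, top-level module to probe for).
BACKEND_MODULES = [
    ("playwright", "playwright"),
    ("selenium", "selenium"),
    ("requests", "requests"),
]


def is_importable(module: str) -> bool:
    """Return True if the top-level module is installed in this environment."""
    return find_spec(module) is not None


def detect_backend(available=is_importable) -> str:
    """Return the name of the first backend whose module is available."""
    for name, module in BACKEND_MODULES:
        if available(module):
            return name
    raise RuntimeError("no supported backend is installed")
```

Passing the availability check as a parameter keeps the priority logic testable without installing or uninstalling anything; calling `detect_backend()` with no arguments probes the real environment.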

---

## Quick Examples

```python
from crawlix import Browser, get, fetch

with Browser() as b:
    page = b.open("https://news.ycombinator.com")
    for item in page.find_all(".titleline > a"):
        print(item.text, item.attr("href"))

data = get("https://api.github.com/users/keyreyla").json()
html = fetch("https://example.com")
```

For async:
```python
import asyncio
from crawlix.async_api import AsyncBrowser

async def main():
    async with AsyncBrowser() as b:
        page = await b.open("https://example.com")
        print(page.html)

asyncio.run(main())
```

---

## API Overview

### Browser
```python
Browser(backend="auto", headless=True, stealth=True, timeout=30, proxy=None, locale="en-US", user_agent=None)
b.open(url) -> Page
b.new_page() -> Page
b.close()
b.backend_name -> str
b.supports_js -> bool
```

### Page (action methods such as `click` and `type` return the page for chaining)
```python
page.find(selector) -> Element | None
page.find_all(selector) -> list[Element]
page.click(selector) -> Page
page.type(selector, text) -> Page
page.screenshot(path=None) -> bytes
page.html -> str
page.text -> str
page.json() -> dict
page.links() -> list[str]
```
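
The listing shows `click` and `type` returning `Page`, which is what makes chained calls possible. As an illustration of the pattern only, here is a stand-in class that records actions instead of driving a browser; `RecordingPage` is hypothetical and not part of crawlix.

```python
class RecordingPage:
    """Stand-in page: records each action and returns itself for chaining."""

    def __init__(self):
        self.actions = []

    def click(self, selector):
        self.actions.append(("click", selector))
        return self  # returning self is what allows .click(...).type(...)

    def type(self, selector, text):
        self.actions.append(("type", selector, text))
        return self


# Each call returns the same object, so the calls can be strung together.
page = RecordingPage().click("#login").type("#email", "user@example.com")
print(page.actions)
```

Methods that produce a value, such as `find` or `screenshot` in the listing, cannot take part in a chain because they return that value instead of the page.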

### Element
```python
el.text -> str
el.attr(name) -> str
el.attrs -> dict
el.find(selector) -> Element | None
el.click() -> Element
bool(el)  # always True
```

---

## Exceptions

```python
from crawlix.exceptions import CrawlixError, BackendError, TimeoutError, NavigationError, SelectorError, NetworkError, JavaScriptError
```
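
Note that importing `TimeoutError` by name, as above, shadows Python's built-in `TimeoutError` in the importing module. Timeouts and network failures are also the usual candidates for a retry. The helper below is a generic sketch, not a crawlix feature: `retry` and its parameters are hypothetical, and the exception classes to retry on are passed in explicitly, so it makes no assumption about the library's exception hierarchy.

```python
import time


def retry(fn, retry_on, attempts=3, delay=1.0):
    """Call fn(); if it raises one of retry_on, wait and try again.

    The wait grows linearly (delay, 2*delay, ...). The last failure is
    re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)
```

With crawlix this might be called as `retry(lambda: fetch(url), retry_on=(NetworkError,))`, assuming `fetch` raises that exception on transient failures, which the README does not state.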

---

## Development

```bash
git clone https://github.com/keylordelrey/crawlix.git
cd crawlix
pip install -e ".[full]"
pytest
```
