Metadata-Version: 2.4
Name: dawg-baas
Version: 0.2.1
Summary: Python SDK for BaaS (Browser as a Service) - managed headless browsers and HTTP scraping via CDP
Project-URL: Homepage, https://github.com/dawgswarm/dawg_baas
Project-URL: Repository, https://github.com/dawgswarm/dawg_baas
Project-URL: Issues, https://github.com/dawgswarm/dawg_baas/issues
Author: DAWG Team
License-Expression: MIT
License-File: LICENSE
Keywords: automation,baas,browser,cdp,headless,playwright,scraping
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: httpx>=0.25.0
Requires-Dist: requests>=2.28.0
Provides-Extra: dev
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# dawg-baas

Python SDK for BaaS (Browser as a Service).

Two tools in one SDK:
- **Baas** — cloud browser via CDP WebSocket (Playwright, Puppeteer, Selenium)
- **Scraper** — fast HTTP scraping with content extraction (no browser needed)

## Installation

```bash
pip install dawg-baas
```

## Scraper — HTTP scraping

Extract clean content from web pages without a browser. Fast, cheap, TLS-fingerprinted.

```python
from dawg_baas import Scraper

with Scraper(api_key="your_key") as s:
    # Single page → markdown
    result = s.scrape("https://example.com")
    print(result.content)

    # Crawl a site
    job = s.crawl("https://example.com", max_depth=2, max_pages=20)
    job.wait()
    for page in job.pages:
        print(page.url, len(page.content))

    # Batch scrape
    job = s.batch(["https://a.com", "https://b.com"])
    job.wait()
```

### Scraper Methods

- `scrape(url, format="markdown", main_content=False, include_links=False)` → `ScrapeResult`
- `crawl(url, max_depth=2, max_pages=50, concurrency=3)` → `ScrapeJob`
- `batch(urls, concurrency=5)` → `ScrapeJob`
- `get_job(job_id)` → `ScrapeJob`
- `cancel_job(job_id)`

Formats: `"markdown"`, `"text"`, `"html"`

`crawl` and `batch` return a `ScrapeJob` that runs in the background: use `job.wait()` to block until it finishes, or `job.refresh()` to poll manually.
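
For manual polling, a loop around `job.refresh()` might look like the sketch below. It assumes the job exposes a `status` attribute and that `"completed"`, `"failed"`, and `"cancelled"` are its terminal values. Neither is confirmed by this README, so inspect the `ScrapeJob` object in your installed version first. `poll_job` is a hypothetical helper, not part of the SDK.

```python
import time

# Assumed terminal states; the real ScrapeJob may use different names.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_job(job, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call job.refresh() until job.status is terminal or the timeout elapses.

    `job` is any object with a refresh() method and a `status` attribute
    (the attribute name is an assumption, not documented SDK behavior).
    """
    deadline = time.monotonic() + timeout
    while True:
        job.refresh()
        if job.status in TERMINAL_STATES:
            return job.status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.status!r} after {timeout}s")
        sleep(interval)
```

Polling by hand is mostly useful when you want to report progress or enforce your own timeout; otherwise `job.wait()` is simpler.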

## Browser — CDP access

Get a cloud browser as a CDP WebSocket URL. Use it with any automation framework that can connect over CDP, such as Playwright, Puppeteer, or Selenium.

```python
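from playwright.sync_api import sync_playwright
from dawg_baas import Baas

with Baas(api_key="your_key") as ws_url:
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(ws_url)
        # ... your code ...
# browser is released automatically when the block exits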
```

### With Proxy

```python
baas = Baas(api_key="your_key")
ws_url = baas.create(proxy="socks5://user:pass@host:port")
# ... your code ...
baas.release()  # return the browser to the pool when done
```
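
When calling `create()` directly, a small context manager can guarantee that `release()` and `close()` run even if your code raises. The sketch below relies only on the three methods listed under Browser Methods. `leased_browser` is a hypothetical helper, not part of the SDK, and calling `close()` right after `release()` is a design choice made here rather than documented behavior.

```python
from contextlib import contextmanager


@contextmanager
def leased_browser(client, **create_kwargs):
    """Yield a ws_url from client.create() and always release it afterwards."""
    ws_url = client.create(**create_kwargs)
    try:
        yield ws_url
    finally:
        # Return the browser to the pool, then close the HTTP session.
        try:
            client.release()
        finally:
            client.close()
```

With the real client this could be used as `with leased_browser(Baas(api_key="your_key"), proxy="socks5://user:pass@host:port") as ws_url:`.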

### Async

```python
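from playwright.async_api import async_playwright
from dawg_baas import AsyncBaas

async def main():
    async with AsyncBaas(api_key="your_key") as ws_url:
        async with async_playwright() as p:
            browser = await p.chromium.connect_over_cdp(ws_url)
# run with: asyncio.run(main())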
```

### Browser Methods

- `create(proxy=None, geo=None) -> str` — returns `ws_url`
- `release()` — release browser back to pool
- `close()` — close HTTP session

## Exceptions

```python
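from dawg_baas import Scraper, BaasError, AuthError, RateLimitError

try:
    with Scraper(api_key="your_key") as scraper:
        result = scraper.scrape("https://example.com")
except AuthError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limit, retry after {e.retry_after}s")
except BaasError as e:  # any other SDK error
    print(f"Request failed: {e}")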
```
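
Since `RateLimitError` carries a `retry_after` value, a retry wrapper can wait that long before trying again. The sketch below is a hypothetical helper, not part of the SDK. It takes the exception class as a parameter so it stays self-contained, and it assumes `retry_after` is a number of seconds, as the message above suggests.

```python
import time


def retry_on_rate_limit(call, rate_limit_exc, max_attempts=3,
                        default_wait=1.0, sleep=time.sleep):
    """Run call(); on rate_limit_exc, wait e.retry_after seconds and retry.

    Falls back to default_wait when the exception has no usable retry_after.
    Re-raises the exception once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except rate_limit_exc as e:
            if attempt == max_attempts:
                raise
            wait = getattr(e, "retry_after", None) or default_wait
            sleep(wait)
```

With the SDK this might be called as `retry_on_rate_limit(lambda: scraper.scrape("https://example.com"), RateLimitError)`.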

## Examples

### Scrape to markdown

```python
from dawg_baas import Scraper

# The context manager closes the HTTP session even if scrape() raises.
with Scraper(api_key="your_key") as s:
    result = s.scrape("https://news.ycombinator.com", format="markdown", main_content=True)
    print(result.metadata["title"])
    print(result.content)
```

### Playwright browser

```python
from playwright.sync_api import sync_playwright
from dawg_baas import Baas

with Baas(api_key="your_key") as ws_url:
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(ws_url)
        page = browser.contexts[0].pages[0]
        page.goto("https://example.com")
        print(page.title())
        browser.close()
```

## License

MIT
