Metadata-Version: 2.4
Name: plasmate
Version: 0.4.1
Summary: Agent-native headless browser. HTML in, Semantic Object Model out.
Author-email: Plasmate Labs <hello@plasmate.app>
License-Expression: Apache-2.0
Project-URL: Homepage, https://plasmate.app
Project-URL: Repository, https://github.com/plasmate-labs/plasmate
Project-URL: Documentation, https://docs.plasmate.app
Keywords: browser,headless,scraping,ai,llm,agent,mcp,semantic
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.0

<!-- mcp-name: io.github.plasmate-labs/plasmate -->
# plasmate

Agent-native headless browser for Python. HTML in, Semantic Object Model out.

## Install

```bash
pip install plasmate
```

The SDK requires the `plasmate` binary on your `PATH`. Install it with:

```bash
curl -fsSL https://plasmate.app/install.sh | sh
```

## Quick Start

```python
from plasmate import Plasmate

browser = Plasmate()

# Fetch a page as a structured Semantic Object Model
som = browser.fetch_page("https://news.ycombinator.com")
print(f"{som['title']}: {len(som['regions'])} regions")

# Extract clean text only
text = browser.extract_text("https://example.com")
print(text)

# Interactive browsing
session = browser.open_page("https://example.com")
print(session["session_id"], session["som"]["title"])

title = browser.evaluate(session["session_id"], "document.title")
print(title)

browser.close_page(session["session_id"])
browser.close()
```

### Async

```python
import asyncio
from plasmate import AsyncPlasmate

async def main():
    async with AsyncPlasmate() as browser:
        som = await browser.fetch_page("https://example.com")
        print(som["title"])

asyncio.run(main())
```

### Context Manager

```python
from plasmate import Plasmate

with Plasmate() as browser:
    som = browser.fetch_page("https://example.com")
    # The plasmate process shuts down automatically when the block exits
```

## API

### `Plasmate(binary="plasmate", timeout=30)`

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `binary` | `str` | `"plasmate"` | Path to the plasmate binary |
| `timeout` | `float` | `30` | Response timeout in seconds |

### Stateless (one-shot)

- **`fetch_page(url, *, budget=None, javascript=True)`** - Returns SOM dict
- **`extract_text(url, *, max_chars=None)`** - Returns clean text string

### Stateful (interactive sessions)

- **`open_page(url)`** - Returns dict with `session_id` and `som`
- **`evaluate(session_id, expression)`** - Run JS, get result
- **`click(session_id, element_id)`** - Click element, get updated SOM
- **`close_page(session_id)`** - Close session
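
A session opened with `open_page` stays open until `close_page` is called, so it can help to pair the two in a `try`/`finally`. The sketch below wraps that pattern in a context manager. `page_session` is a hypothetical helper, not part of the SDK, and it relies only on the `open_page` and `close_page` methods listed above.

```python
from contextlib import contextmanager


@contextmanager
def page_session(browser, url):
    """Open a session and make sure it is closed, even if the body raises.

    `browser` is any object exposing open_page(url) and
    close_page(session_id), such as a Plasmate instance.
    This helper is illustrative and not part of the plasmate API.
    """
    session = browser.open_page(url)
    try:
        yield session
    finally:
        browser.close_page(session["session_id"])


# Usage (needs the plasmate binary and a Plasmate instance named `browser`):
#     with page_session(browser, "https://example.com") as session:
#         title = browser.evaluate(session["session_id"], "document.title")
```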

### Lifecycle

- **`close()`** - Shut down the plasmate process

### Pydantic Models

Parse SOM responses into typed Pydantic v2 models:

```python
from plasmate import Plasmate, Som

browser = Plasmate()
data = browser.fetch_page("https://example.com")
som = Som(**data)

print(som.title)               # "Example Domain"
print(som.meta.element_count)  # 12

for region in som.regions:
    print(f"{region.role}: {len(region.elements)} elements")
```

### Query Helpers

Search and traverse SOM documents:

```python
from plasmate import Som, find_by_role, find_by_id, find_by_tag
from plasmate import find_interactive, find_by_text, flat_elements, get_token_estimate

# `som` is the `Som` instance built in the Pydantic Models example above

# Find all navigation regions
navs = find_by_role(som, "navigation")

# Find a specific element
el = find_by_id(som, "e5")
if el:
    print(el.role, el.text)

# Find all links
links = find_by_tag(som, "link")

# Get all interactive elements (buttons, inputs, etc.)
for el in find_interactive(som):
    print(f"{el.id}: {el.role} - {el.text}")

# Search by text content (case-insensitive)
results = find_by_text(som, "sign up")

# Flatten all elements for iteration
all_elements = flat_elements(som)
print(f"{len(all_elements)} total elements")

# Estimate token usage
tokens = get_token_estimate(som)
print(f"~{tokens} tokens")
```
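
If you prefer to work with the raw dict returned by `fetch_page` rather than the Pydantic models, similar searches can be written by hand. The sketch below is not the library's implementation. It assumes each region carries an `elements` list and each element an optional `text` field, as the examples above suggest, and it ignores any nesting the real SOM may have. `iter_elements` and `search_text` are hypothetical names.

```python
def iter_elements(som):
    """Yield every element dict from every region of a raw SOM dict."""
    for region in som.get("regions", []):
        yield from region.get("elements", [])


def search_text(som, needle):
    """Return elements whose text contains `needle`, ignoring case."""
    needle = needle.lower()
    return [
        el for el in iter_elements(som)
        if needle in (el.get("text") or "").lower()
    ]


# Hand-written sample data, not real plasmate output.
sample = {
    "title": "Example",
    "regions": [
        {"role": "navigation",
         "elements": [{"id": "e1", "role": "link", "text": "Home"}]},
        {"role": "main",
         "elements": [
             {"id": "e2", "role": "button", "text": "Sign Up"},
             {"id": "e3", "role": "paragraph", "text": None},
         ]},
    ],
}

print([el["id"] for el in search_text(sample, "sign up")])  # ['e2']
print(sum(1 for _ in iter_elements(sample)))                # 3
```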

## How It Works

The SDK spawns `plasmate mcp` as a child process and communicates via JSON-RPC 2.0 over stdio. The plasmate binary handles HTML parsing, JavaScript execution (V8), and SOM compilation in Rust.
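
To make the transport concrete, here is a minimal sketch of JSON-RPC 2.0 framing as it might look on the wire, with one JSON object per line. It does not start the binary and is not the SDK's actual client code. `tools/call` is the MCP method for invoking a tool, but the tool name `fetch_page`, the `url` argument, and the reply payload are assumptions made for illustration, as are the helper names `encode_request` and `decode_response`.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; each request gets a fresh one


def encode_request(method, params):
    """Serialize a JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    return message["id"], json.dumps(message) + "\n"


def decode_response(line, expected_id):
    """Parse one response line; return `result` or raise on `error`."""
    message = json.loads(line)
    if message.get("id") != expected_id:
        raise ValueError(f"unexpected response id: {message.get('id')!r}")
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    return message["result"]


# Assumed tool name and arguments, for illustration only.
req_id, wire = encode_request(
    "tools/call",
    {"name": "fetch_page", "arguments": {"url": "https://example.com"}},
)
print(wire, end="")

# A hand-written reply standing in for what the child process might send back.
reply = json.dumps({"jsonrpc": "2.0", "id": req_id, "result": {"ok": True}})
print(decode_response(reply, req_id))
```

Matching each response to its request by `id` is what lets a client keep several calls in flight over the same pair of pipes.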

## License

Apache-2.0
