Metadata-Version: 2.4
Name: ia-web-parser
Version: 0.1.0
Summary: Turn any AI web-chat provider into a tool-capable streaming API
Project-URL: Homepage, https://github.com/KevRojo/ia-web-parser
Project-URL: Repository, https://github.com/KevRojo/ia-web-parser
Project-URL: Issues, https://github.com/KevRojo/ia-web-parser/issues
Project-URL: Falcon, https://github.com/KevRojo/Falcon
Author: KevRojo
License: MIT
Keywords: agent,ai,automation,claude,deepseek,gemini,kimi,llm,playwright,qwen,tool-calling,web-scraping
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: requests>=2.28.0
Provides-Extra: dev
Requires-Dist: black>=23.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: harvester
Requires-Dist: playwright>=1.40.0; extra == 'harvester'
Description-Content-Type: text/markdown

# ia-web-parser

> **Turn any AI web-chat into a tool-capable streaming API.**
> Harvest your existing browser session once. Chat, call tools, parse responses — without an API key.

[![PyPI](https://img.shields.io/pypi/v/ia-web-parser.svg?color=ff6b1f&labelColor=07070a&label=pypi)](https://pypi.org/project/ia-web-parser/)
[![Python](https://img.shields.io/pypi/pyversions/ia-web-parser.svg?color=ff6b1f&labelColor=07070a)](https://pypi.org/project/ia-web-parser/)
[![License](https://img.shields.io/badge/license-MIT-ff6b1f?labelColor=07070a)](LICENSE)

---

## What it does

Most AI providers bill API access separately, while their **web UIs are included in a subscription you already have**.
`ia-web-parser` bridges that gap. It:

1. **Harvests** your authenticated session via Playwright (you log in once in a real Chrome window).
2. **Streams** chat responses back to your code with the same Bearer/cookie auth the browser uses.
3. **Injects tool manifests** into the prompt and **parses `<tool_call>` blocks** out of the model output, giving you the same tool-calling loop a native API would — but for providers that don't expose one publicly.
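
Step 3 can be sketched in a few lines. This is a minimal illustration, not the library's own `WebToolParser`: the JSON payload inside `<tool_call>`, the manifest layout, and the names `format_manifest` and `parse_tool_calls` are all assumptions made for the example.

```python
import json
import re

# Assumed wire format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def format_manifest(tools):
    """Render tool schemas as plain text to prepend to the prompt (hypothetical layout)."""
    lines = ["You can call these tools by emitting <tool_call>{JSON}</tool_call>:"]
    for tool in tools:
        lines.append(f"- {tool['name']}: {tool['description']}")
        lines.append(f"  parameters: {json.dumps(tool['parameters'], sort_keys=True)}")
    return "\n".join(lines)


def parse_tool_calls(output):
    """Split model output into (plain_text, calls); malformed blocks are dropped."""
    calls = []
    for match in TOOL_CALL_RE.finditer(output):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # not valid JSON, so there is nothing to call
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    # Remove every <tool_call> block so only the prose remains.
    text = TOOL_CALL_RE.sub("", output).strip()
    return text, calls
```

The point of the design is that the provider only ever sees and returns plain text, so tool calling reduces to prompt formatting on the way in and pattern matching on the way out.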

Built originally for [**Falcon**](https://github.com/KevRojo/Falcon) (`pip install falcon-agent`) — the multi-provider AI CLI that pioneered tool-capable web providers — and now extracted as a standalone library so any Python project can use it.

## Supported providers

| Provider | Module | Endpoint |
|---|---|---|
| **Claude** | `claude-web` | claude.ai |
| **Kimi** | `kimi-web` | kimi.com |
| **Gemini** | `gemini-web` | gemini.google.com |
| **DeepSeek** | `deepseek-web` | chat.deepseek.com |
| **Qwen** | `qwen-web` | chat.qwen.ai |

## Install

```bash
pip install "ia-web-parser[harvester]"
playwright install chromium
```

The `[harvester]` extra pulls in Playwright, which is only needed for the one-time login step.
For pure chat usage after harvesting, `pip install ia-web-parser` is enough.

## Quick start

### 1. Harvest auth (run once per provider)

```python
from ia_web_parser import SessionManager, ClaudeHarvester

sm = SessionManager()
ClaudeHarvester(sm).harvest()
# → opens a real Chrome window. Log in. Cookies + Bearer get stashed
#   to ~/.ia-web-parser/claude.json
```

### 2. Chat with tools

```python
from ia_web_parser import ClaudeWebProvider, SessionManager

sm = SessionManager()
provider = ClaudeWebProvider(session_manager=sm)

tools = [{
    "name": "Read",
    "description": "Read a file from disk",
    "parameters": {
        "type": "object",
        "properties": {"file_path": {"type": "string"}},
        "required": ["file_path"],
    },
}]

for chunk in provider.chat(
    messages=[{"role": "user", "content": "Read /etc/hosts and summarize"}],
    tool_schemas=tools,
):
    if hasattr(chunk, "text"):
        print(chunk.text, end="", flush=True)
    elif hasattr(chunk, "tool_calls"):
        for call in chunk.tool_calls:
            print(f"\n→ tool call: {call.name}({call.arguments})")
```

That's it. The same shape works for all five providers: swap `ClaudeWebProvider` for `KimiWebProvider`, `GeminiWebProvider`, etc.
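
The loop above only prints tool calls. One way to actually run them is a registry that maps tool names to local functions. The sketch below is an assumption-heavy illustration: `TOOL_REGISTRY`, `read_file`, and `run_tool` are hypothetical names that are not part of the library, and it assumes `call.arguments` is a dict of keyword arguments. How a result is sent back to the provider is left out because it is not documented here.

```python
from pathlib import Path


def read_file(file_path):
    """Handler for the "Read" tool defined in the quick start."""
    return Path(file_path).read_text(encoding="utf-8")


# Hypothetical registry: tool name -> local Python callable.
TOOL_REGISTRY = {"Read": read_file}


def run_tool(name, arguments):
    """Run one tool call and return a result string; errors are reported, not raised."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return f"error: unknown tool {name!r}"
    try:
        return str(handler(**arguments))
    except Exception as exc:  # report the failure as text instead of crashing the loop
        return f"error: {type(exc).__name__}: {exc}"
```

Inside the quick-start loop this would be called as `run_tool(call.name, call.arguments)`. Returning errors as strings keeps a bad tool call from aborting the stream.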

## Architecture

```
ia_web_parser/
├── core/        # WebToolParser, history consolidation, manifest formatting
├── harvester/   # Playwright auth harvesters (one persistent profile per provider)
├── providers/   # Streaming chat clients
└── session.py   # Cookie + auth file + state management
```

Auth lives in `~/.ia-web-parser/`.
Persistent Chrome profiles live in `~/.ia-web-parser/profiles/<provider>/`.
Log in once. Re-runs reuse the same profile, so you do not have to log in again while the session stays valid.
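
To see which providers have already been harvested, a script can look for auth files in that directory. This is a sketch under assumptions: the `<provider>.json` naming is inferred from the `claude.json` path in the quick start, and the short names in `PROVIDERS` as well as the helper `harvested_providers` are hypothetical.

```python
from pathlib import Path

# Short names assumed from the quick-start path ~/.ia-web-parser/claude.json.
PROVIDERS = ("claude", "kimi", "gemini", "deepseek", "qwen")


def harvested_providers(base_dir=None):
    """Return the providers that already have a <provider>.json auth file."""
    base = Path(base_dir) if base_dir is not None else Path.home() / ".ia-web-parser"
    return [name for name in PROVIDERS if (base / f"{name}.json").is_file()]
```

Calling `harvested_providers()` with no argument checks the default location; passing a directory makes the helper easy to test.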

## Want the full agent experience?

`ia-web-parser` is the **library**. If you want the **whole CLI** — REPL, slash commands, sub-agents, MemPalace, voice, TTS, Mesa Redonda, 27 built-in tools, MCP, plugins — that's [Falcon](https://github.com/KevRojo/Falcon):

```bash
pip install falcon-agent
falcon
```

Falcon ships `ia-web-parser` integrated, plus 11 cloud APIs (including Anthropic, OpenAI, Gemini, NVIDIA's free 14-model tier, Ollama, and custom endpoints) and the full agentic loop.

## Why this exists

Web subscriptions you already pay for ($20/mo Claude Pro, $20/mo Gemini Advanced, etc.) give you flagship-model access. APIs charge again per token. `ia-web-parser` lets your scripts use what you've already paid for — programmatically, with tool calls, streaming, and persistent sessions.

It's the same trick `youtube-dl` pulled with video sites: meet the user where their auth already is.

## License

MIT. Fork it, ship it, build whatever.
