Metadata-Version: 2.4
Name: skim-haystack
Version: 0.1.0
Summary: Haystack integration for Skim — clean web reader for AI agents. Pays $0.002/call in USDC over x402. No signup, no API keys.
Project-URL: Homepage, https://skim402.com
Project-URL: Documentation, https://skim402.com/docs
Project-URL: Repository, https://github.com/JessieJanie/skim402
Project-URL: x402 protocol, https://x402.org
Author-email: Skim <hello@skim402.com>
License: MIT
License-File: LICENSE
Keywords: agent,ai,deepset,haystack,haystack-ai,llm,markdown,rag,reader,skim,web-scraping,x402
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: eth-account>=0.13.0
Requires-Dist: haystack-ai>=2.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: x402[evm]>=2.0.0
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == 'test'
Description-Content-Type: text/markdown

# skim-haystack

**Give your Haystack pipelines the ability to read any URL as clean Markdown: no ads, no nav, no boilerplate. Each call is paid for automatically. No signup, no API key.**

[![PyPI version](https://img.shields.io/pypi/v/skim-haystack.svg)](https://pypi.org/project/skim-haystack/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

`skim-haystack` is the official [Haystack](https://haystack.deepset.ai) integration for [Skim](https://skim402.com) — the canonical [x402](https://x402.org) clean reader API. It provides one component, `SkimReader`, that fetches any web page and returns it as a Haystack `Document` (clean Markdown in `content`, structured metadata in `meta`). Each call costs **$0.002 in USDC on Base**, paid automatically by your local wallet over HTTP 402.

---

## Install

```bash
pip install skim-haystack
```

This pulls in the x402 client with EVM support, so there's nothing else to install.

---

## Quickstart (60 seconds)

### 1. Fund a Base wallet with $1 of USDC

A dollar funds roughly 500 reads ($1 / $0.002 per call). The full step-by-step guide (with screenshots, written for developers new to crypto) is at **<https://skim402.com/wallet>**.
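
If you want to size the wallet for a specific workload, the arithmetic is easy to script. This is a minimal sketch that assumes the only cost is the $0.002 per-call price quoted in this README; `reads_for_budget` is a hypothetical helper, not part of `skim-haystack`.

```python
from decimal import Decimal

PRICE_PER_READ_USD = Decimal("0.002")  # per-call price quoted in this README


def reads_for_budget(budget_usd: str) -> int:
    """Return how many whole reads a USDC balance covers at the quoted price."""
    # Decimal avoids float rounding surprises with values like 0.002.
    return int(Decimal(budget_usd) // PRICE_PER_READ_USD)


print(reads_for_budget("1"))     # 500
print(reads_for_budget("0.25"))  # 125
```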

> **Use a fresh wallet, not your personal one.** This wallet's private key signs payment authorizations on your machine — treat it like a hot wallet for paying $0.002 tolls, not a savings account.

### 2. Point the component at your wallet

```bash
export SKIM_WALLET_PRIVATE_KEY=0xYOUR_BASE_WALLET_PRIVATE_KEY
```

### 3. Use it

```python
from skim_haystack import SkimReader

reader = SkimReader()  # reads SKIM_WALLET_PRIVATE_KEY from the environment

result = reader.run(urls="https://en.wikipedia.org/wiki/HTTP_402")
print(result["documents"][0].content)
```

The component signs an EIP-3009 USDC authorization for $0.002, Skim returns clean Markdown, and you get back a `Document` with the article body in `content` and metadata in `meta`. The payment shows up in your wallet's transaction history on [BaseScan](https://basescan.org/).

---

## Use it in a pipeline

`SkimReader` is a standard Haystack component, so it drops straight into a `Pipeline`. Here it fetches a page and feeds the cleaned Markdown into a prompt:

```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from skim_haystack import SkimReader

pipe = Pipeline()
pipe.add_component("reader", SkimReader())
pipe.add_component("prompt", PromptBuilder(
    template="Summarize this article in 5 bullets:\n\n{{ documents[0].content }}"
))
# OpenAIGenerator reads OPENAI_API_KEY from the environment by default.
pipe.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))

pipe.connect("reader.documents", "prompt.documents")
pipe.connect("prompt.prompt", "llm.prompt")

result = pipe.run({"reader": {"urls": "https://en.wikipedia.org/wiki/HTTP_402"}})
print(result["llm"]["replies"][0])
```

The wallet pays per read, and your pipeline gets clean Markdown instead of raw HTML.

---

## Output shape

`SkimReader.run(...)` returns `{"documents": [Document, ...]}` — one `Document` per URL:

- `Document.content` — the cleaned article body in Markdown.
- `Document.meta` — always includes `source` (the URL), plus page metadata (`title`, `byline`, `publishedAt`, `lang`, `excerpt`, ...) unless `include_metadata=False`.
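
Because only `source` is guaranteed, code that consumes `meta` should treat every other key as optional. The sketch below works on a plain `dict` so it runs without Haystack installed. `describe_source` is a hypothetical helper, and the sample values are hand-written, not real Skim output.

```python
def describe_source(meta: dict) -> str:
    """Build a one-line attribution from a Document's ``meta`` dict."""
    # `source` is always present per this README; the other keys may be
    # missing (for example with include_metadata=False), so use .get().
    title = meta.get("title") or "Untitled"
    byline = meta.get("byline")
    label = f"{title} by {byline}" if byline else title
    return f"{label} ({meta['source']})"


# Hand-written sample values for illustration only.
sample_meta = {
    "source": "https://en.wikipedia.org/wiki/HTTP_402",
    "title": "HTTP 402",
    "lang": "en",
}
print(describe_source(sample_meta))
# HTTP 402 (https://en.wikipedia.org/wiki/HTTP_402)
```

With a real result you would pass `result["documents"][0].meta` instead of `sample_meta`.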

---

## Configuration

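`SkimReader` takes the following parameters. All are optional, but a wallet key must be available, either through `private_key` or through the `SKIM_WALLET_PRIVATE_KEY` environment variable: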

| Parameter          | Default                                              | Notes                                                                                                                          |
| ------------------ | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `private_key`      | `Secret.from_env_var("SKIM_WALLET_PRIVATE_KEY")`    | A Haystack `Secret` with the Base wallet's hex private key. With or without `0x`. Pass `Secret.from_token("0x...")` for an explicit key. Use a dedicated wallet — never your personal one. |
| `base_url`         | `https://skim402.com`                               | Override the API base URL. For self-hosting or local development.                                                             |
| `max_price_usd`    | `0.01`                                              | Hard cap on per-call price in USD. The wallet refuses to sign for anything above this. Skim is `$0.002`/call.                 |
| `include_metadata` | `True`                                              | Populate each `Document`'s `meta` with page metadata.                                                                         |
| `timeout`          | `60`                                                | Per-request timeout in seconds.                                                                                               |

```python
from haystack.utils import Secret
from skim_haystack import SkimReader

reader = SkimReader(
    private_key=Secret.from_token("0x..."),  # or rely on the env var
    max_price_usd=0.005,
    include_metadata=False,
)
```

The component supports pipeline serialization (`to_dict`/`from_dict`). When the key comes from an environment variable (the default, or any `Secret.from_env_var(...)`), it is stored as a *reference* to that variable name — never the raw value. An inline `Secret.from_token("0x...")` is intentionally runtime-only: Haystack refuses to serialize token-backed secrets, so it will never be written to disk.

---

## How it actually works

```
your pipeline ──► SkimReader ──► POST https://skim402.com/api/v1/read
                     ▲                       │
                     │                       ▼
                     │              402 Payment Required
                     │                  (x402 challenge)
                     │                       │
                     ▼                       │
      x402 signs EIP-3009 USDC ◄─────────────┘
      transfer authorization (locally)
                     │
                     ▼
           retry POST with X-PAYMENT header
                     │
                     ▼
      Skim verifies + settles via Coinbase CDP facilitator
                     │
                     ▼
           200 OK + clean Markdown
```

Your private key never leaves your machine — it only signs authorizations locally.
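
The control flow in the diagram can be sketched in a few lines. This is an offline illustration, not the package's implementation: `send` and `sign` are hypothetical stand-ins for the HTTP call and the local wallet signature, `price_usd` is an invented field name (the real x402 challenge has its own schema), and `fake_send` simulates the server so the example runs without a wallet or network.

```python
from typing import Callable, Tuple

Response = Tuple[int, dict]  # (HTTP status, decoded JSON body)


def read_with_payment(
    send: Callable[[dict], Response],
    sign: Callable[[dict], str],
    max_price_usd: float,
) -> dict:
    """Run the request / 402 / sign / retry sequence once."""
    status, body = send({})  # first attempt carries no payment header
    if status == 200:
        return body
    if status != 402:
        raise RuntimeError(f"unexpected HTTP {status}")
    # Refuse to sign if the quoted price exceeds the local cap.
    # `price_usd` is an illustrative field, not the real challenge schema.
    if body["price_usd"] > max_price_usd:
        raise ValueError(f"quoted {body['price_usd']} exceeds cap {max_price_usd}")
    # Retry with the locally signed authorization attached.
    status, body = send({"X-PAYMENT": sign(body)})
    if status != 200:
        raise RuntimeError(f"payment retry failed with HTTP {status}")
    return body


def fake_send(headers: dict) -> Response:
    """Simulated server: demands payment first, then returns Markdown."""
    if "X-PAYMENT" not in headers:
        return 402, {"price_usd": 0.002}
    return 200, {"markdown": "# HTTP 402"}


result = read_with_payment(fake_send, lambda challenge: "signed", max_price_usd=0.01)
print(result["markdown"])  # # HTTP 402
```

Note that the price check happens before anything is signed, which is the role `max_price_usd` plays in the component's configuration.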

---

## Security

- **Dedicated wallet, always.** Fund it with only as much USDC as you're willing to lose to a runaway loop. The `max_price_usd` cap limits the price of each individual call, so it catches accidental price escalations but does not limit how many calls are made.
- **No outbound telemetry from this package.** `skim-haystack` only talks to `skim402.com` (or whatever you set as `base_url`). No analytics, no error reporting, no phone-home.
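
If you want a second line of defense against a runaway loop, you can keep a local spending counter next to the reader. The `SpendGuard` class below is a hypothetical sketch, not part of `skim-haystack`: it only does bookkeeping at the quoted $0.002 price and does not read your actual on-chain balance.

```python
from decimal import Decimal


class SpendGuard:
    """Stops a loop once a local spending budget would be exceeded."""

    def __init__(self, budget_usd: str, price_per_call_usd: str = "0.002"):
        self.remaining = Decimal(budget_usd)
        self.price = Decimal(price_per_call_usd)

    def charge(self) -> None:
        """Reserve one call's worth of budget, or raise if none is left."""
        if self.remaining < self.price:
            raise RuntimeError("local spend budget exhausted")
        self.remaining -= self.price


guard = SpendGuard(budget_usd="0.01")  # allows five reads at $0.002
for url in ["https://example.com/a", "https://example.com/b"]:
    guard.charge()  # call this before each reader.run(urls=url)
print(guard.remaining)  # 0.006
```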

---

## Try it without a pipeline

Skeptical? Test the upstream endpoint directly — it'll return a 402 challenge so you can see the protocol in action:

```bash
curl -i -X POST https://skim402.com/api/v1/read \
  -H 'content-type: application/json' \
  -d '{"url":"https://en.wikipedia.org/wiki/HTTP_402"}'
```

You'll get back a `402 Payment Required` status (the HTTP version shown in the status line depends on what curl negotiates) with the x402 challenge in the response body.
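
The same unpaid probe can be written with the Python standard library. This is a sketch that mirrors the curl command above; `build_request` and `probe` are hypothetical helper names, and `probe` needs network access, so it is defined here but not called.

```python
import json
import urllib.error
import urllib.request

ENDPOINT = "https://skim402.com/api/v1/read"  # endpoint from the curl example


def build_request(url: str) -> urllib.request.Request:
    """Build the same unpaid POST that the curl command sends."""
    payload = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def probe(url: str) -> int:
    """Send the unpaid request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status is on the exception.
        return err.code


# probe("https://en.wikipedia.org/wiki/HTTP_402") requires network access;
# per the text above, the expected status is 402.
```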

---

## Links

- **Skim website** — <https://skim402.com>
- **Wallet setup guide** — <https://skim402.com/wallet>
- **API docs** — <https://skim402.com/docs>
- **x402 protocol** — <https://x402.org>
- **GitHub** — <https://github.com/JessieJanie/skim402>

---

## License

MIT
