Metadata-Version: 2.4
Name: llama-index-readers-skim
Version: 0.1.0
Summary: LlamaIndex reader for Skim — clean web reader for AI agents. Pays $0.002/call in USDC over x402. No signup, no API keys.
Project-URL: Homepage, https://skim402.com
Project-URL: Documentation, https://skim402.com/docs
Project-URL: Repository, https://github.com/JessieJanie/skim402
Project-URL: x402 protocol, https://x402.org
Author-email: Skim <hello@skim402.com>
License: MIT
License-File: LICENSE
Keywords: agent,ai,llama-index,llamaindex,llm,markdown,rag,reader,skim,web-scraping,x402
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: eth-account>=0.13.0
Requires-Dist: llama-index-core>=0.11.0
Requires-Dist: requests>=2.31.0
Requires-Dist: x402[evm]>=2.0.0
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == 'test'
Description-Content-Type: text/markdown

# llama-index-readers-skim

**Give your LlamaIndex pipeline the ability to read any URL as clean Markdown: no ads, no nav, no boilerplate. Pays for each call automatically. No signup, no API key.**

[![PyPI version](https://img.shields.io/pypi/v/llama-index-readers-skim.svg)](https://pypi.org/project/llama-index-readers-skim/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

`llama-index-readers-skim` is the official [LlamaIndex](https://docs.llamaindex.ai) reader for [Skim](https://skim402.com) — the canonical [x402](https://x402.org) clean reader API. It exposes one reader, `SkimReader`, that turns any web page into a LlamaIndex `Document` of agent-ready Markdown plus structured metadata (title, byline, published date, language, excerpt). Each call costs **$0.002 in USDC on Base**, paid automatically by your local wallet over HTTP 402.

---

## Install

```bash
pip install llama-index-readers-skim
```

This pulls in the x402 client with EVM support, so there's nothing else to install.

---

## Quickstart (60 seconds)

### 1. Fund a Base wallet with $1 of USDC

A dollar funds roughly 500 reads. Full step-by-step (with screenshots, for non-crypto-native devs): **<https://skim402.com/wallet>**.

> **Use a fresh wallet, not your personal one.** This wallet's private key signs payment authorizations on your machine — treat it like a hot wallet for paying $0.002 tolls, not a savings account.

### 2. Point the reader at your wallet

```bash
export SKIM_WALLET_PRIVATE_KEY=0xYOUR_BASE_WALLET_PRIVATE_KEY
```
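
If you want to catch a malformed key before the first paid call, a small pre-flight check can help. The sketch below is not part of this package: `normalize_private_key` and `key_from_env` are hypothetical helpers that only check the shape of the key (32 bytes of hex, with or without `0x`). They do not verify that the key is a valid secp256k1 scalar or that the wallet is funded.

```python
import os
import re

# A 32-byte private key is exactly 64 hex characters once any 0x prefix is removed.
_HEX_KEY = re.compile(r"[0-9a-fA-F]{64}")


def normalize_private_key(raw: str) -> str:
    """Return the key as 0x-prefixed lowercase hex, or raise ValueError.

    Hypothetical helper for illustration; the error message never echoes the key.
    """
    key = raw.strip()
    if key.lower().startswith("0x"):
        key = key[2:]
    if not _HEX_KEY.fullmatch(key):
        raise ValueError("expected 64 hex characters (32 bytes), with or without 0x")
    return "0x" + key.lower()


def key_from_env(name: str = "SKIM_WALLET_PRIVATE_KEY") -> str:
    """Read the key from the environment and check its shape."""
    raw = os.environ.get(name)
    if not raw:
        raise RuntimeError(f"{name} is not set")
    return normalize_private_key(raw)
```

Calling `key_from_env()` at startup fails fast with a clear message instead of surfacing a signing error in the middle of a pipeline run.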

### 3. Use it

```python
from llama_index.readers.skim import SkimReader

reader = SkimReader()  # reads SKIM_WALLET_PRIVATE_KEY from the environment

documents = reader.load_data(urls=["https://en.wikipedia.org/wiki/HTTP_402"])
print(documents[0].text)
print(documents[0].metadata)
```

The reader signs an EIP-3009 USDC authorization for $0.002, Skim returns clean Markdown, and you get back a `Document` with the article body as `text` and the page metadata in `metadata`. The payment shows up in your wallet's transaction history on [BaseScan](https://basescan.org/).

---

## Build an index from web pages

`SkimReader` returns standard LlamaIndex `Document` objects, so it drops straight into any ingestion pipeline:

```python
from llama_index.core import VectorStoreIndex
from llama_index.readers.skim import SkimReader

reader = SkimReader()
documents = reader.load_data(
    urls=[
        "https://example.com/article-one",
        "https://example.com/article-two",
    ]
)

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What do these articles have in common?"))
```

Each URL costs one $0.002 read, paid automatically as the documents load.
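
Because every URL is a separate paid read, it can be worth estimating the cost of a batch before loading it. This README does not say whether `SkimReader` deduplicates URLs, so the sketch below assumes it does not and removes exact duplicates first. `dedupe_urls` and `estimate_cost_usd` are hypothetical helpers, and the price constant is the $0.002 figure quoted above.

```python
from decimal import Decimal

# Price per read as quoted in this README; Decimal avoids float rounding noise.
PRICE_PER_READ_USD = Decimal("0.002")


def dedupe_urls(urls):
    """Drop exact duplicate URLs while keeping first-seen order."""
    return list(dict.fromkeys(urls))


def estimate_cost_usd(urls):
    """Estimated spend for one read per unique URL."""
    return PRICE_PER_READ_USD * len(dedupe_urls(urls))


urls = [
    "https://example.com/article-one",
    "https://example.com/article-two",
    "https://example.com/article-one",  # repeated on purpose
]
unique_urls = dedupe_urls(urls)
print(f"{len(unique_urls)} unique URLs, about ${estimate_cost_usd(urls)}")
```

Only exact string matches are treated as duplicates here; URLs that differ by a trailing slash or a tracking parameter would still be read, and paid for, separately.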

---

## Output shape

`load_data` returns a list of `Document` objects. Each `Document` has:

- **`text`** — the cleaned article body in Markdown.
- **`metadata`** — a dict with the source URL plus the page metadata Skim extracted:

```python
{
    "source": "https://example.com/article",
    "title": "Example article",
    "byline": "Jane Doe",
    "publishedAt": "2025-01-15",
    "lang": "en",
    "excerpt": "A short summary...",
}
```

Empty and `None` metadata values are dropped. Set `include_metadata=False` to keep only the `source` URL.
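
The filtering rule above can be pictured with a few lines of Python. This is an illustration of the described behavior, not the package's actual code: `build_metadata` is a hypothetical function, and "empty" is assumed here to mean an empty string.

```python
def build_metadata(source, page_meta, include_metadata=True):
    """Assemble a Document-style metadata dict (illustrative only).

    `source` is always kept. Page metadata is added only when
    include_metadata is true, skipping None and empty-string values.
    """
    metadata = {"source": source}
    if include_metadata:
        for key, value in page_meta.items():
            if value is None or value == "":
                continue
            metadata[key] = value
    return metadata


page_meta = {"title": "Example article", "byline": None, "lang": "en", "excerpt": ""}
print(build_metadata("https://example.com/article", page_meta))
print(build_metadata("https://example.com/article", page_meta, include_metadata=False))
```

Downstream code should therefore read optional keys with `metadata.get("byline")` rather than indexing, since a missing byline simply will not appear in the dict.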

---

## Configuration

`SkimReader` takes the following parameters. All are optional, but a wallet key must be available, either passed as `private_key` or set in `SKIM_WALLET_PRIVATE_KEY`:

| Parameter          | Default                    | Notes                                                                                                                           |
| ------------------ | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
| `private_key`      | `$SKIM_WALLET_PRIVATE_KEY` | Hex private key for the Base wallet that pays for reads. With or without `0x`. Use a dedicated wallet — never your personal one. |
| `base_url`         | `https://skim402.com`      | Override the API base URL. For self-hosting or local development.                                                              |
| `max_price_usd`    | `0.01`                     | Hard cap on per-call price in USD. The wallet refuses to sign for anything above this. Skim is `$0.002`/call.                  |
| `include_metadata` | `True`                     | Populate each `Document`'s `metadata` with the page metadata Skim returns.                                                     |
| `timeout`          | `60`                       | Per-request timeout in seconds.                                                                                                |

```python
reader = SkimReader(
    private_key="0x...",       # or rely on the env var
    max_price_usd=0.005,
    include_metadata=False,
)
```
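
To make the `max_price_usd` cap concrete, here is a rough sketch of the comparison it implies. USDC uses six decimal places, so $0.002 is 2000 atomic units. How the x402 client actually enforces the cap is not shown in this README; `usd_to_atomic` and `within_cap` are hypothetical names, and the sketch assumes the requested price arrives as an integer count of atomic units.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC amounts are expressed in millionths of a dollar


def usd_to_atomic(usd) -> int:
    """Convert a USD amount to USDC atomic units, going through Decimal
    so that float inputs such as 0.005 are not distorted by binary rounding."""
    return int(Decimal(str(usd)) * 10**USDC_DECIMALS)


def within_cap(required_atomic: int, max_price_usd) -> bool:
    """True if the requested price does not exceed the configured cap."""
    return required_atomic <= usd_to_atomic(max_price_usd)


skim_price = usd_to_atomic("0.002")
print(skim_price, within_cap(skim_price, 0.005), within_cap(skim_price, 0.001))
```

With the default cap of `0.01`, the quoted price could rise fivefold before a read would be refused; a tighter cap such as `0.005` narrows that margin.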

---

## How it actually works

```
your pipeline ──► SkimReader ──► POST https://skim402.com/api/v1/read
                     ▲                       │
                     │                       ▼
                     │              402 Payment Required
                     │                  (x402 challenge)
                     │                       │
                     ▼                       │
      x402 signs EIP-3009 USDC ◄─────────────┘
      transfer authorization (locally)
                     │
                     ▼
           retry POST with X-PAYMENT header
                     │
                     ▼
      Skim verifies + settles via Coinbase CDP facilitator
                     │
                     ▼
           200 OK + clean Markdown
```

Your private key never leaves your machine — it only signs authorizations locally.
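
The request, challenge, sign, retry loop in the diagram can be simulated end to end without a network or a wallet. Everything in the sketch below is a stand-in: `fake_skim_server` and `fake_sign` are hypothetical, the challenge field names are illustrative placeholders, and no real cryptography or settlement takes place. It is meant only to show the order of the steps.

```python
import base64
import json

PRICE_ATOMIC = 2000  # $0.002 expressed in 6-decimal USDC units


def fake_skim_server(headers, body):
    """Stand-in for the read endpoint: demand payment, then serve Markdown."""
    if "X-PAYMENT" not in headers:
        challenge = {"accepts": [{"maxAmountRequired": str(PRICE_ATOMIC), "network": "base"}]}
        return 402, challenge
    return 200, {"markdown": f"# Clean copy of {body['url']}"}


def fake_sign(requirement):
    """Stand-in for local EIP-3009 signing; returns an encoded dummy payload."""
    payload = {"value": requirement["maxAmountRequired"], "signature": "0xSIMULATED"}
    return base64.b64encode(json.dumps(payload).encode()).decode()


def read_with_payment(url, max_atomic=10_000):
    """Walk the flow: unpaid request, 402 challenge, local signature, paid retry."""
    body = {"url": url}
    status, data = fake_skim_server({}, body)
    if status != 402:
        return status, data
    requirement = data["accepts"][0]
    if int(requirement["maxAmountRequired"]) > max_atomic:
        raise RuntimeError("price exceeds cap; refusing to sign")
    headers = {"X-PAYMENT": fake_sign(requirement)}
    return fake_skim_server(headers, body)


status, data = read_with_payment("https://example.com/article")
print(status, data["markdown"])
```

Note that the price check happens before anything is signed, which is why a cap can protect the wallet even if the server asks for more than expected.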

---

## Security

- **Dedicated wallet, always.** Fund it with only as much USDC as you're willing to spend in a runaway loop. The `max_price_usd` cap catches accidental price escalations.
- **No outbound telemetry from this package.** `llama-index-readers-skim` only talks to `skim402.com` (or whatever you set as `base_url`). No analytics, no error reporting, no phone-home.
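
Beyond limiting the wallet balance, a runaway loop can also be stopped in code. The sketch below is one possible approach, not a feature of this package: `SpendGuard` and `BudgetExceeded` are hypothetical names, and the guard simply counts reads at the quoted $0.002 price rather than inspecting real payments.

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when another read would push spending past the budget."""


class SpendGuard:
    """Client-side counter that refuses reads once a USD budget is used up."""

    def __init__(self, budget_usd, price_usd="0.002"):
        self.budget = Decimal(str(budget_usd))
        self.price = Decimal(str(price_usd))
        self.spent = Decimal("0")

    def charge(self):
        """Record one read, or raise if it would exceed the budget."""
        if self.spent + self.price > self.budget:
            raise BudgetExceeded(f"budget of ${self.budget} reached after ${self.spent}")
        self.spent += self.price


guard = SpendGuard(budget_usd="0.01")  # allows five reads at $0.002 each
for _ in range(5):
    guard.charge()  # in real use, call this right before each reader.load_data(...)
print(guard.spent)
```

Because the guard tracks an assumed price, it complements `max_price_usd` and a small wallet balance rather than replacing them.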

---

## Try it without a pipeline

Skeptical? Test the upstream endpoint directly — it'll return a 402 challenge so you can see the protocol in action:

```bash
curl -i -X POST https://skim402.com/api/v1/read \
  -H 'content-type: application/json' \
  -d '{"url":"https://en.wikipedia.org/wiki/HTTP_402"}'
```

You'll get back a `402 Payment Required` status (shown as `HTTP/1.1 402` or `HTTP/2 402`, depending on the negotiated protocol) with the x402 challenge in the response body.
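
The same unpaid probe can be written with Python's standard library, which is handy if `curl` is not available. This is a minimal sketch: `build_read_request` and `probe` are hypothetical helper names, and the endpoint and JSON body are copied from the `curl` command above.

```python
import json
import urllib.error
import urllib.request

READ_ENDPOINT = "https://skim402.com/api/v1/read"


def build_read_request(page_url):
    """Build the unpaid POST request without sending it."""
    payload = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        READ_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def probe(page_url, timeout=30):
    """Send the unpaid request and return (status code, response body text)."""
    try:
        with urllib.request.urlopen(build_read_request(page_url), timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx responses, including 402; the body is still readable.
        return err.code, err.read().decode("utf-8", "replace")
```

Calling `probe("https://en.wikipedia.org/wiki/HTTP_402")` performs the network request; no payment is attempted, so the expected status is 402.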

---

## Links

- **Skim website** — <https://skim402.com>
- **Wallet setup guide** — <https://skim402.com/wallet>
- **API docs** — <https://skim402.com/docs>
- **x402 protocol** — <https://x402.org>
- **GitHub** — <https://github.com/JessieJanie/skim402>

---

## License

MIT
