Metadata-Version: 2.4
Name: jina-curl
Version: 0.1.0
Summary: Python library wrapping Jina AI Reader: convert URLs, search results, and files into LLM-friendly Markdown.
Project-URL: Homepage, https://github.com/kris-wang/jina-curl
Project-URL: Repository, https://github.com/kris-wang/jina-curl
Author-email: "kris.wang" <wenhom.wang@gmail.com>
License: MIT
Keywords: agent,jina,llm,markdown,reader,scraper
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27
Requires-Dist: platformdirs>=4.0
Requires-Dist: tenacity>=8.0
Requires-Dist: tomly>=0.1.4
Description-Content-Type: text/markdown

# jina-curl

A small Python library wrapping the [Jina AI](https://jina.ai) Reader APIs to turn
URLs, web searches, and local files into LLM-friendly Markdown (or text / HTML / JSON).
It adds consistent error handling, automatic retries (honouring `Retry-After`),
quota monitoring, and layered configuration on top of the raw HTTP API.

Sync (`JinaReader`) and async (`AsyncJinaReader`) clients expose the same surface.

## Install

```bash
uv add jina-curl        # or: pip install jina-curl
```

Requires Python ≥ 3.10. An API key is optional: without one, calls fall back to
anonymous (rate-limited) access. Set a key for higher limits (see [Configuration](#configuration)).

## Quick start

```python
from jina_curl import JinaReader

with JinaReader() as r:
    resp = r.read("https://example.com")   # r.jina.ai  — URL → Markdown
    print(resp.content)

    results = r.search("jina ai reader")   # s.jina.ai  — web search
    facts = r.ground("The Eiffel Tower is in Paris")  # g.jina.ai — fact-check

print(resp.title, resp.url, resp.usage.tokens_used)
```

Every call returns a `ReaderResponse` (`.content`, `.url`, `.title`, `.format`,
`.usage`, `.timestamp`, `.to_dict()`).

### Async

```python
import asyncio
from jina_curl import AsyncJinaReader

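async def main() -> None:
    async with AsyncJinaReader() as r:
        # Fetch both pages concurrently; gather returns results in argument order.
        results = await asyncio.gather(
            r.read("https://example.com"),
            r.read("https://example.org"),
        )
    for resp in results:
        print(resp.title, resp.url)

asyncio.run(main())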
```

## Converting local content

Besides fetching URLs, both clients can POST local content to r.jina.ai for
conversion:

```python
from jina_curl import JinaReader

with JinaReader() as r:
    # Raw HTML string (url is optional; helps resolve relative links)
    r.read_html("<h1>Hi</h1><p>...</p>", url="https://example.com")

    # Local file — dispatched by extension
    r.read_file("page.html")     # .html / .htm  → sent as HTML text
    r.read_file("report.pdf")    # .pdf          → base64
    r.read_file("deck.pptx")     # Office docs   → base64 (converted server-side)
```

`read_file` supports `.html`, `.htm`, `.pdf`, and MS Office documents
(`.docx`, `.doc`, `.xlsx`, `.xls`, `.pptx`, `.ppt`); other extensions raise
`ValueError`.
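
The extension dispatch described above can be pictured with a small standalone sketch. This is not the library's code: `classify_file`, `HTML_EXTS`, and `BINARY_EXTS` are hypothetical names, and lowercasing the extension before matching is an assumption.

```python
from pathlib import Path

# Hypothetical groupings that mirror the extensions listed in this section.
HTML_EXTS = {".html", ".htm"}
BINARY_EXTS = {".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt"}


def classify_file(path: str) -> str:
    """Return "html" (sent as text) or "base64" (sent encoded) for a path."""
    # Assumption: matching is case-insensitive, so "REPORT.PDF" is accepted.
    ext = Path(path).suffix.lower()
    if ext in HTML_EXTS:
        return "html"
    if ext in BINARY_EXTS:
        return "base64"
    # Anything else is rejected, as read_file is documented to do.
    raise ValueError(f"unsupported file extension: {ext or '(none)'}")
```

With this sketch, `classify_file("page.html")` yields `"html"`, `classify_file("deck.pptx")` yields `"base64"`, and `classify_file("notes.txt")` raises `ValueError`.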

## Output formats & options

```python
from jina_curl import JinaReader, OutputFormat, ReaderOptions

with JinaReader() as r:
    r.read("https://example.com", fmt=OutputFormat.JSON)
    r.read(
        "https://example.com",
        options=ReaderOptions(no_cache=True, with_links_summary=True),
    )
```

`OutputFormat`: `MARKDOWN` (default), `TEXT`, `HTML`, `SCREENSHOT`, `PAGESHOT`, `JSON`.
`ReaderOptions` maps to Jina's `x-*` request headers (caching, selectors, link/image
summaries, engine, timeout, max tokens, locale, JSON schema, …); unset fields are omitted.
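
To illustrate the "unset fields are omitted" behaviour, here is a minimal standalone sketch of an options-to-headers mapping. `SketchOptions` and `HEADER_NAMES` are hypothetical, the header names are assumptions rather than a statement of what `ReaderOptions` actually sends, and only four fields are shown.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Illustrative field-to-header mapping; these header names are assumptions.
HEADER_NAMES = {
    "no_cache": "X-No-Cache",
    "with_links_summary": "X-With-Links-Summary",
    "target_selector": "X-Target-Selector",
    "timeout": "X-Timeout",
}


@dataclass
class SketchOptions:
    """Hypothetical stand-in for ReaderOptions; None means "unset"."""

    no_cache: Optional[bool] = None
    with_links_summary: Optional[bool] = None
    target_selector: Optional[str] = None
    timeout: Optional[int] = None

    def to_headers(self) -> dict[str, str]:
        headers: dict[str, str] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None:
                continue  # unset fields produce no header at all
            if isinstance(value, bool):
                value = "true" if value else "false"
            headers[HEADER_NAMES[f.name]] = str(value)
        return headers
```

Using `None` as the "unset" marker keeps an explicit `False` distinguishable from a field the caller never touched.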

## Configuration

API key resolution (highest priority first):

1. `JinaReader(api_key=...)` argument
2. `JINA_API_KEY` environment variable
3. `~/.config/jina-curl/config.toml`
4. anonymous (fallback)
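
The priority order above can be sketched as a simple first-match lookup. `resolve_api_key` is a hypothetical helper, not part of the library; it takes the value read from the config file as a plain argument (`file_key`) because the file's key names are not described here, and treating an empty string as "unset" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    file_key: Optional[str] = None,
) -> Optional[str]:
    """Return the first non-empty key in priority order, or None (anonymous)."""
    # Order: constructor argument, JINA_API_KEY, value from config.toml.
    for candidate in (explicit, env.get("JINA_API_KEY"), file_key):
        if candidate:
            return candidate
    return None  # no key anywhere: fall back to anonymous access
```

Passing `env` as a mapping makes the lookup easy to exercise without touching the real process environment.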

## Errors

All errors raised by the library are subclasses of `JinaError`: `AuthError`
(401/403), `RateLimitError` (429, carries `retry_after` / `quota_remaining`),
`ApiError` (4xx/5xx), and `ConfigError`. Retries cover `RateLimitError`, 5xx
`ApiError`, and transport errors; non-429 4xx responses are never retried.
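
The retry rules for HTTP statuses can be summarised in a short standalone sketch. `is_retryable` and `retry_delay` are hypothetical helpers, and the exponential backoff with its `base` and `cap` values is an assumption; the only behaviour taken from this README is which statuses are retried and that `Retry-After` is honoured. Transport errors, which have no status code, are left out.

```python
from typing import Optional


def is_retryable(status: int) -> bool:
    """429 and 5xx responses are retried; every other 4xx is not."""
    return status == 429 or 500 <= status <= 599


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # The server's Retry-After hint takes priority over local backoff.
        return max(0.0, retry_after)
    # Assumed fallback: exponential backoff, capped at `cap` seconds.
    return min(cap, base * (2 ** attempt))
```

Under these assumptions a 503 is retried after 0.5 s, 1 s, 2 s, and so on, while a 429 carrying `Retry-After: 7` waits 7 seconds regardless of the attempt number.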

## License

MIT
