Metadata-Version: 2.4
Name: polite-http
Version: 0.1.0
Summary: A courteous, dependency-free HTTP client with rate limiting, retries, and backoff.
Project-URL: Homepage, https://github.com/doug/polite-http
Project-URL: Repository, https://github.com/doug/polite-http
Project-URL: Issues, https://github.com/doug/polite-http/issues
Project-URL: Changelog, https://github.com/doug/polite-http/blob/main/CHANGELOG.md
Author-email: Doug Fritz <dougfritz@gmail.com>
License-Expression: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: backoff,client,http,rate-limiting,retry,throttling,urllib
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.9
Provides-Extra: dev
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# polite-http

[![CI](https://github.com/doug/polite-http/actions/workflows/ci.yml/badge.svg)](https://github.com/doug/polite-http/actions/workflows/ci.yml)
[![PyPI version](https://img.shields.io/pypi/v/polite-http.svg)](https://pypi.org/project/polite-http/)
[![Python versions](https://img.shields.io/pypi/pyversions/polite-http.svg)](https://pypi.org/project/polite-http/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

A courteous, **dependency-free** HTTP client for Python. It plays nice with
rate-limited APIs out of the box:

- **Per-host rate limiting** — cross-process, via a shared file lock.
- **Automatic retries** on transient HTTP statuses (429, 500, 502, 503, and 504
  by default) and on network errors.
- **Exponential backoff** with optional jitter.
- **`Retry-After`** support (server-directed backoff, both seconds and
  HTTP-date forms).
- **`X-Throttling-Control`** proactive backpressure (as used by PubChem / NCBI).
- **Streaming** helpers for large line-oriented and binary responses.
- **Zero third-party dependencies** — built entirely on the standard library
  (`urllib.request`).
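
To make the `Retry-After` bullet concrete, here is a minimal sketch of how both header forms can be turned into a delay in seconds. It is not polite-http's own code: the function name `retry_after_seconds` and its `now` parameter are hypothetical, and it uses only the standard library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay encoded by a Retry-After value, or None if unparseable.

    Hypothetical helper for illustration; not part of polite-http's API.
    """
    value = value.strip()
    # Form 1: delay-seconds, a non-negative decimal integer such as "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # the exception type varies by Python version
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry immediately", never a negative sleep.
    return max(0.0, (when - now).total_seconds())
```

A caller would typically sleep for the returned value instead of its own computed backoff whenever the result is not `None`.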

## Installation

```bash
pip install polite-http
```

Requires Python 3.9+. Cross-process rate limiting works on Linux, macOS, and
Windows — it uses `fcntl` on POSIX and `msvcrt` on Windows for the shared file
lock (both standard library). On the rare platform that provides neither, it
falls back to a best-effort in-process timer.
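
The idea behind a shared file lock can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the helper names `lock_path_for` and `reserve_slot`, the lock file naming, and the "next free timestamp" file format are all hypothetical. Each process locks a per-host file, reads when the next request may start, reserves that slot, and writes back the slot after it.

```python
import os
import tempfile
import time

try:
    import fcntl  # POSIX only
except ImportError:  # e.g. Windows, where msvcrt.locking could be used instead
    fcntl = None


def lock_path_for(host: str) -> str:
    """Build a per-host lock file path (the file name is a made-up convention)."""
    lock_dir = os.environ.get("POLITE_HTTP_LOCK_DIR") or tempfile.gettempdir()
    return os.path.join(lock_dir, f"polite-sketch-{host}.lock")


def reserve_slot(lock_path: str, qps: float) -> float:
    """Reserve the next request slot and return how many seconds to sleep first."""
    interval = 1.0 / qps
    # Open for read/write, creating the file on first use.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if fcntl is not None:
            # Blocks until every other process has released the lock.
            fcntl.flock(fd, fcntl.LOCK_EX)
        raw = os.read(fd, 64).decode("ascii", "ignore").strip()
        now = time.time()  # wall clock, so values are comparable across processes
        try:
            next_free = float(raw)
        except ValueError:  # empty or corrupt file: no reservation yet
            next_free = 0.0
        slot = max(now, next_free)
        # Record the earliest time the following request may start.
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, repr(slot + interval).encode("ascii"))
        return slot - now
    finally:
        os.close(fd)  # closing the descriptor also releases the flock
```

A caller would run `time.sleep(reserve_slot(lock_path_for("example.org"), qps=3))` before each request. Because the lock is held only while the timestamp is updated, waiting processes queue up for slots rather than for the lock itself.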

## Quick start

```python
from polite_http import HttpClient

# Scope a client to a base URL and a steady-state rate of 3 requests/second.
client = HttpClient("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/", qps=3)

# GET + parse JSON (relative paths are resolved against the base URL).
data = client.fetch_json("esummary.fcgi?db=pubmed&id=123456")

# POST a JSON body.
result = client.fetch_json(
    "esearch.fcgi",
    method="POST",
    json_body={"db": "pubmed", "term": "cancer"},
)

# Download raw bytes, overriding the default 60-second timeout.
raw = client.fetch_bytes(
    "efetch.fcgi?db=pubmed&id=123456&rettype=abstract",
    timeout=120,
)
```

## Streaming

```python
# Stream a large response line-by-line without buffering it all in memory.
for line in client.stream_lines("large-export.tsv"):
    process(line)

# Stream binary content in chunks (e.g. to a file).
with open("paper.pdf", "wb") as f:
    for chunk in client.stream_bytes("paper.pdf"):
        f.write(chunk)
```
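
For readers curious how such helpers can avoid buffering, the sketch below shows the general technique on any binary file-like object, which is what `urllib.request.urlopen` returns. The names `iter_lines` and `iter_chunks`, the UTF-8 default, and the 64 KiB chunk size are assumptions for illustration; the package's `stream_lines` and `stream_bytes` may differ in signature and return types.

```python
import io
from typing import BinaryIO, Iterator


def iter_lines(resp: BinaryIO, encoding: str = "utf-8") -> Iterator[str]:
    """Yield decoded lines one at a time, without the trailing newline."""
    for raw in resp:  # file-like objects iterate line by line
        yield raw.decode(encoding).rstrip("\r\n")


def iter_chunks(resp: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield fixed-size chunks until the stream is exhausted."""
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        yield chunk


# io.BytesIO stands in for a network response here.
demo = io.BytesIO(b"id\tname\r\n1\taspirin\n")
print(list(iter_lines(demo)))
```

Both functions are generators, so memory use is bounded by one line or one chunk regardless of the response size.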

## Configuration

In addition to the base URL, which the examples above pass as the first
positional argument, `HttpClient` accepts the following keyword arguments:

| Argument | Default | Description |
| --- | --- | --- |
| `qps` | _required_ | Maximum queries per second (steady state). |
| `default_headers` | `None` | Headers added to every request. |
| `max_retries` | `7` | Retry attempts for transient errors (total attempts = `max_retries + 1`). |
| `timeout` | `60.0` | Per-request timeout in seconds. |
| `backoff_base` | `3.0` | Base delay for exponential backoff. |
| `backoff_max` | `180.0` | Cap on backoff delay. |
| `jitter` | `0.5` | Max uniform random jitter added to each backoff. |
| `user_agent` | `$POLITE_HTTP_USER_AGENT`, else `""` | `User-Agent` header; defaults to the `POLITE_HTTP_USER_AGENT` environment variable when not passed. |
| `retryable_status_codes` | `{429, 500, 502, 503, 504}` | Status codes that trigger a retry. |
| `referer` | `None` | Optional `Referer` header sent with every request. |
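
To get a feel for how `backoff_base`, `backoff_max`, and `jitter` interact, the sketch below computes a delay schedule from the defaults in the table. The exact formula is an assumption: it takes delays to be in seconds, doubles the base on every attempt, applies the cap, and then adds the jitter. The function name `backoff_delay` is hypothetical.

```python
import random


def backoff_delay(
    attempt: int,
    base: float = 3.0,     # backoff_base
    cap: float = 180.0,    # backoff_max
    jitter: float = 0.5,   # jitter
) -> float:
    """Delay before retry number `attempt` (0-based), assuming base * 2**attempt."""
    delay = min(cap, base * (2 ** attempt))
    # Uniform jitter in [0, jitter] spreads out clients that failed together.
    return delay + random.uniform(0.0, jitter)


# Schedule for the default max_retries=7, with jitter disabled for readability.
schedule = [backoff_delay(n, jitter=0.0) for n in range(7)]
print(schedule, sum(schedule))
```

Under these assumptions the seven default retries wait 3, 6, 12, 24, 48, 96, and 180 seconds, so a request that never succeeds spends roughly 369 seconds sleeping (plus jitter and request time) before the error is raised. Lower `max_retries` or `backoff_max` if that is too long for an interactive tool.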

### Environment variables

- `POLITE_HTTP_USER_AGENT` — default `User-Agent` when one isn't passed
  explicitly. Many APIs (e.g. NCBI) reject requests without a descriptive
  User-Agent, so setting this is recommended.
- `POLITE_HTTP_LOCK_DIR` — directory for the cross-process rate-limit lock
  files (defaults to the system temp directory).
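
Both variables can also be set from Python. The sketch below sets them before any client is created, which is the safe order because this README does not say when the library reads them. The User-Agent string and the directory name are placeholders, and creating the directory up front is a precaution rather than a documented requirement.

```python
import os
import tempfile

# A descriptive User-Agent with contact details (placeholder values).
os.environ.setdefault(
    "POLITE_HTTP_USER_AGENT", "my-tool/1.0 (mailto:you@example.org)"
)

# Keep the rate-limit lock files in a dedicated directory (hypothetical name).
lock_dir = os.path.join(tempfile.gettempdir(), "polite-http-locks")
os.makedirs(lock_dir, exist_ok=True)
os.environ["POLITE_HTTP_LOCK_DIR"] = lock_dir
```

Processes that should share one rate limit need to point at the same lock directory, so set `POLITE_HTTP_LOCK_DIR` identically for all of them.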

## Error handling

Failed requests raise `HttpError`, which carries the `status_code`, raw `body`
bytes, and `url`:

```python
from polite_http import HttpClient, HttpError

client = HttpClient("https://api.example.com/", qps=5)
try:
    data = client.fetch_json("widgets/42")
except HttpError as exc:
    print(exc.status_code, exc.url)
    if exc.body:
        print(exc.json())  # parse the error body as JSON, if applicable
```

## Acknowledgements

The HTTP client at the heart of this package is derived from the
[`science-skills`](https://github.com/google-deepmind/science-skills) project by
Google DeepMind, used under the Apache License 2.0. See [`NOTICE`](NOTICE) for
details of the changes.

## License

[Apache License 2.0](LICENSE).
