Metadata-Version: 2.4
Name: agamenox-ratelimiter
Version: 0.1.0
Summary: Multi-provider rate limiter for LLM API pipelines
Author-email: "Agamenon (Eli Godoy)" <eligodoyruiz@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Agamenox/ratelimiter
Project-URL: Issues, https://github.com/Agamenox/ratelimiter/issues
Project-URL: Changelog, https://github.com/Agamenox/ratelimiter/blob/main/CHANGELOG.md
Keywords: rate-limit,llm,openai,anthropic,openrouter,429,throttle
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pyyaml>=6.0; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

# ratelimiter

[![CI](https://github.com/Agamenox/ratelimiter/actions/workflows/ci.yml/badge.svg)](https://github.com/Agamenox/ratelimiter/actions/workflows/ci.yml)
[![CodeQL](https://github.com/Agamenox/ratelimiter/actions/workflows/codeql.yml/badge.svg)](https://github.com/Agamenox/ratelimiter/actions/workflows/codeql.yml)
[![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

Multi-provider rate limiter for LLM API pipelines. Drop-in module that keeps you out of `429 Too Many Requests` trouble across all the providers you use.

* **Per (provider, model) limits** — RPM, TPM, RPD with sliding window
* **Auto 429 backoff** — exponential with jitter, reads `Retry-After` when present
* **Thread-safe + async-safe** — same `RateLimiter` for both worlds
* **YAML config** — `plans.yaml` with glob wildcards, easy to edit
* **One runtime dependency**: `pyyaml`, nothing else
* **Empirically tested** — 300 RPM on tokenrouter/MiniMax-M3 verified via burst test
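
The sliding-window idea behind the RPM limit can be sketched in a few lines. This is a minimal illustration, not the package's actual implementation: `SlidingWindow` and `try_acquire` are hypothetical names, and the clock is injected only so the behavior is easy to check.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindow:
    """Allow at most `limit` events in any trailing `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps: Deque[float] = deque()  # timestamps of granted events

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have slid out of the trailing window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False
```

Unlike a fixed per-minute counter, a slot only frees up once the request that used it is a full window old, which is how the tokenrouter limit described in this README appears to behave.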

## Install

The PyPI package is published as **`agamenox-ratelimiter`** (the plain `ratelimiter` name has been taken on PyPI since 2013). The Python import stays `ratelimiter`: the distribution name and the import name differ, but it is the same module.

```bash
pip install agamenox-ratelimiter
# or, from source:
git clone https://github.com/Agamenox/ratelimiter.git
cd ratelimiter
pip install -e .
```

## Quick start

```python
from ratelimiter import RateLimiter, call_with_retry

limiter = RateLimiter.from_yaml("plans.yaml")

# Wrap any function with automatic 429 retry (api_call_fn is your own request function)
def call_m3(prompt: str) -> str:
    return call_with_retry(
        limiter, "tokenrouter", "MiniMax-M3",
        api_call_fn, prompt,
        max_retries=5, base_backoff=1.0, max_backoff=60.0,
    )

# Or acquire manually
limiter.acquire("tokenrouter", "MiniMax-M3")
response = api_call(...)

# Monitor usage
print(limiter.status("tokenrouter", "MiniMax-M3"))
# {'rpm_used': 5, 'rpm_limit': 300, 'tpm_used': 0, ...}
```

Async:

```python
# Run inside a coroutine; `limiter` is the instance created in the quick start
ok = await limiter.acquire_async("openrouter", "minimax/MiniMax-M2.5-highspeed")
```

## Why this exists

I was running a batch pipeline through tokenrouter's free `MiniMax-M3` model
and getting mysterious 429s. After a sustained-rate test I learned the limit
was **300 requests/minute, no rate limit headers, sliding 60s window** — and
that the error only told me anything after I hit the wall.

So I built a small limiter, measured more providers, and packaged it up. Now
my pipelines throttle themselves *before* the wall, and when they do hit it
they back off cleanly with exponential jitter.

The key insight: **different providers have different limits, different
header conventions, and different retry semantics. Hardcoding any of it is
a maintenance trap.** Hence the YAML registry — you measure once, write it
down, and the limiter does the right thing for every provider.

## Plans (registry format)

```yaml
tokenrouter:
  MiniMax-M3:
    rpm: 300
    tier: free
    notes: "Empirically 300 RPM, no headers, sliding window."

openrouter:
  "*:free":
    rpm: 20
    rpd: 200
    tier: free
```

Fields: `rpm` (required), `tpm`, `rpd`, `burst`, `tier` (`free|paid|enterprise|local`), `notes`. Wildcards (`*`, `?`) supported in the model field.
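
One way such wildcard lookups could be resolved is with the standard-library `fnmatch` module. The sketch below is an assumption about the matching rules, not the package's code: `resolve_plan` is a hypothetical helper, exact names are tried before patterns, and patterns are checked in file order. Note that `fnmatch` also treats `[...]` as a character class.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Optional

Plans = Dict[str, Dict[str, Dict[str, Any]]]

# Same shape as the YAML above, already parsed into a dict.
PLANS: Plans = {
    "tokenrouter": {"MiniMax-M3": {"rpm": 300, "tier": "free"}},
    "openrouter": {"*:free": {"rpm": 20, "rpd": 200, "tier": "free"}},
}


def resolve_plan(plans: Plans, provider: str,
                 model: str) -> Optional[Dict[str, Any]]:
    """Return the plan for (provider, model), or None if nothing matches."""
    models = plans.get(provider, {})
    if model in models:  # an exact name wins over any pattern
        return models[model]
    for pattern, plan in models.items():
        if fnmatchcase(model, pattern):  # case-sensitive glob match
            return plan
    return None


# "vendor/model:free" is a made-up model name used only for illustration.
print(resolve_plan(PLANS, "openrouter", "vendor/model:free"))
```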

## Detected limits (2026-06-14)

| Provider / Model | RPM | TPM | RPD | Source |
|---|---|---|---|---|
| `tokenrouter/MiniMax-M3` (free) | **300** | — | — | [Empirical burst test](references/tokenrouter.md) |
| `openrouter/*:free` | 20 | — | 200 | OpenRouter docs |
| `nvidia/*` (NIM) | 40 | 800K | — | Conservative default |
| `zai/glm-5-turbo` | 10 | 500K | — | User report |
| `minimax/M2.5-highspeed` | 60 | 1M | — | Conservative |
| `opencode-go/*` | 60 | 500K | — | Conservative |
| `lmstudio/*` | ∞ | — | — | Local |

**Naming note:** PyPI package is `agamenox-ratelimiter`; Python import is `ratelimiter` (the module directory name). Use `pip install agamenox-ratelimiter` to install; `from ratelimiter import ...` to use.

If you've measured a different limit, [open a rate-limit data issue](.github/ISSUE_TEMPLATE/rate_limit_data.md) so the registry stays honest.
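
To get a feel for what these numbers mean for a batch job, here is a rough lower bound on wall-clock time. It assumes both RPM and RPD behave as sliding windows and ignores request latency and token limits. `min_batch_seconds` is a hypothetical helper, not part of the package.

```python
import math
from typing import Optional


def min_batch_seconds(n_requests: int, rpm: int,
                      rpd: Optional[int] = None) -> float:
    """Lower bound on seconds needed before the last request can be sent."""
    if n_requests <= 0:
        return 0.0
    # The first `rpm` requests go out immediately; each further group of
    # `rpm` requests has to wait for another full 60 s window.
    waits = [(math.ceil(n_requests / rpm) - 1) * 60.0]
    if rpd is not None:
        # Same reasoning with a 24 h window for the daily cap.
        waits.append((math.ceil(n_requests / rpd) - 1) * 86400.0)
    return max(waits)


print(min_batch_seconds(1000, rpm=300))          # 180.0
print(min_batch_seconds(1000, rpm=20, rpd=200))  # 345600.0
```

For 1,000 requests, the 300 RPM tokenrouter limit costs about three minutes of waiting. On the `openrouter/*:free` row the 200 RPD cap is what matters, pushing the same batch to four days.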

## API

```python
lim = RateLimiter.from_yaml("plans.yaml")      # or from_dict({...})

# Sync
ok = lim.acquire(provider, model, estimated_tokens=0, timeout=300)
lim.release(provider, model, estimated_tokens=0)        # refund a slot
status = lim.status(provider, model)                    # snapshot dict

# Async
ok = await lim.acquire_async(provider, model, estimated_tokens=0)

# Auto-429 wrapper
result = call_with_retry(lim, provider, model, fn, *args,
                         max_retries=5, base_backoff=1.0, max_backoff=60.0,
                         estimated_tokens_fn=None)
```
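
The `base_backoff` and `max_backoff` parameters suggest a capped exponential schedule. A minimal sketch of how one retry delay might be computed follows. It assumes "full jitter" (a uniform draw below the exponential ceiling) and a `Retry-After` value given in seconds; `backoff_delay` is a hypothetical function, and the package's exact formula may differ.

```python
import random
from typing import Callable, Optional


def backoff_delay(attempt: int, base_backoff: float = 1.0,
                  max_backoff: float = 60.0,
                  retry_after: Optional[str] = None,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Honor the server hint when it is plain seconds, clamped to
            # [0, max_backoff]. The HTTP-date form is not handled here.
            return min(max(float(retry_after), 0.0), max_backoff)
        except ValueError:
            pass  # unparseable hint: fall back to the computed delay
    # Exponential ceiling: base, 2*base, 4*base, ... capped at max_backoff.
    ceiling = min(max_backoff, base_backoff * (2 ** attempt))
    # Full jitter: spread retries uniformly over [0, ceiling).
    return ceiling * rng()
```

Jitter matters when many workers hit the wall at once: without it they would all retry at the same instant and collide again.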

Detects 429s from `urllib`, `requests`, `httpx`, and any object with
`.status_code == 429`. Reads `Retry-After` and `X-RateLimit-*` headers when
present.
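
Detection across these clients can be done with duck typing, since each library exposes the status code slightly differently. The sketch below shows one plausible approach; `is_rate_limited` is a hypothetical helper and not necessarily how the package does it.

```python
import urllib.error
from typing import Any


def is_rate_limited(obj: Any) -> bool:
    """True if `obj` looks like an HTTP 429 response or error."""
    # urllib.error.HTTPError exposes the status as `.code`.
    if isinstance(obj, urllib.error.HTTPError):
        return obj.code == 429
    # Response-like objects (requests.Response, httpx.Response)
    # expose it as `.status_code`.
    if getattr(obj, "status_code", None) == 429:
        return True
    # requests.HTTPError and httpx.HTTPStatusError carry the response
    # object on `.response` (which may be None).
    response = getattr(obj, "response", None)
    return getattr(response, "status_code", None) == 429
```

Because nothing from `requests` or `httpx` is imported, a check like this works whether or not those libraries are installed.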

## Documentation

- [API reference](references/api.md) — full method signatures, algorithm notes
- [Tokenrouter specifics](references/tokenrouter.md) — empirical test methodology
- [Contributing](CONTRIBUTING.md)
- [Changelog](CHANGELOG.md)

## Tests

```bash
python tests/test_limiter.py        # 20 unit tests, < 0.5s, no network
python examples/integration_test.py # 5 real API calls to tokenrouter
```

The unit tests use a `FakeClock`, so the time-dependent ones run in
milliseconds. The integration test requires `pyyaml` and a tokenrouter API key
(loaded from the config in `examples/integration_test.py`).
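
For readers unfamiliar with the pattern, a fake clock replaces real time with a counter the test advances by hand. The sketch below is illustrative only: this `FakeClock` and the toy `wait_until` function are assumptions, and the class in `tests/test_limiter.py` may look different.

```python
class FakeClock:
    """Manually advanced clock that stands in for real time in tests."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        # Advance instantly instead of blocking the test.
        self.now += max(seconds, 0.0)


def wait_until(deadline: float, clock: FakeClock) -> None:
    """Toy time-dependent code under test: sleep until `deadline`."""
    remaining = deadline - clock.time()
    if remaining > 0:
        clock.sleep(remaining)


clock = FakeClock()
wait_until(60.0, clock)  # "waits" a full minute without real delay
print(clock.time())      # 60.0
```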

## CI

GitHub Actions runs on every push and PR:

- **Unit tests** on Python 3.9–3.13, Ubuntu + Windows + macOS
- **Lint + syntax check** on every `.py` file
- **YAML round-trip** — verify `plans.yaml` is loadable
- **CodeQL** — security analysis, weekly schedule
- **Publish to PyPI** — auto-triggered on GitHub release (trusted publishing)

## Roadmap

- [ ] Per-key / per-project quotas (multi-tenant)
- [ ] Prometheus metrics export
- [ ] Redis backend for distributed pipelines
- [ ] Async context manager: `async with limiter.guard(...) as ok:`

## License

MIT — see [LICENSE](LICENSE).
