Metadata-Version: 2.4
Name: argus-dpy
Version: 0.4.2
Summary: Operational Prometheus/OpenTelemetry metrics for discord.py bots, in one line.
License-Expression: AGPL-3.0-or-later
License-File: LICENSE
Keywords: discord,discord.py,metrics,observability,opentelemetry,prometheus
Classifier: Development Status :: 3 - Alpha
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: System :: Monitoring
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: aiohttp<4,>=3.11
Requires-Dist: discord-py<3,>=2.4
Requires-Dist: prometheus-client>=0.20
Provides-Extra: clickhouse
Requires-Dist: clickhouse-connect[async]; extra == 'clickhouse'
Provides-Extra: dotenv
Requires-Dist: python-dotenv>=1.0; extra == 'dotenv'
Provides-Extra: fleet
Requires-Dist: python-dotenv>=1.0; extra == 'fleet'
Provides-Extra: otlp
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc; extra == 'otlp'
Requires-Dist: opentelemetry-sdk; extra == 'otlp'
Description-Content-Type: text/markdown

# argus-dpy

[![CI](https://github.com/AstorisTheBrave/argus/actions/workflows/ci.yml/badge.svg)](https://github.com/AstorisTheBrave/argus/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/argus-dpy)](https://pypi.org/project/argus-dpy/)
[![Python](https://img.shields.io/pypi/pyversions/argus-dpy)](https://pypi.org/project/argus-dpy/)
[![License: AGPL-3.0-or-later](https://img.shields.io/badge/license-AGPL--3.0--or--later-blue)](LICENSE)

**Operational Prometheus / OpenTelemetry metrics for [discord.py](https://github.com/Rapptz/discord.py) bots, in one line.**

```python
import discord
from discord.ext import commands
from argus import Argus

bot = commands.AutoShardedBot(command_prefix="!", intents=discord.Intents.default())
Argus(bot)          # the whole integration
```

`Argus(bot)` instruments shard latency, interaction/command throughput and
outcomes, precise command duration, gateway throughput, rate-limit pressure and
cache sizes, then serves a Prometheus `/metrics` endpoint **and a live web
dashboard** on the bot's own event loop. It can also push to OpenTelemetry and
drain per-guild events to ClickHouse. It never puts a guild, user, or channel id
on a Prometheus label.

## Install

```bash
pip install argus-dpy
```

Python 3.10+, `discord.py >= 2.4`. Optional extras: `argus-dpy[otlp]`
(OpenTelemetry push), `argus-dpy[clickhouse]` (per-guild analytics),
`argus-dpy[fleet]` (`.env` autoload for the control plane). A reference container is published at
`ghcr.io/astoristhebrave/argus`, and the [Fleet control plane](#fleet-control-plane-opt-in)
at `ghcr.io/astoristhebrave/argus-fleet`.

**Compatibility.** Argus targets upstream **discord.py 2.x** and uses its
asynchronous cog lifecycle (`await bot.add_cog`, async `cog_load`/`cog_unload`)
and `setup_hook` chaining. Forks that vendor the `discord` namespace and follow
the same async-cog semantics may work but are untested; Pycord differs (a
synchronous `add_cog` and a non-coroutine `cog_unload`) and is not supported
unmodified. Because every fork ships the same `discord` import name, only one can
be installed at a time, and `pip install argus-dpy` pulls upstream discord.py.
See [Compatibility](https://github.com/AstorisTheBrave/argus/wiki/Compatibility).

**New here?** Follow a tutorial end to end:
[Single bot](https://github.com/AstorisTheBrave/argus/wiki/Tutorial-Single-Bot)
or [Fleet at scale](https://github.com/AstorisTheBrave/argus/wiki/Tutorial-Fleet).

## Behaviour

`Argus(bot)` registers listeners synchronously, then starts an aiohttp server on
the bot's loop once it is running. By default it serves the **dashboard at `/`**
and **metrics at `/metrics`** on port `9191`. Disable the dashboard with
`Argus(bot, dashboard=False)`; everything else is opt-in. Instrumentation is
fail-open everywhere: errors in event hooks and scrape-time gauges, and even a
failure of the metrics server to bind, are counted and swallowed, never raised
into your bot. The
`argus_subsystem_up` gauge reports Argus' own health so you can alert when it
degrades while the bot stays up.
See [Architecture & invariants](https://github.com/AstorisTheBrave/argus/wiki/Architecture-and-Invariants).
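
The fail-open behaviour described above can be pictured with a small wrapper. This is a minimal sketch, not Argus' actual implementation: the `fail_open` decorator, the `swallowed` counter, and the logger name are hypothetical names chosen for illustration.

```python
import functools
import logging
from collections import Counter

log = logging.getLogger("fail_open_sketch")  # hypothetical logger name
swallowed = Counter()  # hypothetical stand-in for a "swallowed errors" counter


def fail_open(hook_name):
    """Wrap a hook so any exception is counted and logged, never raised."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Count the failure so it stays visible, then carry on.
                swallowed[hook_name] += 1
                log.debug("hook %s failed", hook_name, exc_info=True)
                return None
        return wrapper
    return decorate


@fail_open("on_command")
def record_command(name):
    # Illustrative hook body that fails on bad input.
    if not name:
        raise ValueError("empty command name")
    return f"recorded {name}"


ok_result = record_command("ping")   # normal path
bad_result = record_command("")      # raises inside, swallowed by the wrapper
```

The caller never sees the `ValueError`; the only trace is the incremented counter, which is the kind of signal a self-health metric could expose.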

## How Argus works

discord.py events flow through O(1), fail-open hooks into one backend-neutral
metric registry. Adapters and the HTTP server read from that registry; the core
never imports an adapter, so backends attach and detach without touching
collection. Gauges are read live at scrape time (no background poller). An
optional, separate analytical path drains per-guild events to ClickHouse; that
data never becomes a Prometheus label.

```mermaid
flowchart TD
    bot[discord.py bot] -->|events, state| hooks[core hooks and instrumentation]
    hooks -->|inc / observe / set_info| reg[(MetricRegistry, backend-neutral)]
    reg --> prom[Prometheus adapter]
    reg --> otlp[OTLP adapter, optional]
    hooks -.->|per-guild events| sink[history sink, optional]
    sink --> ch[(ClickHouse)]
    prom --> exp[aiohttp server]
    exp --> m[GET /metrics]
    exp --> dash[dashboard SPA and /api]
    prom -.->|snapshot| member[fleet client, optional]
    member -.->|register, heartbeat| fleet[Fleet control plane]
```
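
To make the registry idea concrete, here is a minimal sketch of a backend-neutral registry with O(1) counters and gauges that are computed only when something reads them. `SketchRegistry` and `render_text` are hypothetical names; the real `MetricRegistry` API may look quite different.

```python
from typing import Callable, Dict


class SketchRegistry:
    """Hypothetical backend-neutral registry: counters plus callback gauges."""

    def __init__(self) -> None:
        self._counters: Dict[str, float] = {}
        self._gauges: Dict[str, Callable[[], float]] = {}

    def inc(self, name: str, amount: float = 1.0) -> None:
        # O(1) dictionary update on the event path.
        self._counters[name] = self._counters.get(name, 0.0) + amount

    def gauge(self, name: str, read: Callable[[], float]) -> None:
        # Store a callback; the value is computed only at scrape time.
        self._gauges[name] = read

    def snapshot(self) -> Dict[str, float]:
        values = dict(self._counters)
        for name, read in self._gauges.items():
            try:
                values[name] = float(read())
            except Exception:
                # Fail open: a broken gauge is skipped, not raised.
                continue
        return values


def render_text(registry: SketchRegistry) -> str:
    """One possible adapter: render a snapshot as 'name value' lines."""
    return "\n".join(f"{k} {v}" for k, v in sorted(registry.snapshot().items()))


guilds = {1, 2, 3}
reg = SketchRegistry()
reg.inc("commands_total")
reg.gauge("guilds", lambda: len(guilds))
guilds.add(4)            # state changes after registration...
text = render_text(reg)  # ...and the gauge reflects it when read
```

The adapter function only reads from the registry, so a second adapter could be added or removed without touching the code that calls `inc`.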

A bot opts into more by adding kwargs or `ARGUS_*` env vars; with none, only the
metrics endpoint and dashboard run. For many processes across regions, the
opt-in [Fleet control plane](#fleet-control-plane-opt-in) aggregates them into
one view.

## Minimal setup

The minimum is one line; everything else is opt-in via kwargs or `ARGUS_*`
environment variables (kwargs override env vars, which override defaults).

```python
Argus(bot)   # metrics at /metrics, dashboard at /, on port 9191
```
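
The precedence rule can be sketched for the port option. This is an illustration under stated assumptions, not Argus' own resolver: `resolve_port` is a hypothetical helper, and the relative order of the panel-injected `SERVER_PORT` and `PORT` fallbacks (listed in the options table) is a guess.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 9191  # default port stated in this README


def resolve_port(kwarg: Optional[int] = None,
                 env: Optional[Mapping[str, str]] = None) -> int:
    """Resolve the port: kwarg first, then env vars, then the default."""
    env = os.environ if env is None else env
    if kwarg is not None:
        return kwarg
    # ARGUS_PORT first, then the panel-injected fallbacks. The order of
    # SERVER_PORT versus PORT is an assumption of this sketch.
    for key in ("ARGUS_PORT", "SERVER_PORT", "PORT"):
        raw = env.get(key)
        if raw:
            try:
                return int(raw)
            except ValueError:
                continue  # this sketch ignores unparseable values
    return DEFAULT_PORT
```

Passing an explicit mapping for `env` keeps the function easy to test without touching the real process environment.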

To protect the dashboard, set **one env var** on the host that runs the bot —
Argus picks it up automatically. The dashboard is served *by* Argus in the same
process, so there is nothing separate to host or wire up:

```bash
ARGUS_DASHBOARD_AUTH_TOKEN=your-secret   # gates / and /api/*; /metrics stays scrapeable
```

Open the dashboard once with the token and it is remembered in the browser:
`http://your-host:9191/?token=your-secret`.

### Common options

| kwarg / env | default | meaning |
|---|---|---|
| `port` / `ARGUS_PORT` | `9191` | server port (falls back to `SERVER_PORT`/`PORT` injected by Pterodactyl/PebbleHost/Railway) |
| `dashboard_auth_token` / `ARGUS_DASHBOARD_AUTH_TOKEN` | — | gate the dashboard + APIs |
| `metrics_auth_token` / `ARGUS_METRICS_AUTH_TOKEN` | — | require a bearer token to scrape `/metrics` (shared-host public binds) |
| `grafana_url` / `ARGUS_GRAFANA_URL` | — | link/embed your Grafana boards |
| `cluster_id` / `ARGUS_CLUSTER_ID` | `default` | label for clustered deploys |
| `enable_per_guild` / `ARGUS_ENABLE_PER_GUILD` | `false` | per-guild analytics path |
| `otlp_endpoint` / `ARGUS_OTLP_ENDPOINT` | — | also push metrics via OTLP |
| `log_format` / `ARGUS_LOG_FORMAT` | `text` | set `json` for structured logs on the `argus` logger |

Every option, precedence and parsing rule is in
[Configuration](https://github.com/AstorisTheBrave/argus/wiki/Configuration).
New here? Start with the [FAQ](https://github.com/AstorisTheBrave/argus/wiki/FAQ).
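
When `metrics_auth_token` is set, a scraper has to present the token. A minimal standard-library sketch of such a client follows; it assumes the token travels in a standard `Authorization: Bearer ...` header, and `build_scrape_request` and `scrape` are hypothetical helper names.

```python
import urllib.request
from typing import Optional


def build_scrape_request(base_url: str,
                         token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for /metrics, adding a bearer token when one is set."""
    request = urllib.request.Request(base_url.rstrip("/") + "/metrics")
    if token:
        # Assumption: the server expects a standard bearer Authorization header.
        request.add_header("Authorization", f"Bearer {token}")
    return request


def scrape(base_url: str, token: Optional[str] = None,
           timeout: float = 5.0) -> str:
    """Fetch the exposition text; raises urllib.error.URLError on failure."""
    request = build_scrape_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


example = build_scrape_request("http://localhost:9191/", "your-secret")
```

Calling `scrape("http://localhost:9191", "your-secret")` against a running bot would return the raw metrics text.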

## Metrics

Aggregate, bounded-cardinality metrics: per-shard latency and up state,
per-cluster guild/user/voice/emoji/sticker/channel counts, uptime, registered
commands, interaction and command rates with success/error split, precise
app- and prefix-command duration histograms, gateway throughput, shard
dis/reconnects, and log and rate-limit counters. Every counter and histogram carries a
`cluster` label. Argus also reports its own health: `argus_up`,
`argus_subsystem_up{subsystem}` (server/fleet/sink), and counters for swallowed
instrumentation errors and dropped analytical events.

Full list with labels: [Metrics Reference](https://github.com/AstorisTheBrave/argus/wiki/Metrics-Reference).
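
As an illustration of consuming the self-health gauge, the sketch below parses Prometheus exposition text and reports which subsystems are up. The sample text is hand-written for the example, not captured output, and treating a value of `1` as healthy is an assumption.

```python
import re
from typing import Dict

# Matches samples such as: argus_subsystem_up{subsystem="server"} 1
_SAMPLE = re.compile(r'^argus_subsystem_up\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def subsystem_health(exposition: str) -> Dict[str, bool]:
    """Map each subsystem label to True (up) or False (degraded)."""
    health: Dict[str, bool] = {}
    for line in exposition.splitlines():
        match = _SAMPLE.match(line.strip())
        if not match:
            continue  # comments, blank lines, and other metrics
        labels = dict(_LABEL.findall(match.group("labels")))
        name = labels.get("subsystem")
        if name is not None:
            # Assumption: 1 means healthy, anything else means degraded.
            health[name] = float(match.group("value")) == 1.0
    return health


# Hand-written sample, for illustration only.
SAMPLE_TEXT = """\
# TYPE argus_subsystem_up gauge
argus_subsystem_up{subsystem="server"} 1
argus_subsystem_up{subsystem="fleet"} 0
argus_subsystem_up{subsystem="sink"} 1
"""

health = subsystem_health(SAMPLE_TEXT)
```

In production the same question is usually answered with an alerting rule rather than a script; the sketch only shows the shape of the data.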

## Dashboard

A React SPA bundled into the wheel, served at `/`: overview, interactions,
gateway, your Grafana boards, and per-guild analytics. Reads metrics live over
SSE with a polling fallback. Set `dashboard_auth_token` for anything public.
See [Dashboard](https://github.com/AstorisTheBrave/argus/wiki/Dashboard).

## Per-guild analytics

Per-guild and per-user questions never go to Prometheus, because ids like these
have unbounded cardinality. With
`enable_per_guild` + `clickhouse_dsn` (the `argus-dpy[clickhouse]` extra), Argus
drains per-guild events to ClickHouse (batched, non-blocking) and the dashboard's
Analytics section serves per-guild command counts and average durations.
Step-by-step: [Per-guild analytics tutorial](https://github.com/AstorisTheBrave/argus/wiki/Tutorial-Analytics);
internals: [History & ClickHouse](https://github.com/AstorisTheBrave/argus/wiki/History-and-ClickHouse).
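
The "batched, non-blocking" drain can be pictured as a bounded queue that never blocks the producer and a consumer that flushes in batches. This is a sketch of the general technique, not the actual sink: `BatchingSink`, its parameters, and `fake_flush` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List


class BatchingSink:
    """Hypothetical sink: bounded queue in, batched flushes out."""

    def __init__(self, flush: Callable[[List[dict]], Awaitable[None]],
                 max_queue: int = 1000, batch_size: int = 100) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)
        self._flush = flush
        self._batch_size = batch_size
        self.dropped = 0  # events discarded instead of blocking the bot

    def emit(self, event: dict) -> None:
        # Never blocks the caller: a full queue drops the event and counts it.
        try:
            self._queue.put_nowait(event)
        except asyncio.QueueFull:
            self.dropped += 1

    async def drain_once(self) -> int:
        # Collect up to batch_size queued events and flush them together.
        batch: List[dict] = []
        while len(batch) < self._batch_size and not self._queue.empty():
            batch.append(self._queue.get_nowait())
        if batch:
            try:
                await self._flush(batch)
            except Exception:
                self.dropped += len(batch)  # fail open: count, do not raise
                return 0
        return len(batch)


async def demo() -> tuple:
    written: List[List[dict]] = []

    async def fake_flush(batch: List[dict]) -> None:
        written.append(batch)  # a real sink would insert into ClickHouse here

    sink = BatchingSink(fake_flush, max_queue=3, batch_size=2)
    for i in range(5):
        sink.emit({"guild_id": i, "command": "ping"})
    first = await sink.drain_once()
    second = await sink.drain_once()
    return first, second, sink.dropped, len(written)


result = asyncio.run(demo())
```

With a queue of three and five emitted events, two events are dropped and counted rather than stalling the event loop, which mirrors the "dropped analytical events" counter mentioned under Metrics.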

## Grafana, OTLP, clustering

`docker compose up -d` brings up a provisioned Prometheus + Grafana with four
dashboards (overview, interactions, gateway, and an Argus self-health board) plus
recording and alerting rules you can tune. Set `otlp_endpoint` (the
`argus-dpy[otlp]` extra) to also push via
OpenTelemetry to Datadog, Grafana Cloud, Honeycomb, and the like. Run one Argus
per process with a distinct `cluster_id` for clustered bots.
See the [OTLP tutorial](https://github.com/AstorisTheBrave/argus/wiki/Tutorial-OTLP),
[Clustering](https://github.com/AstorisTheBrave/argus/wiki/Clustering), and
[OTLP internals](https://github.com/AstorisTheBrave/argus/wiki/OTLP).
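
For clustered bots, each process needs its own shard range and a distinct `cluster_id`. The helper below is one way to split shards evenly. It is a sketch with hypothetical names (`shard_ids_for`, `cluster_id_for`); the naming scheme for `cluster_id` is an assumption.

```python
from typing import List


def shard_ids_for(cluster_index: int, cluster_count: int,
                  shard_count: int) -> List[int]:
    """Return the contiguous shard ids owned by one cluster (0-based index)."""
    if not 0 <= cluster_index < cluster_count:
        raise ValueError("cluster_index out of range")
    base, extra = divmod(shard_count, cluster_count)
    # The first `extra` clusters take one additional shard each.
    start = cluster_index * base + min(cluster_index, extra)
    size = base + (1 if cluster_index < extra else 0)
    return list(range(start, start + size))


def cluster_id_for(cluster_index: int) -> str:
    """Hypothetical naming scheme for the `cluster_id` option."""
    return f"cluster-{cluster_index}"


# Ten shards spread over three processes.
layout = {cluster_id_for(i): shard_ids_for(i, 3, 10) for i in range(3)}
```

Each process would then pass its list as `shard_ids` (with the total as `shard_count`) to `commands.AutoShardedBot` and its name as `Argus(bot, cluster_id=...)`.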

**No inbound port? Push instead.** OTLP, a Prometheus **Pushgateway**
(`pushgateway_url`), and the Fleet client are all outbound-only, so they work
where you can't expose `/metrics` at all, for example on Docker bot panels
(Pterodactyl, PebbleHost, Railway). See [hosting](examples/hosting/) /
[Hosting on bot panels](https://github.com/AstorisTheBrave/argus/wiki/Hosting).

## Fleet control plane (opt-in)

Running many bot processes across regions? The **Argus Fleet** control plane is a
separate, opt-in service that aggregates them into one readable, multi-tier view:
**Global** (everything) -> **Fleet** (a region, e.g. `asia`) -> **Cluster** (one
process) -> **Shard** (per-shard up/latency). It renders plain, colour-graded
panels with no PromQL or Grafana setup,
and reads from two interchangeable sources: a self-contained **push** path (zero
infra; members heartbeat to it) and an existing **Prometheus**.

Bots are unchanged unless they opt in. The fastest path is the setup wizard,
which mints a token and writes a ready `.env` + `docker-compose.fleet.yml` and
prints the exact member snippet:

```bash
python -m argus.fleet init        # scaffold; then: docker compose -f docker-compose.fleet.yml up -d
python -m argus.fleet doctor --url http://fleet-host:9190 --token secret   # diagnose
```

Or wire it by hand:

```bash
# the control plane (its own process / container)
ARGUS_FLEET_TOKEN=secret python -m argus.fleet          # serves :9190

# each bot opts in with a few env vars (or kwargs)
ARGUS_FLEET_URL=http://fleet-host:9190 \
ARGUS_FLEET_TOKEN=secret ARGUS_FLEET_GROUP=asia \
    python bot.py
```

Point it at the shared ClickHouse (`ARGUS_FLEET_CLICKHOUSE_DSN`) and the same pane
gains a per-guild **Analytics** view (fleet-wide, or sliced to one bot) — so one
dashboard covers operational rollups *and* analytics.

**Secure by default:** a non-loopback bind with no token refuses to start; set a
token (or `ARGUS_FLEET_TOKEN_FILE`). It assigns each process a stable per-region
number (never reused; a dead cluster keeps its slot, shown **down**), persists
topology across restarts, caps request bodies, strips its version banner, and
exposes its own `/metrics` and `/readyz`. The member side is fail-open: a fleet
outage never touches your bot loop. Full guide and deployment:
[Fleet](https://github.com/AstorisTheBrave/argus/wiki/Fleet) and the
[Fleet tutorial](https://github.com/AstorisTheBrave/argus/wiki/Tutorial-Fleet).
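
The "stable per-region number, never reused" rule can be sketched as a small allocator. This illustrates the idea only; `SlotAllocator` is hypothetical, numbering from 1 is an assumption, and the real control plane also persists this state across restarts.

```python
from typing import Dict, Tuple


class SlotAllocator:
    """Hypothetical allocator: stable per-region numbers that are never reused."""

    def __init__(self) -> None:
        self._slots: Dict[Tuple[str, str], int] = {}   # (region, member) -> slot
        self._next: Dict[str, int] = {}                # region -> next free slot
        self._alive: Dict[Tuple[str, str], bool] = {}  # (region, member) -> up?

    def register(self, region: str, member: str) -> int:
        key = (region, member)
        if key not in self._slots:
            slot = self._next.get(region, 1)  # assumption: numbering starts at 1
            self._slots[key] = slot
            self._next[region] = slot + 1
        self._alive[key] = True
        return self._slots[key]

    def mark_down(self, region: str, member: str) -> None:
        # The slot stays assigned; only the status changes.
        if (region, member) in self._slots:
            self._alive[(region, member)] = False

    def status(self, region: str) -> Dict[int, str]:
        return {
            slot: "up" if self._alive[key] else "down"
            for key, slot in self._slots.items() if key[0] == region
        }


alloc = SlotAllocator()
slot_a = alloc.register("asia", "bot-a")
slot_b = alloc.register("asia", "bot-b")
alloc.mark_down("asia", "bot-a")
slot_c = alloc.register("asia", "bot-c")  # does not take over bot-a's slot
snapshot = alloc.status("asia")
```

Because a dead member keeps its number, a panel keyed by slot shows the gap as **down** instead of silently renumbering the survivors.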

## Why no per-guild Prometheus labels?

`guild_id`/`user_id`/`channel_id` are unbounded; as labels they multiply the
number of Prometheus time series at scale and are useless to visualise. Argus forbids them by
construction and routes per-entity questions to the analytical path instead.
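
A quick back-of-the-envelope calculation shows why. The worst-case number of time series for one metric is the product of its label cardinalities. The label names and numbers below are illustrative assumptions, not measurements from a real bot.

```python
from math import prod
from typing import Mapping


def series_count(label_cardinalities: Mapping[str, int]) -> int:
    """Upper bound on time series for one metric: product of label cardinalities."""
    return prod(label_cardinalities.values())


# Illustrative numbers only.
bounded = series_count({"cluster": 4, "command": 50, "status": 2})
with_guild_label = series_count(
    {"cluster": 4, "command": 50, "status": 2, "guild_id": 100_000}
)
```

Here the bounded label set tops out at 400 series, while adding a `guild_id` label raises the ceiling to 40,000,000 for the same metric.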

## Security

Set `dashboard_auth_token` for any non-localhost bot; the fleet control plane
refuses to start on a public bind without a token and is hardened by default
(rate limits, body caps, security headers, non-root images, SBOM/provenance). The
same security headers, body cap, and banner strip apply to the in-process bot
server too. The no-PII-label guarantee means per-entity data never reaches
Prometheus. CI runs CodeQL and a pip-audit dependency audit, and each release
ships a wheel SBOM. Full guidance:
[Security](https://github.com/AstorisTheBrave/argus/wiki/Security) and the
[threat model](THREAT_MODEL.md). Report vulnerabilities privately via
[SECURITY.md](SECURITY.md).
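
The "refuses to start on a public bind without a token" rule amounts to a startup check along these lines. This is a minimal sketch with hypothetical function names; the real check may resolve hostnames or treat more cases.

```python
import ipaddress
from typing import Optional


def is_loopback(host: str) -> bool:
    """True for loopback addresses and the literal name 'localhost'."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # unknown hostnames are treated as public in this sketch


def check_bind(host: str, token: Optional[str]) -> None:
    """Raise if a non-loopback bind is requested without a token."""
    if not token and not is_loopback(host):
        raise RuntimeError(f"refusing to bind {host} without an auth token")


def bind_allowed(host: str, token: Optional[str]) -> bool:
    """Convenience wrapper that reports the decision instead of raising."""
    try:
        check_bind(host, token)
    except RuntimeError:
        return False
    return True
```

Treating unparseable hostnames as public errs on the side of requiring a token, which matches the secure-by-default intent.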

## Examples

Runnable examples in [`examples/`](examples/) (see [`examples/README.md`](examples/README.md)
for the index and a list of production dos and don'ts):

- [`basic_bot.py`](examples/basic_bot.py) — one bot, one line.
- [`production_bot.py`](examples/production_bot.py) — hardened single bot (intents, secrets, auth, logging).
- [`clustered_bot.py`](examples/clustered_bot.py) — one process per shard range.
- [`otlp_bot.py`](examples/otlp_bot.py) — export to an OpenTelemetry collector.
- [`analytics_bot.py`](examples/analytics_bot.py) — per-guild ClickHouse analytics.
- [`fleet_member_bot.py`](examples/fleet_member_bot.py) — opting into a fleet.
- [`config_kwargs.py`](examples/config_kwargs.py) — every option, as kwargs.
- [`k8s/`](examples/k8s/) — Kubernetes manifests for a bot and the control plane.
- [`hosting/`](examples/hosting/) — Docker bot panels (Pterodactyl, PebbleHost, Ori, Railway): egg, start shim, decision tree.

Using a coding agent to get started? Point it at [`llms.txt`](llms.txt) — a
machine-readable map (including how to clone the wiki for the in-depth guides).

## Contributing & license

Contributions are accepted under the DCO; see [CONTRIBUTING.md](CONTRIBUTING.md).
Licensed under **AGPL-3.0-or-later** (network use counts as distribution) — see
[LICENSE](LICENSE). Release notes: [CHANGELOG.md](CHANGELOG.md) /
[Releases](https://github.com/AstorisTheBrave/argus/releases).

---

**See the [full wiki](https://github.com/AstorisTheBrave/argus/wiki) for the in-depth guides and explanations.**
