Metadata-Version: 2.4
Name: homelab-guardian
Version: 0.3.2
Summary: Local-first homelab health monitoring with optional AI-agent integration and approval-gated repair
Author: spezzuti
License: AGPL-3.0-or-later
Project-URL: Homepage, https://github.com/spezzuti/homelab-guardian
Project-URL: Repository, https://github.com/spezzuti/homelab-guardian
Project-URL: Changelog, https://github.com/spezzuti/homelab-guardian/blob/main/CHANGELOG.md
Keywords: homelab,monitoring,health-check,self-hosted,mcp,docker,systemd
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyYAML>=6.0.1
Requires-Dist: requests>=2.31.0
Requires-Dist: docker>=7.0.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Provides-Extra: mcp
Requires-Dist: mcp>=1.2.0; extra == "mcp"
Dynamic: license-file

# Homelab Guardian

Homelab Guardian is a local-first homelab operations assistant built around
read-only infrastructure collectors and local reports.

It is not another dashboard. It generates plain-English health reports that explain:

- what is broken
- what changed
- what matters
- what the safest next step is

Guardian v0.2 is the **Daily Homelab Doctor plus dashboard and alerts**: a
packaged CLI that collects optional read-only signals, stores local snapshots,
writes Markdown reports, serves a read-only web view, and can send optional
flap-damped notifications.

## Core principles

- Local-first
- Read-only against homelab infrastructure by default
- Any action is opt-in, whitelisted, reversible-minded, and **human-approved** —
  Guardian proposes; a person approves; Guardian verifies. (See *Approval-gated
  repair*.)
- **Never raw shell, never an AI-generated command** — actions are named,
  parameterized argv only; the model is never the authority
- No cloud dependency required
- Useful without AI
- Secrets stay local
- Every integration is optional
- Collectors degrade gracefully when unavailable or unconfigured

Guardian does write its own local runtime state: Markdown reports, SQLite scan
snapshots, acknowledgments, alert state, and retention cleanup. Optional
integrations may send outbound requests for Telegram notifications or AI
briefings when explicitly enabled.

## Quick start

```bash
git clone <repo-url>
cd homelab-guardian
python -m venv .venv
. .venv/bin/activate
pip install -e .

guardian init      # answer a few questions; optionally scans your LAN
guardian doctor    # preflight check
guardian           # first scan -> reports/latest.md
```

`guardian init` can probe your local network (read-only TCP connects, nothing
is sent to any device) and recognizes common homelab services — Home
Assistant, Proxmox, Pi-hole/AdGuard, Portainer, Plex, Jellyfin, Synology,
QNAP, Uptime Kuma, and more — then writes a working `config.yaml` for you.
Smart-speaker false positives (Google Cast devices) are fingerprinted and
filtered out automatically.

## Manual first-run steps

```bash
cp config.example.yaml config.yaml
mkdir -p data reports
```

Edit `config.yaml` locally. Do not commit it.

For Docker inventory, enable the Docker collector in `config.yaml` only on a Docker host or when using the socket proxy overlay:

```yaml
collectors:
  docker:
    enabled: true
    socket_url: unix://var/run/docker.sock
    exclude_containers:
      - "homelab-guardian*"
```

## Direct Python run

Use this mode for development or for hosts where Python already has access to the paths and services you want to inspect.

```bash
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python -m homelab_guardian.main --config config.yaml
```

Safe example run without private services:

```bash
python -m homelab_guardian.main --config config.example.yaml
```

Preflight check:

```bash
python -m homelab_guardian.main doctor --config config.yaml
```

Equivalent form:

```bash
python -m homelab_guardian.main --config config.yaml --doctor
```

## Prebuilt Docker image

Multi-arch images (amd64, arm64 — Raspberry Pi friendly) are published to
GitHub Container Registry on every main-branch push:

```bash
docker pull ghcr.io/spezzuti/homelab-guardian:latest
```

Notes for containerized runs:

- The systemd collector needs the host's systemd; run Guardian directly on
  the host if you want service monitoring.
- The Bitwarden secrets provider needs the `bws` CLI, which is not in the
  image; in containers, inject secrets as environment variables instead
  (env_file or `bws run -- docker compose ...`).

## Docker Compose run

Preferred install path on a Docker host:

```bash
cp config.example.yaml config.yaml
mkdir -p data reports
docker compose run --rm homelab-guardian
```

The default Compose file mounts:

- `./config.yaml:/app/config.yaml:ro`
- `./data:/app/data`
- `./reports:/app/reports`
- `/var/run/docker.sock:/var/run/docker.sock:ro`

Guardian writes:

- report: `./reports/latest.md`
- SQLite snapshots: `./data/guardian.sqlite`

Inspect the latest report:

```bash
sed -n '1,220p' reports/latest.md
```

Or open `reports/latest.md` in your editor.

## Docker socket warning

Mounting `/var/run/docker.sock` matters because the Docker collector must ask the Docker daemon for container metadata: status, health, restart count, ports, mounts, volumes, and Compose labels.

The Docker socket is powerful. Even when mounted `:ro`, the Docker API can expose sensitive host/container metadata, and socket access is often equivalent to broad control of Docker. Guardian only performs read-oriented SDK calls, but the socket itself should still be treated as privileged.

If `/var/run/docker.sock` is missing:

- You are probably not on a Docker host, or
- Guardian is running in a container without the socket mounted, or
- Docker Desktop / rootless Docker uses a different socket path.

Safest next step:

1. Run `python -m homelab_guardian.main doctor --config config.yaml`.
2. Confirm the host actually runs Docker.
3. If running in Docker Compose, confirm the socket mount exists.
4. If you do not want Docker inventory on this machine, disable `collectors.docker.enabled`.

## Safer socket proxy mode

A safer alternative to direct socket mounting is the optional socket proxy Compose file:

```bash
docker compose -f docker-compose.socket-proxy.yml run --rm homelab-guardian
```

This starts `docker-socket-proxy` and sets:

```text
DOCKER_HOST=tcp://docker-socket-proxy:2375
```

The proxy exposes only selected read-oriented Docker API areas where possible and keeps write methods disabled. This reduces blast radius compared with mounting the raw socket directly into Guardian. It is still Docker daemon access, so use it intentionally.

## Configuration

Start from `config.example.yaml`. Do not commit `config.yaml`, `.env`, API tokens, SSH keys, databases, generated reports, or machine-specific credentials.

Home Assistant access is read-only and uses an environment variable for the token. The example Compose files load local `.env` values into the container and pass `HOMEASSISTANT_TOKEN` through to Guardian.

```bash
cp .env.example .env
# Edit .env locally. Never commit it.
```

## Deployment modes

### Run directly on a Docker host

Install Python and run Guardian on the same host that runs Docker. Enable the Docker collector only if `/var/run/docker.sock` exists and the user running Guardian can read Docker metadata.

### Run via Docker Compose with Docker socket mounted

Run Guardian as a one-shot container with local `config.yaml`, `data`, and `reports` bind mounts. This is the preferred MVP install path for Docker hosts.

### Run via Docker Compose with socket proxy

Use `docker-compose.socket-proxy.yml` to route Docker SDK calls through `docker-socket-proxy` instead of giving Guardian the raw socket.

### Run without Docker

Guardian is still useful without Docker. Leave the Docker collector disabled and use any combination of:

- DNS checks
- TCP checks
- HTTP checks
- local backup path checks
- Home Assistant API checks

### Future: remote collectors

Future versions may support remote collectors for Docker hosts, NAS systems, Home Assistant, and backup locations. The current MVP is local-only: paths and sockets are evaluated from the machine or container running Guardian.

## Current collectors

### Docker collector

Disabled by default because Docker socket access is sensitive. When enabled, it reads container metadata and reports:

- container name
- image
- status
- health status
- restart count
- exposed/published ports
- mounts, bind paths, and named volumes
- Docker Compose project/service labels

Exited, unhealthy, restarting, or dead containers are surfaced as warnings or critical checks. If Docker is enabled but unavailable, the report shows `unknown` with the likely cause and safest next step instead of crashing.

Guardian can exclude containers by name pattern. This is useful for ignoring Guardian's own one-shot runtime containers and its socket proxy:

```yaml
collectors:
  docker:
    exclude_containers:
      - "homelab-guardian*"
```

Docker Compose container names in this setup normally use hyphens, not underscores, so prefer `homelab-guardian*` for Guardian runtime exclusions. Excluded containers are skipped from normal container health checks. The Docker inventory summary still reports how many were excluded and which patterns were used.

### Home Assistant collector

Disabled by default. When configured with a URL and token environment variable, it performs a read-only `GET /api/states` request and reports unavailable or unknown entities. It does not call services and does not modify Home Assistant.

Safe setup for local dogfood:

1. In Home Assistant, create a long-lived access token from your user profile.
2. Copy `.env.example` to `.env` and put the token there:

   ```env
   HOMEASSISTANT_TOKEN=your-token-here
   ```

3. In the ignored local `config.yaml`, set the Home Assistant URL and enable the collector:

   ```yaml
   collectors:
     homeassistant:
       enabled: true
       url: "http://homeassistant.local:8123"
       token_env: "HOMEASSISTANT_TOKEN"
   ```

4. Run a report:

   ```bash
   python -m homelab_guardian.main --config config.yaml
   ```

5. If running through Docker Compose, use the same ignored `.env` file and `config.yaml`:

   ```bash
   docker compose run --rm homelab-guardian
   ```

Never commit `.env`, `config.yaml`, reports, databases, tokens, or machine-specific credentials.

### Network collector

Supports:

- DNS resolution checks
- TCP port checks
- HTTP status checks
- TLS certificate expiry checks (works for self-signed certificates too)

Failures include clear evidence such as hostname, port, expected status, actual status, timeout, and error text.

### Backup freshness collector

Checks configured local paths without modifying them. It reports:

- whether the path exists
- latest modified file timestamp
- backup age in hours and days
- warning if the newest file is older than `max_age_days`
- critical if a required path is missing
- critical if a required directory exists but contains no files
- unknown if an optional directory exists but contains no files
- unknown if backup checks are enabled but no paths are configured yet

If `backups.enabled` is true and `paths: []`, Guardian reports `unknown` because the check is not ready to evaluate anything. That means configuration is incomplete, not that a backup failed. Add backup paths when ready, or set `backups.enabled: false` until backup monitoring is part of your rollout.

Backup paths are local to the machine or container running Guardian. If Guardian runs in Docker, mount backup locations read-only into the container first.

When the configured path is a file, Guardian uses that file's modified time. When the configured path is a directory, Guardian recursively scans files inside the directory and uses the newest file modified time. Directory modified times are ignored because they can change for reasons that do not prove a backup file is fresh.

### Safe backup freshness dogfood

Use a dummy local folder before pointing Guardian at real backup destinations. Do not test against production backup paths until the dummy procedure behaves as expected.

```bash
mkdir -p /tmp/homelab-guardian-backup-dogfood
printf 'dummy backup marker\n' > /tmp/homelab-guardian-backup-dogfood/backup-marker.txt
cp config.example.yaml config.yaml
```

In the ignored local `config.yaml`, set only the dummy path:

```yaml
collectors:
  backups:
    enabled: true
    paths:
      - id: dummy_backup_dogfood
        name: Dummy backup dogfood path
        path: /tmp/homelab-guardian-backup-dogfood
        max_age_days: 1
        required: true
```

Then run:

```bash
python -m homelab_guardian.main --config config.yaml
```

Expected result: the dummy backup check reports `ok` while the marker file is fresh. To test stale behavior safely, change `max_age_days` to `0` or adjust only files inside `/tmp/homelab-guardian-backup-dogfood`. Never commit `config.yaml`, generated reports, database files, or the dummy runtime folder.

## Web view

```bash
guardian serve                          # http://localhost:8674
guardian serve --interval 900           # appliance mode: scan + serve in one process
guardian serve --host 0.0.0.0           # expose on your LAN (explicit choice)
```

A read-only page rendered from local scan history: overall status, the AI
briefing when enabled, what changed, every check with its evidence, and
recent scan history with per-scan drill-down. No web framework, no
JavaScript, no write endpoints; binds to localhost unless you say otherwise.
`/healthz` returns plain `ok` so other monitors can watch Guardian itself.

## Recurring scans

Guardian runs once by default. For continuous monitoring, pass `--interval`:

```bash
python -m homelab_guardian.main --config config.yaml --interval 900
```

This repeats the scan every 900 seconds. A failed scan is logged and the loop
continues. Each scan is compared against the previous snapshot, so the report
and any notifications highlight what changed.

For a host install, `deploy/homelab-guardian.service` is a ready-to-edit
systemd user service. For Docker Compose, run the service with `--interval`
and `restart: unless-stopped` instead of one-shot `docker compose run`.

## Secrets providers

Every credential Guardian uses (Home Assistant token, Telegram bot token, AI
API key) is referenced by name in `config.yaml` and resolved through a secrets
provider. Tokens never live in the config file.

- `provider: env` (default) — names are environment variables, typically from
  a local `.env` file. No extra tooling required.
- `provider: bitwarden` — names are secret keys in
  [Bitwarden Secrets Manager](https://bitwarden.com/products/secrets-manager/),
  fetched through the `bws` CLI with a single machine-account access token
  (`BWS_ACCESS_TOKEN` in the environment). One token instead of a pile of
  `.env` entries, secrets stay centrally managed and rotatable, and
  environment variables still override when set. If the provider is
  unavailable, Guardian warns once and degrades to environment-only — a
  secrets backend outage never breaks a scan.

```yaml
secrets:
  provider: bitwarden
  bitwarden:
    access_token_env: "BWS_ACCESS_TOKEN"
    project_id: "" # optional: restrict to one project
```

`python -m homelab_guardian.main doctor` verifies the provider end-to-end and reports how
many secrets are readable. The provider interface is intentionally small, so
additional backends (Vault, 1Password, Infisical) can be added without
touching collectors.

Alternative: `bws run -- python -m homelab_guardian.main --config config.yaml` injects all
secrets as process environment variables without any Guardian configuration.

## AI briefing — bring your own model

Optional and disabled by default. When `ai.enabled` is true, Guardian sends
only the structured check results and the what-changed diff to a single
OpenAI-compatible chat completions endpoint and places the returned
plain-English briefing at the top of the report.

- Works with OpenRouter, a local Ollama/LM Studio endpoint, or any other
  OpenAI-compatible server — your model, your key, your choice.
- The model receives structured JSON only. It has no shell, no tools, and no
  access to your systems, and the prompt forbids suggesting state-changing
  commands.
- Guardian remains fully functional with this disabled; a failed model call
  never fails the scan.

```yaml
ai:
  enabled: true
  base_url: "http://localhost:11434/v1" # local Ollama example
  model: "qwen3:14b"
  api_key_env: "GUARDIAN_AI_API_KEY" # leave the env var unset for keyless local endpoints
```

Note: each config file should point at its own `database_path`. Scan diffing
compares against the previous snapshot in that database, so two configs
sharing one database will see each other's checks as added/removed noise.

## Attach an AI agent (MCP + agent-mode)

Guardian's collectors and the `status/summary/evidence/recommended_action`
contract are the moat; you can hand that view to any model.

- **`guardian mcp`** serves Guardian over the [Model Context Protocol](https://modelcontextprotocol.io)
  so an agent (Claude, a local agent, ...) reads your *verified* homelab state
  instead of re-deriving it. Read-only by default; optional gated write tools.
  See [docs/mcp.md](docs/mcp.md).
- **Agent-delivery mode** (`notifications.mode: agent`) makes Guardian feed each
  confirmed change to the agent's webhook so the **agent is the single voice**,
  with a deterministic Telegram fallback for criticals if the agent is
  unreachable.

The division of labor: Guardian is the deterministic source of truth and the
"reflex" actuator; the agent narrates, reasons, and handles the deep, judgment-
heavy fixes Guardian deliberately won't.

## Approval-gated repair

Optional, off by default (`repair.enabled`). Guardian can *propose* a fix for a
detected problem and, **after a human approves it**, *execute* it and verify
recovery — closing the detect → diagnose → propose → approve → repair → verify
loop. The whole point is to do this safely:

- **Never raw shell.** Only named, whitelisted, parameterized actions, built as
  argv lists. Targets come from validated check evidence or admin allowlists.
- **The agent is never the authority.** It can propose and execute, but approval
  is human-only (CLI `guardian repair approve`, or the dashboard `/repairs`
  page). Destructive actions can never auto-approve.
- Built-ins: restart a watched systemd unit or container; reclaim disk
  (`docker_prune` / `journal_vacuum` / `apt_clean` / `prune_dir`, with read-only
  previews and a backup interlock). Everything is audited and loop-guarded.

Design and threat model: [docs/repair.md](docs/repair.md) and
[docs/repair-reclaim.md](docs/repair-reclaim.md).

## Telegram notifications

Optional and disabled by default. Configure under `notifications.telegram` in
`config.yaml` with a bot token and chat id provided through environment
variables (see `config.example.yaml`). `send_on: changes` is the recommended
mode: you only get a message when something actually changed since the last
scan.

### Disk space collector

Reads usage on configured mounts (or the drive Guardian runs on when no
paths are set) with warning/critical percent thresholds. Disk-full is the
most common silent homelab failure; this is the check that catches it early.

### systemd collector

Sweeps the system (and optionally user) service manager for failed units and
units stuck in a restart loop — the `activating/auto-restart` state that
never reaches "failed" and hides exactly the breakage that matters. Specific
units can be watched individually with state and restart-count evidence.

## Flap damping

One wifi blip should not page you. With `notifications.telegram.confirm_scans: 2`,
a status change must hold for two consecutive scans before Guardian announces
it — in both directions, so recoveries are confirmed too. A check that flaps
up and down never triggers a message. The report and web view always show the
live state; damping only gates notifications.

## Acknowledging known issues

Chronic problems train you to ignore alerts. Acknowledge a check to mute it
without losing sight of it:

```bash
guardian ack ha_unavailable_entities --note "MQTT bridge down, part ordered" --days 14
guardian ack          # list current acknowledgments
guardian unack ha_unavailable_entities
```

An acknowledged check keeps its real status but is excluded from the overall
status, change detection, and notifications. It appears in a collapsed
"Acknowledged" section of the report and web view, with your note, so it
stays visible without drowning the signal. Acknowledgments can expire
automatically (`--days`, `--until`); check ids are shown in reports and in
the web view's evidence blocks.

## Report layout

The Markdown report includes:

- overall status
- summary counts
- what changed since the previous scan (regressions, improvements, new and removed checks)
- critical issues first
- warnings second
- unknowns third
- OK checks last, collapsed to names when there are many
- recommended actions and JSON evidence for each non-collapsed check

## Safety notes

Homelab Guardian's safety boundary has four parts:

- **Collectors are read-only.** Detection never modifies services, containers,
  DNS, Home Assistant entities, backup contents, systemd units, certificates,
  disks, or remote hosts.
- **Local runtime state is writable.** Guardian writes reports, SQLite scan
  snapshots, acknowledgments, alert state, and optional retention pruning under
  the configured report/database paths.
- **Outbound integrations are opt-in.** Telegram notifications and AI briefings
  send structured status data only when explicitly enabled.
- **Repair is opt-in and human-gated.** With `repair.enabled` (off by default),
  Guardian may *propose* a whitelisted, parameterized fix and execute it **only
  after a human approves the specific proposal** — never raw shell, never an
  AI-generated command, always followed by a verify and an audit record.
  Destructive actions can never auto-approve. See *Approval-gated repair* and
  [docs/repair.md](docs/repair.md).

Recommended actions in reports are diagnostic next steps for the operator;
nothing executes automatically without the approval flow above.
