Metadata-Version: 2.4
Name: asiai
Version: 1.12.0
Summary: Multi-engine LLM benchmark & monitoring CLI for Apple Silicon
Project-URL: Homepage, https://github.com/druide67/asiai
Project-URL: Documentation, https://asiai.dev
Project-URL: Repository, https://github.com/druide67/asiai
Project-URL: Issues, https://github.com/druide67/asiai/issues
Project-URL: Changelog, https://github.com/druide67/asiai/blob/main/CHANGELOG.md
Author-email: Jean-Marc Nahlovsky <druide67@free.fr>
License-Expression: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: apple-silicon,benchmark,cli,inference,llm,lm-studio,mlx,monitoring,ollama
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.11
Provides-Extra: all
Requires-Dist: fastapi>=0.115; extra == 'all'
Requires-Dist: jinja2>=3.1; extra == 'all'
Requires-Dist: mcp>=1.12; extra == 'all'
Requires-Dist: python-multipart>=0.0.9; extra == 'all'
Requires-Dist: textual>=0.80; extra == 'all'
Requires-Dist: uvicorn[standard]>=0.34; extra == 'all'
Provides-Extra: dev
Requires-Dist: httpx; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-git-revision-date-localized-plugin>=1.2; extra == 'docs'
Requires-Dist: mkdocs-material; extra == 'docs'
Provides-Extra: mcp
Requires-Dist: mcp>=1.12; extra == 'mcp'
Provides-Extra: tui
Requires-Dist: textual>=0.80; extra == 'tui'
Provides-Extra: web
Requires-Dist: fastapi>=0.115; extra == 'web'
Requires-Dist: jinja2>=3.1; extra == 'web'
Requires-Dist: python-multipart>=0.0.9; extra == 'web'
Requires-Dist: uvicorn[standard]>=0.34; extra == 'web'
Description-Content-Type: text/markdown

<p align="center">
  <img src="assets/logo.svg" alt="asiai logo" width="140">
</p>

<h1 align="center">asiai</h1>

<p align="center">
  <strong>Apple Silicon AI</strong> — Multi-engine LLM benchmark & monitoring CLI
</p>

<p align="center">
  <a href="https://pypi.org/project/asiai/"><img src="https://img.shields.io/pypi/v/asiai.svg" alt="PyPI"></a>
  <a href="https://pypi.org/project/asiai/"><img src="https://img.shields.io/pypi/dm/asiai.svg?color=brightgreen" alt="Downloads"></a>
  <a href="https://github.com/druide67/asiai/actions/workflows/ci.yml"><img src="https://github.com/druide67/asiai/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://codecov.io/gh/druide67/asiai"><img src="https://codecov.io/gh/druide67/asiai/branch/main/graph/badge.svg" alt="Coverage"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" alt="License"></a>
  <a href="https://python.org"><img src="https://img.shields.io/badge/python-3.11%2B-blue.svg" alt="Python"></a>
  <a href="https://support.apple.com/en-us/116943"><img src="https://img.shields.io/badge/macOS-Apple%20Silicon-black.svg" alt="macOS"></a>
  <a href="https://github.com/sponsors/druide67"><img src="https://img.shields.io/badge/sponsor-%E2%9D%A4-pink.svg" alt="Sponsor"></a>
  <a href="https://api.asiai.dev/api/v1/badge/benchmarks"><img src="https://api.asiai.dev/api/v1/badge/benchmarks" alt="Benchmarks"></a>
  <a href="https://api.asiai.dev/api/v1/badge/top-speed"><img src="https://api.asiai.dev/api/v1/badge/top-speed" alt="Top Speed"></a>
  <a href="https://www.asiai.dev/agent/"><img src="https://api.asiai.dev/api/v1/agent-badge" alt="AI Agents"></a>
</p>

<p align="center">
  <img src="assets/asiai-demo.gif" alt="asiai bench demo" width="720">
</p>

**asiai** compares inference engines side-by-side on your Mac. Load the same model on Ollama and LM Studio, run `asiai bench`, get the numbers. No guessing, no vibes — just tok/s, TTFT, power efficiency, and stability per engine.

Share your results with the community (`--share`), compare against other Apple Silicon users (`asiai compare`), and get smart engine recommendations (`asiai recommend`).

Born from the OpenClaw project, where we needed hard data to pick the fastest engine for multi-agent swarms on Mac Mini M4 Pro.

## Quick start

```bash
pipx install asiai        # Recommended: isolated install
```

Or via Homebrew:

```bash
brew tap druide67/tap
brew install asiai
```

Other options:

```bash
uvx asiai detect           # Run without installing (requires uv)
pip install asiai           # Standard pip install
```

Then benchmark and share:

```bash
asiai bench --quick --card --share    # Bench + shareable card in ~15 seconds
```

## Commands

### `asiai detect`

Auto-detect running inference engines across 7 ports.

```
$ asiai detect

Detected engines:

  ● ollama 0.17.4
    URL: http://localhost:11434

  ● lmstudio 0.4.5
    URL: http://localhost:1234
    Running: 1 model(s)
      - qwen3.5-35b-a3b  MLX
```
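
As a rough illustration of port-based detection, the sketch below checks which of the default ports from the "Supported engines" table accept a TCP connection. It is not asiai's actual detection code: `DEFAULT_PORTS` and `open_ports` are hypothetical names, and since several engines share a port, a real detector would also need to query each engine's API to tell them apart.

```python
import socket

# Default ports copied from the "Supported engines" table in this README.
# Several engines share a port, so an open port alone does not identify one.
DEFAULT_PORTS = {
    11434: ["ollama"],
    1234: ["lmstudio"],
    8080: ["mlx-lm", "llama.cpp"],
    8000: ["oMLX", "vllm-mlx", "vMLX"],
    52415: ["exo"],
}


def open_ports(ports, host="127.0.0.1", timeout=0.25):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            pass  # closed, filtered, or timed out
    return found


for port in open_ports(DEFAULT_PORTS):
    print(f"port {port} is open; candidates: {', '.join(DEFAULT_PORTS[port])}")
```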

### `asiai bench`

Cross-engine benchmark with standardized prompts. Runs 3 iterations per prompt by default, reports median tok/s (SPEC standard) with stability classification.

```
$ asiai bench -m qwen3.5 --runs 3 --power

  Mac Mini M4 Pro — Apple M4 Pro  RAM: 64.0 GB (42% used)  Pressure: normal

Benchmark: qwen3.5

  Engine       tok/s (±stddev)    Tokens   Duration     TTFT       VRAM    Thermal
  ────────── ───────────────── ───────── ────────── ──────── ────────── ──────────
  lmstudio    72.6 ± 0.0 (stable)   435    6.20s    0.28s        —    nominal
  ollama      30.4 ± 0.1 (stable)   448   15.28s    0.25s   26.0 GB   nominal

  Winner: lmstudio (2.4x faster)
  Power: lmstudio 13.2W (5.52 tok/s/W) — ollama 16.0W (1.89 tok/s/W)
```

Options:

```
-m, --model MODEL          Model to benchmark (default: auto-detect)
-e, --engines LIST         Filter engines (e.g. ollama,lmstudio,mlxlm)
-p, --prompts LIST         Prompt types: code, tool_call, reasoning, long_gen
-r, --runs N               Runs per prompt (default: 3, for median + stddev)
    --power                Cross-validate power with sudo powermetrics (IOReport always-on)
    --context-size SIZE    Context fill prompt: 4k, 16k, 32k, 64k
    --share                Share results with the community (anonymous, opt-in)
-Q, --quick                Quick benchmark: 1 prompt, 1 run (~15 seconds)
    --card                 Generate shareable benchmark card (SVG + PNG with --share)
-H, --history PERIOD       Show past benchmarks (e.g. 7d, 24h)
    --agentic-mode         Run the 8-run agentic prefix-cache-reuse protocol
    --agentic-output FILE  Save agentic-mode results as JSON
    --agentic-skip-long    Skip phases 7-8 (50K context) to save ~10 min
    --agentic-only LIST    Run only specified phases (cold,prefix-test-1,...)
    --code                 Dev-quality eval: tool-call, recovery, thinking, coding
    --code-suite LIST      tool-call[-stress],recovery,thinking[,coding[-hard]]
    --instruct             Instruction-following: IFEval-style verifiable + agentic deliverable
    --instruct-scenario L  verifiable,research-brief[,order-control]
    --language CODE        Multilingual retention eval (fr/de/es/it/pt/ja/ko/zh)
    --language-suite LIST  adherence,diacritics[,fluency] (default: deterministic 2)
    --judge-url URL        OpenAI-compat LLM judge for the 'coding'/'fluency' suites
```
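
The "Winner" and "Power" lines in the sample output above can be reproduced with simple arithmetic. This is a worked example using the rounded figures printed in that sample, not asiai's reporting code; the `results` dictionary layout is an assumption made for illustration.

```python
# Figures copied from the sample `asiai bench` output above.
results = {
    "lmstudio": {"tok_s": 72.6, "watts": 13.2},
    "ollama": {"tok_s": 30.4, "watts": 16.0},
}

# Rank engines by median tok/s, fastest first.
ranked = sorted(results, key=lambda name: results[name]["tok_s"], reverse=True)
winner, runner_up = ranked[0], ranked[1]

# Speedup of the winner over the runner-up.
speedup = results[winner]["tok_s"] / results[runner_up]["tok_s"]
print(f"Winner: {winner} ({speedup:.1f}x faster)")

# Energy efficiency: tokens per second per watt.
for name in ranked:
    r = results[name]
    print(f"{name}: {r['tok_s'] / r['watts']:.1f} tok/s/W")
```

Because the inputs here are already rounded, the efficiency values agree with the sample report only to about one decimal place.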

### Agentic mode — measuring prefix cache reuse

```bash
asiai bench --agentic-mode --url http://localhost:8080 --model my-model \
    --agentic-output bench.json
```

Runs 8 sequential prompts with a fixed long system message and varying user
messages to expose how the engine reuses cached prefix tokens. It reads
`cached_tokens` from the streaming `usage` payload when the engine exposes it
(llama.cpp, mlx-lm) and falls back to the TTFT ratio otherwise. The output is a
verdict, `prefix_cache_reuse: yes | partial | no`. This is the metric that matters
when your workload is multi-turn and agentic with a shared system prompt.
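
To make the two signals concrete, here is a minimal sketch of how such a verdict could be derived. It is an assumption-laden illustration, not asiai's implementation: the function name, the `full` and `partial` thresholds, and the exact TTFT formula are all hypothetical.

```python
def prefix_cache_verdict(prompt_tokens, cached_tokens=None,
                         cold_ttft=None, warm_ttft=None,
                         full=0.8, partial=0.2):
    """Classify prefix-cache reuse as 'yes', 'partial', or 'no'.

    `full` and `partial` are illustrative thresholds, not asiai's values.
    """
    if cached_tokens is not None and prompt_tokens > 0:
        # Preferred signal: share of prompt tokens served from the cache.
        reuse = cached_tokens / prompt_tokens
    elif cold_ttft and warm_ttft is not None:
        # Fallback: a warm TTFT far below the cold TTFT suggests the
        # shared prefix was not recomputed.
        reuse = max(0.0, 1.0 - warm_ttft / cold_ttft)
    else:
        raise ValueError("need cached_tokens or both TTFT values")

    if reuse >= full:
        return "yes"
    if reuse >= partial:
        return "partial"
    return "no"


print(prefix_cache_verdict(1000, cached_tokens=950))
print(prefix_cache_verdict(1000, cold_ttft=2.0, warm_ttft=1.0))
```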

### Quality modes — measuring quality, not just speed

Throughput is not quality. Three deterministic modes (no LLM judge needed for the
core signal) measure whether a model is actually usable for real work:

```bash
# Dev quality: tool-call reliability (the JSON arg-truncation / empty-object bug),
# agentic error-recovery, thinking discipline — + an optional LLM-judged coding task.
asiai bench --code --url http://localhost:8080 --code-output code.json

# Instruction-following: IFEval-style verifiable instructions (format/length/
# keywords/case…) + an agentic task — does the model produce the primary
# multi-section deliverable AFTER a tool sequence, or only confirm the last step?
asiai bench --instruct --url http://localhost:8080 --instruct-output instruct.json

# Multilingual retention: did a finetune keep the base model's language?
# Adherence (stays in the language) + diacritics (café stays café), 8 languages.
asiai bench --language fr --url http://localhost:8080 --language-output lang.json
```

`--code` scores tool-call validity, the empty-object truncation bug, schema
conformance and error-recovery deterministically; add `--code-suite coding`
with `--judge-url <openai-compat-endpoint>` for an LLM-judged code-quality grade
(no SDK is bundled; the API key is read from the environment). `--instruct` runs
IFEval-style verifiable instructions (strict + loose, prompt- and
instruction-level) plus a tools-then-deliverable scenario that catches a finetune
doing the tool work but skipping the primary written output. `--language`
measures language adherence + orthography retention — the catastrophic-forgetting
signatures a task-specific finetune can introduce. All three modes write JSON
only, so you can compare models by diffing the output files. See
[Dev-quality benchmarks](docs/dev-quality-benchmarks.md).
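
One way to diff two result files is to flatten each JSON document into path/value pairs and report the leaves that differ. The sketch below works on any JSON structure; the field names in the example (`model`, `scores`, `tool_call`, `recovery`) are invented for illustration and are not the actual output schema.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {"a.b[0]": leaf} pairs."""
    if isinstance(value, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(value)]
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        flat.update(flatten(child, key))
    return flat


def diff_reports(a, b):
    """Return {path: (value_in_a, value_in_b)} for every leaf that differs."""
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(fa.keys() | fb.keys())
        if fa.get(key) != fb.get(key)
    }


# Hypothetical reports for a base model and a finetune.
base = {"model": "base", "scores": {"tool_call": 0.98, "recovery": 1.0}}
tuned = {"model": "finetune", "scores": {"tool_call": 0.71, "recovery": 1.0}}
for path, (old, new) in diff_reports(base, tuned).items():
    print(f"{path}: {old} -> {new}")
```

In practice you would load the two files with `json.load` before passing them to `diff_reports`.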

Cross-model comparison — benchmark multiple models in one run and get a ranked summary:

```bash
# Cross-model comparison
asiai bench --compare qwen3.5:4b deepseek-r1:7b -e ollama --card
```

The runner resolves model names across engines automatically — `gemma2:9b` (Ollama) and `gemma-2-9b` (LM Studio) are matched as the same model.
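
A simple heuristic that would match the two names above is to lowercase each name and drop every non-alphanumeric character before comparing. This is only a sketch of the idea; asiai's real resolver may use different rules, and `canonical_model_name` and `same_model` are hypothetical helpers.

```python
import re


def canonical_model_name(name: str) -> str:
    """Lowercase the name and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def same_model(a: str, b: str) -> bool:
    """True when two engine-specific names reduce to the same canonical form."""
    return canonical_model_name(a) == canonical_model_name(b)


print(same_model("gemma2:9b", "gemma-2-9b"))
print(same_model("qwen3.5:4b", "deepseek-r1:7b"))
```

A normalization this aggressive can over-match (it ignores separators entirely), so a production resolver would likely also compare size or quantization tags.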

### `asiai models`

List loaded models across all engines. Use `--json` for machine-readable output.

```
$ asiai models

ollama  http://localhost:11434
  ● qwen3.5:35b-a3b                             26.0 GB Q4_K_M

lmstudio  http://localhost:1234
  ● qwen3.5-35b-a3b                                 MLX
```

### `asiai monitor`

System and inference metrics snapshot, stored in SQLite. Use `--json` for machine-readable output.

```
$ asiai monitor

System
  Uptime:    3d 12h
  CPU Load:  2.45 / 3.12 / 2.89  (1m / 5m / 15m)
  Memory:    45.2 GB / 64.0 GB  71%
  Pressure:  normal
  Thermal:   nominal  (100%)

Inference  ollama 0.17.4
  Models loaded: 1  VRAM total: 26.0 GB

  Model                                        VRAM   Format  Quant
  ──────────────────────────────────────── ────────── ──────── ──────
  qwen3.5:35b-a3b                            26.0 GB     gguf Q4_K_M
```

Options:

```
-w, --watch SEC            Refresh every SEC seconds
-q, --quiet                Collect and store without output (for daemon use)
    --json                 Output as JSON (for scripting)
-H, --history PERIOD       Show history (e.g. 24h, 1h)
-a, --analyze HOURS        Comprehensive analysis with trends
-c, --compare TS TS        Compare two timestamps
    --alert-webhook URL    POST alerts on state transitions (memory, thermal, engine down)
```

### `asiai doctor`

Diagnose installation, engines, system health, and database.

```
$ asiai doctor

Doctor

  System
    ✓ Apple Silicon       Mac Mini M4 Pro — Apple M4 Pro
    ✓ RAM                 64 GB total, 42% used
    ✓ Memory pressure     normal
    ✓ Thermal             nominal (100%)

  Engine
    ✓ Ollama              v0.17.4 — 1 model(s): qwen3.5:35b-a3b
    ✓ LM Studio           v0.4.5 — 1 model(s): qwen3.5-35b-a3b
    ✗ mlx-lm              not installed
    ✗ llama.cpp            not installed
    ✗ vllm-mlx            not installed

  Database
    ✓ SQLite              2.4 MB, last entry: 1m ago

  5 ok, 0 warning(s), 3 failed
```

### `asiai versions`

Line up each engine's **running**, **installed**, and **available**
versions and flag what's behind — including the post-upgrade trap where a
live process predates the binary you just upgraded (`running-stale`).

```bash
asiai versions                   # offline: running/installed + brew outdated
asiai versions --check-upstream  # also query PyPI / GitHub (network, opt-in)
asiai versions --engine llamacpp # filter to one engine
asiai versions --json | jq
```

```
Engine versions

  ENGINE     RUNNING  INSTALLED  AVAILABLE  STATUS
  ─────────  ───────  ─────────  ─────────  ─────────────────
  llama.cpp  9370     9370       9380       upgrade-available
  Ollama     —        0.24.0     0.24.0     up-to-date

  1 upgrade(s) available
```

`asiai doctor` carries an offline recap of this, and `asiai web` exposes a
`/versions` page with changelog links. Triggering an upgrade is a write and
lives in `aisctl upgrade <engine>` (see
[docs/versions-mode.md](docs/versions-mode.md)).

### `asiai daemon`

Background monitoring via macOS launchd. Collects metrics every minute.

```bash
asiai daemon start              # Install and start the daemon
asiai daemon start --interval 30  # Custom interval (seconds)
asiai daemon status             # Check if running
asiai daemon logs               # View recent logs
asiai daemon stop               # Stop and uninstall
```
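
For readers unfamiliar with launchd, a periodic job is described by a property list with a `StartInterval` key. The sketch below builds such a plist with the standard library. The label and binary path are placeholders, and the plist that `asiai daemon start` actually installs may differ; only `asiai monitor --quiet` comes from the options documented above.

```python
import plistlib


def build_launchd_plist(interval_seconds=60,
                        label="dev.example.asiai.monitor",   # hypothetical label
                        program="/opt/homebrew/bin/asiai"):  # hypothetical path
    """Return XML plist bytes for a launchd job that runs every N seconds."""
    job = {
        "Label": label,
        # `monitor --quiet` collects and stores metrics without output.
        "ProgramArguments": [program, "monitor", "--quiet"],
        "StartInterval": interval_seconds,  # launchd key: seconds between runs
        "RunAtLoad": True,
    }
    return plistlib.dumps(job)


print(build_launchd_plist(30).decode("utf-8"))
```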

### `asiai web`

Web dashboard with real-time monitoring, benchmark controls, and interactive charts. Requires `pip install "asiai[web]"` (the quotes stop zsh from treating the brackets as a glob pattern).

```bash
asiai web                    # Opens browser at http://127.0.0.1:8899
asiai web --port 9000        # Custom port
asiai web --host 0.0.0.0     # Listen on all interfaces
asiai web --no-open          # Don't auto-open browser
```

Features: system overview, engine status, live benchmark with SSE progress, history charts, doctor checks, dark/light theme.

### `asiai fleet` + `asiai auth` + `aisctl fleet`

Multi-host management across several Macs. Two phases ship today:

- **Phase 1 — read-only observability** (in `asiai`). Each remote Mac
  runs `asiai web --host 0.0.0.0`; the orchestrator declares the nodes
  in `~/.config/asiai/fleet.json` and polls each one's
  `/api/v1/snapshot` in parallel.
- **Phase 2 — authenticated writes** (Bearer auth in `asiai`,
  `aisctl serve` + `aisctl fleet push` in
  [`asiai-inference-server`](https://github.com/druide67/asiai-inference-server)).
  Issue `purge`, `stop/start/restart`, `unload`, `install/uninstall`,
  `upgrade` against remote nodes with rate-limited token auth and a
  per-call audit log.

```bash
# --- Read-only (Phase 1) -------------------------------------------
asiai fleet add studio --url http://192.0.2.10:8899 --role workstation
asiai fleet list
asiai fleet status               # parallel poll, aggregated table
asiai fleet status --json | jq   # machine-readable form
asiai fleet ping studio          # single-node check

# --- Writes (Phase 2) ----------------------------------------------
# On the node: initialize the auth surface (prints secret ONCE).
asiai auth init
# On the node: start the loopback companion that runs the commands.
aisctl serve &
# On the orchestrator: register the node with its secret.
asiai fleet add studio --url http://192.0.2.10:8899 --auth-token asai_...
# Issue a write.
aisctl fleet push studio purge
aisctl fleet push studio restart --engine ollama
aisctl fleet push studio unload --engine ollama --model llama3.2
```
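
The Phase 1 polling pattern (one unauthenticated `GET /api/v1/snapshot` per node, issued in parallel) can be sketched with the standard library alone. This is not the orchestrator's real code: the node dictionary shape (`name`, `url`) and the result layout are assumptions, and the actual `fleet.json` schema is described in the fleet guide.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_snapshot(node, timeout=3.0):
    """GET <url>/api/v1/snapshot and decode the JSON body."""
    url = node["url"].rstrip("/") + "/api/v1/snapshot"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def poll_fleet(nodes, fetch=fetch_snapshot):
    """Poll all nodes concurrently; map node name -> snapshot or error."""
    def safe(node):
        try:
            return node["name"], {"ok": True, "snapshot": fetch(node)}
        except Exception as exc:  # one unreachable node must not abort the rest
            return node["name"], {"ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        return dict(pool.map(safe, nodes))
```

Passing `fetch` as a parameter keeps the aggregation logic testable without a network; call `poll_fleet([{"name": "studio", "url": "http://192.0.2.10:8899"}])` to use the real HTTP fetch.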

The `/fleet` page in `asiai web` shows a card per node with HTMX
auto-refresh every 10 seconds. Phase 3 will add mDNS Bonjour
auto-discovery and TLS. Full guide:
[docs/fleet-mode.md](docs/fleet-mode.md).

> ⚠️ Phase 1 is unauthenticated read-only; Phase 2 requires Bearer
> tokens. Both are designed for trusted LANs (or LANs glued together by
> a VPN like Tailscale/WireGuard). No TLS is enforced between nodes in
> v1.8 — that's a Phase 3 deliverable.

### `asiai leaderboard`

Browse community benchmarks. Filter by chip or model.

```bash
asiai leaderboard                      # All results
asiai leaderboard --chip "M4 Pro"      # Filter by chip
asiai leaderboard --model qwen2.5      # Filter by model
```

### `asiai compare`

Compare your local results against community medians.

```bash
asiai compare --chip "Apple M1 Max" --model qwen2.5:7b
```

### `asiai recommend`

Get engine recommendations based on your hardware and benchmarks.

```bash
asiai recommend                                # Best engine for your Mac
asiai recommend --use-case latency             # Optimize for TTFT
asiai recommend --model qwen2.5 --community    # Include community data
```

### `asiai setup`

Interactive setup wizard — detects hardware, engines, models, and suggests next steps.

```bash
asiai setup
```

### `asiai mcp`

Start the MCP server for AI agent integration. 11 tools, 3 resources.

```bash
asiai mcp                          # stdio (Claude Code, Cursor)
asiai mcp --transport sse          # SSE (network agents)
```

### `asiai tui`

Interactive terminal dashboard with auto-refresh. Requires `pip install "asiai[tui]"` (quoted so zsh does not expand the brackets).

```bash
asiai tui
```

## Benchmark Card — share your results

Generate a shareable benchmark card image with one flag:

```bash
asiai bench --card                    # SVG saved locally (zero dependencies)
asiai bench --card --share            # SVG + PNG via community API
asiai bench --quick --card --share    # Quick bench + card + share
```

![Benchmark card example](docs/assets/benchmark-card-example.png)

A **1200x630 dark-themed card** with your model, chip, specs banner (quantization, RAM, GPU cores, context size), engine comparison bar chart, winner highlight, and metric chips (tok/s, TTFT, power, engine version). Optimized for Reddit, X, Discord, and GitHub READMEs.

Every shared card includes asiai branding — the [Speedtest.net model](https://www.speedtest.net) for local LLM inference.

## Supported engines

| Engine | Port | Install | API |
|--------|------|---------|-----|
| [Ollama](https://ollama.com) | 11434 | `brew install ollama` | Native |
| [LM Studio](https://lmstudio.ai) | 1234 | `brew install --cask lm-studio` | OpenAI-compatible |
| [mlx-lm](https://github.com/ml-explore/mlx-examples) | 8080 | `brew install mlx-lm` | OpenAI-compatible |
| [llama.cpp](https://github.com/ggml-org/llama.cpp) | 8080 | `brew install llama.cpp` | OpenAI-compatible |
| [oMLX](https://github.com/jundot/omlx) | 8000 | `brew tap jundot/omlx && brew install omlx` | OpenAI-compatible |
| [vllm-mlx](https://github.com/vllm-project/vllm) | 8000 | `pip install vllm-mlx` | OpenAI-compatible |
| [vMLX](https://vmlx.net) | 8000 | `pip install vmlx` | OpenAI-compatible |
| [Exo](https://github.com/exo-explore/exo) | 52415 | `pip install exo` | OpenAI-compatible |

## What it measures

| Metric | Description |
|--------|-------------|
| **tok/s** | Generation speed (tokens/sec), excluding prompt processing (TTFT) |
| **TTFT** | Time to first token — prompt processing latency |
| **Power** | GPU, CPU, ANE, DRAM power in watts (IOReport, no sudo) |
| **tok/s/W** | Energy efficiency — tokens per second per watt |
| **Stability** | Run-to-run variance: stable (CV < 5%), variable (CV 5–10%), unstable (CV > 10%) |
| **VRAM** | Memory footprint — native API (Ollama, LM Studio) or `ri_phys_footprint` estimate (all other engines) |
| **Thermal** | CPU throttling state and speed limit percentage |

All metrics are stored in SQLite (`~/.local/share/asiai/metrics.db`) with 90-day retention and automatic regression detection.
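
As an illustration of time-based retention in SQLite, the sketch below stores samples with a timestamp and prunes anything older than 90 days on each write. The `samples` table and its columns are invented for this example and do not reflect the real `metrics.db` schema.

```python
import sqlite3
import time

RETENTION_SECONDS = 90 * 24 * 3600  # the 90-day window mentioned above


def open_db(path=":memory:"):
    """Open the database and create a minimal, hypothetical schema."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS samples "
        "(ts REAL NOT NULL, metric TEXT NOT NULL, value REAL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS idx_samples_ts ON samples (ts)")
    return db


def record(db, metric, value, now=None):
    """Insert one sample, then delete rows older than the retention window."""
    now = time.time() if now is None else now
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO samples VALUES (?, ?, ?)", (now, metric, value))
        db.execute("DELETE FROM samples WHERE ts < ?", (now - RETENTION_SECONDS,))


db = open_db()
record(db, "tok_s", 72.6)
print(db.execute("SELECT COUNT(*) FROM samples").fetchone()[0])
```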

## Benchmark methodology

Following [MLPerf](https://mlcommons.org/benchmarks/inference-server/), [SPEC CPU 2017](https://www.spec.org/cpu2017/), and [NVIDIA GenAI-Perf](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/benchmarking/genai_perf.html) standards:

- **Warmup**: 1 non-timed generation per engine before measured runs
- **Runs**: 3 iterations per prompt (configurable), median as primary metric
- **Sampling**: `temperature=0` (greedy decoding) for deterministic results
- **Power**: Always-on via IOReport (no sudo). Per-engine, not session-wide average
- **Variance**: Pooled intra-prompt stddev (isolates run-to-run noise)
- **Metadata**: Engine version, model quantization, hardware chip, macOS version stored per result

See [docs/benchmark-best-practices.md](docs/benchmark-best-practices.md) for the full conformance audit.
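
The median, pooled standard deviation, and stability label can be sketched as follows. This is one plausible reading of the bullets above, not the project's code: it assumes the median is taken over all runs, that CV is the pooled stddev divided by the mean, and that the thresholds are 5% and 10%.

```python
import math
import statistics


def pooled_stddev(runs_by_prompt):
    """Pool per-prompt sample variances, weighted by degrees of freedom."""
    num = den = 0.0
    for runs in runs_by_prompt.values():
        if len(runs) > 1:
            num += (len(runs) - 1) * statistics.variance(runs)
            den += len(runs) - 1
    return math.sqrt(num / den) if den else 0.0


def summarize(runs_by_prompt):
    """Return median tok/s, pooled stddev, CV in percent, and a stability label."""
    all_runs = [x for runs in runs_by_prompt.values() for x in runs]
    stddev = pooled_stddev(runs_by_prompt)
    mean = statistics.mean(all_runs)
    cv = stddev / mean * 100 if mean else 0.0
    label = "stable" if cv < 5 else "variable" if cv < 10 else "unstable"
    return {
        "median": statistics.median(all_runs),
        "stddev": stddev,
        "cv_percent": cv,
        "stability": label,
    }


# Two prompts with very different speeds but little run-to-run noise.
runs = {"code": [72.5, 72.6, 72.7], "long_gen": [30.3, 30.4, 30.5]}
print(summarize(runs))
print(statistics.stdev([x for r in runs.values() for x in r]))
```

The last line shows why pooling matters: a naive standard deviation over all runs is dominated by the difference between prompts, while the pooled value reflects only the noise within each prompt.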

## Benchmark prompts

Four standardized prompts test different generation patterns:

| Name | Tokens | Tests |
|------|--------|-------|
| `code` | 512 | Structured code generation (BST in Python) |
| `tool_call` | 256 | JSON function calling / instruction following |
| `reasoning` | 384 | Multi-step math problem |
| `long_gen` | 1024 | Sustained throughput (bash script) |

Use `--context-size 4k|16k|32k|64k` to test with large context fill prompts instead.

## API & Prometheus

When running `asiai web`, the following REST API endpoints are available for programmatic access. Interactive API documentation (Swagger UI) is available at `http://localhost:8899/docs`.

| Endpoint | Description |
|----------|-------------|
| `GET /api/status` | Lightweight health check (< 500ms) — engine reachability, memory pressure, thermal |
| `GET /api/snapshot` | Full system + engine snapshot with loaded models, VRAM, versions |
| `GET /api/benchmarks` | Benchmark results with tok/s, TTFT, power, context_size, engine_version |
| `GET /api/engine-history` | Engine status history (TCP, KV cache, tokens predicted) |
| `GET /api/benchmark-process` | Process CPU/RSS metrics from benchmark runs (7d retention) |
| `GET /api/metrics` | Prometheus exposition format — system, engine, model, benchmark gauges |

### Prometheus integration

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'asiai'
    static_configs:
      - targets: ['localhost:8899']
    metrics_path: '/api/metrics'
    scrape_interval: 30s
```
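
If you want to inspect the endpoint without running Prometheus, the text exposition format is easy to read by hand. The parser below is a deliberately small sketch that handles the common `name{label="value"} number` form; the metric names in the example are invented, and label values containing commas or escaped quotes, as well as trailing timestamps, are not handled.

```python
import urllib.request


def parse_exposition(text):
    """Parse exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name, _, rest = series.partition("{")
        labels = {}
        for pair in rest.rstrip("}").split(","):
            if "=" in pair:
                key, _, raw = pair.partition("=")
                labels[key.strip()] = raw.strip().strip('"')
        samples.append((name.strip(), labels, float(value)))
    return samples


def scrape(url="http://localhost:8899/api/metrics", timeout=3.0):
    """Fetch and parse the metrics endpoint exposed by `asiai web`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_exposition(response.read().decode("utf-8"))


# Hypothetical metric names, for illustration only.
example = 'demo_tok_per_second{engine="ollama",model="qwen3.5"} 30.4\ndemo_up 1\n'
print(parse_exposition(example))
```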

### CLI JSON output

```bash
asiai monitor --json | jq '.mem_pressure'
asiai models --json | jq '.engines[].models[].name'
```
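
The same JSON can be consumed from Python. In this sketch, `mem_pressure` is the key used in the `jq` example above; treating `"normal"` as the only healthy value is an assumption based on the sample `asiai monitor` output, and both helper names are hypothetical.

```python
import json
import subprocess


def read_monitor_snapshot(run=subprocess.run):
    """Run `asiai monitor --json` and return the decoded object."""
    completed = run(
        ["asiai", "monitor", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout)


def memory_is_healthy(snapshot):
    """Assumed rule: only a `mem_pressure` of "normal" counts as healthy."""
    return snapshot.get("mem_pressure") == "normal"
```

Call `memory_is_healthy(read_monitor_snapshot())` on a machine where `asiai` is installed; the `run` parameter exists so the parsing can be tested without the CLI.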

## Requirements

- macOS on Apple Silicon (M1 / M2 / M3 / M4 families)
- Python 3.11+
- At least one inference engine running locally

## Zero dependencies

The core uses only the Python standard library — `urllib`, `sqlite3`, `subprocess`, `argparse`. No `requests`, no `psutil`, no `rich`. Just stdlib.

Optional extras:
- `asiai[web]` — FastAPI web dashboard with charts
- `asiai[tui]` — Textual terminal dashboard
- `asiai[mcp]` — MCP server for AI agent integration
- `asiai[all]` — Web + TUI + MCP
- `asiai[dev]` — pytest, ruff

## Roadmap

| Version | Scope | Status |
|---------|-------|--------|
| **v0.1** | detect + bench + monitor + models (CLI, stdlib) | **Done** |
| **v0.2** | mlx-lm + doctor + daemon + TUI (Textual) | **Done** |
| **v0.3** | 5 engines, power metrics, multi-run variance, regression detection | **Done** |
| **v0.4** | CI, MkDocs, export JSON, thermal drift, web dashboard | **Done** |
| **v0.5** | REST API, Prometheus /metrics, CLI --json, engine uptime tracking | **Done** |
| **v0.6** | Multi-service LaunchAgent (`daemon start web`), daemon status/logs/stop --all | **Done** |
| **v0.7** | Alert webhooks, LM Studio VRAM, Ollama config in doctor | **Done** |
| **v1.0** | Community Benchmark DB, smart recommendations, Exo engine, leaderboard | **Done** |
| **v1.0.1** | MCP server (11 tools), benchmark card, `--quick` mode, setup wizard, agent integration | **Done** |
| **v1.2** | Web dashboard redesign, shareable cards, Share on X/Reddit, community API | **Done** |
| **v1.3** | Dark theme, self-hosted fonts, universal VRAM (phys_footprint), power in Monitor/History | **Done** |
| v1.7 | Fleet mode Phase 1 (multi-Mac read-only observability), `asiai fleet` CLI, `/fleet` web page | **Shipped** |
| v1.8 | Fleet Phase 2 (cross-host writes with Bearer auth, rate limit, audit log), `asiai auth` CLI, `aisctl serve` + `aisctl fleet push` companions | **Shipped** |
| v1.9+ | Fleet Phase 3 (mDNS Bonjour auto-discovery, TLS/mTLS, TUI fleet panel, MCP write tools), macOS notifications | Planned |

## License

Apache 2.0
