Metadata-Version: 2.4
Name: green-mcp
Version: 0.1.0
Summary: Pluggable MCP server measuring two efficiency axes — CPU energy (uProf/RAPL/powermetrics) and LLM tokens — and refactoring code to be cheaper while preserving behavior
License-Expression: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp>=1.0
Provides-Extra: agent
Requires-Dist: claude-agent-sdk>=0.1.0; extra == "agent"
Provides-Extra: analysis
Requires-Dist: ruff>=0.4; extra == "analysis"
Requires-Dist: radon>=6.0; extra == "analysis"
Requires-Dist: psutil>=5.9; extra == "analysis"
Provides-Extra: dev
Requires-Dist: claude-agent-sdk>=0.1.0; extra == "dev"
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Dynamic: license-file

# Green Agent

A pluggable **MCP server** that measures two efficiency axes of a program and refactors it to be
cheaper while preserving behavior:

- **CPU energy** — joules the code actually consumes, from real hardware telemetry
  (AMD uProf / Linux RAPL / macOS powermetrics), chosen automatically for the host.
- **LLM tokens** — the token usage of a program that calls LLMs (input/output/cache/reasoning),
  measured in a provider-neutral, non-blocking way.

Defining principle: **measure, never estimate.** Every claim is a measurement with the command, the
number, and its run-to-run uncertainty — or it's labeled an estimate. The comparison verdict is a
real statistical test (Welch's t with the idle-baseline uncertainty folded in), not a heuristic.
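
As an illustration of the kind of test named above, here is a minimal stdlib-only sketch of Welch's t statistic. It is not green-mcp's implementation: the function name `welch_t`, the `baseline_se` parameter, and the choice to add the baseline error in quadrature are all assumptions made for this example, and the readings are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b, baseline_se=0.0):
    """Return (t, df) for two samples of net energy readings in joules.

    baseline_se is a hypothetical parameter: the standard error of the idle
    baseline subtracted from each sample, assumed independent per sample.
    """
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean (unequal variances allowed).
    va = variance(a) / na
    vb = variance(b) / nb
    # Assumption: each sample has its own baseline error, added in quadrature.
    se = sqrt(va + vb + 2 * baseline_se ** 2)
    t = (mean(a) - mean(b)) / se
    # Welch-Satterthwaite degrees of freedom from the sample terms only.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


before = [10.0, 11.0, 12.0]  # made-up joule readings, for illustration only
after = [7.0, 8.0, 9.0]
t, df = welch_t(before, after, baseline_se=0.5)
print(f"t = {t:.3f}, df = {df:.1f}")
```

A larger `baseline_se` widens the standard error and shrinks `t`, so a noisy idle baseline makes the verdict more conservative. The sketch needs at least two readings per sample and some spread in the data; otherwise it raises an error.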

## Quick start

```sh
pipx install green-mcp        # provides the `green-mcp` command (stdlib + mcp only)
```

Mount it in your IDE (configs in [`deploy/`](deploy/)):

| IDE | File |
|---|---|
| Claude Code | `.mcp.json` |
| Cursor | `.cursor/mcp.json` |
| OpenAI Codex | `~/.codex/config.toml` (`codex mcp add green -- green-mcp`) |
| Google Antigravity | `~/.gemini/config/mcp_config.json` |

Then ask your agent to measure or compare energy/tokens of a command. See
[`deploy/README.md`](deploy/README.md) for the full mount + harness guide.
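
For orientation, the sketch below prints what a minimal Claude Code `.mcp.json` entry might look like. It assumes the common `mcpServers` layout and borrows the server name `green` from the Codex command in the table; the files shipped in `deploy/` remain the authoritative versions.

```python
import json

# Assumed shape: the `mcpServers` layout used by Claude Code's `.mcp.json`.
# The server name "green" mirrors the `codex mcp add green` command above.
config = {
    "mcpServers": {
        "green": {
            "command": "green-mcp",  # console script installed by pipx
            "args": [],
        }
    }
}

print(json.dumps(config, indent=2))
```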

## Tools

`measure_energy` · `compare_energy` · `measure_tokens` · `compare_tokens` ·
`verify_equivalence` · `energy_backend_info`

## Requirements (what each part needs to actually work)

**No server to host.** green-mcp is not a web service — your IDE launches it as a local stdio
subprocess. There's no cloud, no account, and no LLM key needed for the measurement server itself.

| Goal | What you need |
|---|---|
| Run the server | Python 3.10+ and `pip install green-mcp`. That's it: tools mount immediately. |

**Energy axis** — needs a power-sensor backend on the host (the largest prerequisite):

- **Windows + AMD** → install **AMD uProf** separately (driver-based; admin to install). Not bundled.
- **Linux** → reads `/sys/class/powercap` (RAPL); no extra install, but `energy_uj` is root-only on
  some distros.
- **macOS** → uses the built-in `powermetrics`, which requires **root / passwordless sudo**.
- **No reachable sensor** (a VM, a container, a locked-down machine) → energy tools report
  `energy_available: false` and refuse to estimate. **Energy generally does NOT work in Docker/CI** —
  containers and VMs have no power-sensor passthrough. Use the token axis there.
- The measured command runs **locally** (arbitrary commands → use in a trusted environment only).
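
On Linux, you can check up front whether any RAPL counter is readable by the current user. The sketch below is a standalone probe, not green-mcp's own detection logic; the helper name `readable_rapl_domains` is hypothetical, and it only looks for `energy_uj` files one level below `/sys/class/powercap`.

```python
import os
from pathlib import Path


def readable_rapl_domains(root="/sys/class/powercap"):
    """List energy_uj counter files under root that this user can read."""
    base = Path(root)
    if not base.is_dir():
        return []  # no powercap tree: typical in containers and many VMs
    return sorted(
        str(p) for p in base.glob("*/energy_uj") if os.access(p, os.R_OK)
    )


domains = readable_rapl_domains()
if domains:
    print("readable RAPL counters:", domains)
else:
    print("no readable RAPL counter; energy tools would likely be unavailable")
```

An empty result on a bare-metal Linux host usually points to the root-only permission case mentioned above rather than to missing hardware.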

**Token axis** — no special hardware, works anywhere, **but**:

- The target program must read its LLM endpoint from an **env var** (`ANTHROPIC_BASE_URL`,
  `OPENAI_BASE_URL`, …) so we can route it through the counting proxy. A hardcoded endpoint won't be
  measured (reports 0 calls).
- Measuring runs the target's **real LLM calls** — the proxy forwards to the real provider, so the
  target's API key is billed as usual, and network access to the provider is required.
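
To make the first requirement concrete, here is a minimal sketch of a target program that resolves its endpoint from the environment instead of hardcoding it. The helper name `resolve_base_url` is hypothetical; the only assumption about the measurer is what the text above states, namely that it sets the variable to point at its counting proxy.

```python
import os

DEFAULT_BASE_URL = "https://api.openai.com/v1"  # provider's public default


def resolve_base_url(env=os.environ):
    """Prefer OPENAI_BASE_URL so a proxy can be slotted in without code changes."""
    return env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL


print(resolve_base_url())
```

A program that passes `DEFAULT_BASE_URL` straight to its HTTP client, skipping the environment lookup, is the hardcoded case that would report 0 calls.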

**Bundled agent** (optional) — `pip install green-mcp[agent]` adds the Claude Agent SDK and needs
**Anthropic credentials**. The MCP server alone needs none.

## Honest scope

- Energy is **CPU package energy** (+DRAM on RAPL) — not carbon, not whole-system.
- Numbers from **different backends are not comparable**.
- Only the **AMD/uProf** backend is validated for repeatability on real hardware; Linux/macOS are
  written and unit-tested but unverified on metal, and no backend is yet cross-validated against a
  wall power meter. The token measurer is validated against a local fake upstream, not a live
  provider. These gaps are tracked, not hidden.

## Development

```sh
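# Paths use the Windows venv layout; on Linux/macOS replace `.venv/Scripts/` with `.venv/bin/`.
python -m venv .venv && .venv/Scripts/pip install -e ".[dev]"
.venv/Scripts/python -m pytest -q               # unit tests
.venv/Scripts/python -m pytest -m integration   # real-hardware tests (need AMD uProf)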
```

Architecture and decisions live in [`North Star.md`](North%20Star.md), [`Green.md`](Green.md),
and `docs/`. Licensed under [MIT](LICENSE).
