Metadata-Version: 2.4
Name: pinion-mcp
Version: 0.3.1
Summary: AI-Powered Characterization Test Generator MCP — lock legacy Python behavior into pytest so you can refactor safely.
Project-URL: Homepage, https://github.com/namojo/pinion
Project-URL: Repository, https://github.com/namojo/pinion
Project-URL: Documentation, https://github.com/namojo/pinion/blob/main/docs/USER_GUIDE.md
Project-URL: User Guide (English), https://github.com/namojo/pinion/blob/main/docs/USER_GUIDE.md
Project-URL: User Guide (한국어), https://github.com/namojo/pinion/blob/main/docs/USER_GUIDE_kr.md
Project-URL: Design Spec, https://github.com/namojo/pinion/blob/main/docs/SPEC.md
Project-URL: Roadmap, https://github.com/namojo/pinion/blob/main/docs/V2_ROADMAP.md
Project-URL: Issues, https://github.com/namojo/pinion/issues
Project-URL: Changelog, https://github.com/namojo/pinion/releases
Author: namojo
Maintainer: namojo
License: Apache-2.0
License-File: LICENSE
Keywords: ai,characterization-tests,legacy,llm,mcp,model-context-protocol,modernization,pytest,testing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: coverage>=7.4
Requires-Dist: litellm>=1.50
Requires-Dist: mcp>=1.0
Requires-Dist: pydantic>=2.5
Requires-Dist: typer>=0.12
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-httpserver>=1.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.12; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Description-Content-Type: text/markdown

# Pinion

> **Lock legacy code behavior into pytest — so you can finally refactor it.**

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-informational.svg)](https://www.python.org/)
[![MCP](https://img.shields.io/badge/protocol-MCP-7c3aed.svg)](https://modelcontextprotocol.io/)
[![Tests](https://img.shields.io/badge/tests-112%20passing-success.svg)](#)
[![PyPI](https://img.shields.io/badge/pypi-coming%20soon-lightgrey.svg)](#)

Pinion is an **AI-powered characterization-test generator** that reads a Python function or class method, synthesizes representative inputs, captures the function's actual behavior in a sandbox, and emits a self-contained `pytest` file that locks that behavior in. It runs as a CLI and as a stdio Model Context Protocol (MCP) server, so it works inside Claude Code, Claude Desktop, Cursor, Cline, Codex CLI, Gemini CLI, Zed, `revfactory/harness`, and any other MCP-aware client.

> 🇰🇷 [한국어 README는 여기로 →](README_kr.md)
> 🇰🇷 [한국어 사용자 매뉴얼은 여기로 →](docs/USER_GUIDE_kr.md)

---

## Why Pinion

Legacy modernization has a chicken-and-egg problem. To refactor safely you need tests. To write tests you need to understand the code. To understand the code you need to refactor it. Most teams stall here for years.

Existing tools have not closed this gap:

- **ApprovalTests / pinning-test libraries** require a human to choose the inputs.
- **Hypothesis / property-based testing** requires a human to write strategies.
- **EvoSuite** is Java-only and search-based.
- **Vendor AI assistants** can suggest tests in chat, but they don't run, validate coverage, or capture real behavior.

Pinion treats *input selection* as a reasoning task and gives it to an LLM — then validates the result with deterministic tools (AST analysis, sandboxed execution, `coverage.py`) before emitting a regular `pytest` file you can read, edit, and commit.

The AI component is essential, not decorative: removing it leaves you with a sandbox that has nothing to run.

---

## Quickstart

### Install

```bash
pip install pinion-mcp
```

### Pick a provider

Pinion supports five LLM backends — Anthropic Claude, OpenAI ChatGPT, Google Gemini, local Ollama, or an internal enterprise gateway. Pick whichever you already have or grab the free Gemini tier:

```bash
# (a) Anthropic Claude — default
export ANTHROPIC_API_KEY="sk-ant-..."

# (b) OpenAI / ChatGPT
export PINION_LLM_PROVIDER=openai
export OPENAI_API_KEY="sk-..."

# (c) Google Gemini (free tier — https://aistudio.google.com/apikey)
export PINION_LLM_PROVIDER=gemini
export GEMINI_API_KEY="AIza..."
```

### Generate tests for a function (v1)

```bash
pinion characterize ./legacy/order_service.py \
  --function calculate_total \
  --out tests/test_order_service_pinned.py
```

Drop `--function` to characterise every pure top-level function in the module.

### Generate tests for a class method (v2.0)

```bash
pinion characterize ./legacy/cart.py \
  --class Cart --method total \
  --out tests/test_cart_total_pinned.py
```

Drop `--method` to characterise every public method on the class. Pinion automatically figures out how to construct the instance and which helper methods (`add_item`, `apply_discount`, …) to call first to put the instance into a meaningful state. Plain classes, `@dataclass`, and `pydantic.BaseModel` all work.

### Use Pinion as an MCP server

```bash
claude mcp add pinion -- pinion-mcp serve
```

Then, in any MCP-aware client:

> *"Use pinion to characterise `legacy/order_service.py::calculate_total` and write the tests to `tests/test_order_service_pinned.py`."*

Pinion exposes four MCP tools:

- `characterize_function(file_path, function_name, …)` — v1
- `characterize_method(file_path, class_name, method_name, …)` — **v2.0**
- `characterize_module(file_path, …)`
- `health_check(probe=False)`

The next section lists every MCP client we've registered Pinion with.

---

## MCP Clients

MCP is an open protocol. Pinion is **not** Claude-only — anything that speaks stdio MCP can mount it.

| Client | How to register Pinion |
|---|---|
| **Claude Code** (CLI) | `claude mcp add pinion -- pinion-mcp serve` |
| **Claude Desktop** | `~/Library/Application Support/Claude/claude_desktop_config.json` → `"mcpServers": {"pinion": {"command": "pinion-mcp", "args": ["serve"]}}` |
| **Cursor** | `.cursor/mcp.json` (same `mcpServers` shape) |
| **Cline** (VS Code) | Extension settings → MCP Servers → `pinion-mcp serve` |
| **Continue.dev** (VS Code / JetBrains) | `~/.continue/config.json` → `mcpServers` |
| **Codex CLI** (OpenAI) | `~/.codex/config.toml` → `[mcp_servers.pinion]` |
| **Gemini CLI** (Google) | `~/.gemini/settings.json` → `mcpServers` |
| **Zed Editor** | `settings.json` → `context_servers` |
| **`revfactory/harness`** | `harness.yaml` → `mcp_servers:` |
| **Custom client** | Any MCP SDK, such as the Python `mcp` package or the TypeScript SDK: spawn `pinion-mcp serve` as a subprocess and talk to it over stdio |

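Most clients take the same `command` plus `args` payload. What differs is the config file location and, for a few clients, the key name or file format (for example `context_servers` in Zed and TOML in Codex CLI).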
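For clients that use the JSON `mcpServers` shape (Claude Desktop, Cursor, and similar), the entry from the table can be merged into an existing config without disturbing other servers. The helper below is a minimal sketch; `add_pinion` is a hypothetical name, not something Pinion ships, and the `other` server is made up for illustration.

```python
import json


def add_pinion(config: dict) -> dict:
    """Return a copy of an MCP client config with the Pinion server registered.

    Hypothetical helper: it only builds the JSON structure shown in the
    table above and leaves any other registered servers untouched.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["pinion"] = {"command": "pinion-mcp", "args": ["serve"]}
    merged["mcpServers"] = servers
    return merged


# Illustrative existing config with one unrelated (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(json.dumps(add_pinion(existing), indent=2))
```

Write the printed JSON to the config file your client reads, then restart the client so it picks up the new server.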

---

## How it works

```
  +-----------+     +------------+     +-----------+     +------------+     +----------+
  | analyzer  | --> | synthesizer| --> |  sandbox  | --> | coverage   | --> | emitter  |
  | (AST)     |     | (LLM)      |     | (subproc  |     | (line+arc) |     | (pytest) |
  | profile   |     | inputs     |     |  + rlimit)|     | gate       |     | code     |
  +-----------+     +------------+     +-----------+     +------------+     +----------+
        deterministic         LLM               deterministic              deterministic

         If coverage < threshold, the synthesizer is invoked again with
         the missing branches as additional context. Up to 3 rounds.
```
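The retry loop in the diagram can be sketched as follows. This is an illustration under stated assumptions, not Pinion's actual code: `characterize`, `synthesize`, and `measure` are hypothetical names, and the two stub functions stand in for the LLM and for `coverage.py`.

```python
from typing import Callable

Case = dict


def characterize(
    synthesize: Callable[[list[str]], list[Case]],
    measure: Callable[[list[Case]], tuple[float, list[str]]],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> tuple[list[Case], float]:
    """Request inputs until coverage meets the threshold or rounds run out."""
    cases: list[Case] = []
    missing: list[str] = []  # no missing-branch hints on the first round
    coverage = 0.0
    for _ in range(max_rounds):
        cases.extend(synthesize(missing))
        coverage, missing = measure(cases)
        if coverage >= threshold:
            break
    return cases, coverage


# Stubs for illustration only: a target with three branches.
BRANCHES = {"positive", "negative", "zero"}


def fake_synthesize(missing: list[str]) -> list[Case]:
    # First round guesses one case; later rounds target the hinted branches.
    if not missing:
        return [{"branch": "positive"}]
    return [{"branch": name} for name in missing]


def fake_measure(cases: list[Case]) -> tuple[float, list[str]]:
    covered = {case["branch"] for case in cases}
    return len(covered) / len(BRANCHES), sorted(BRANCHES - covered)


cases, coverage = characterize(fake_synthesize, fake_measure)
print(len(cases), coverage)
```

With these stubs the first round covers one branch out of three, so the loop runs a second round with the two missing branches as hints and then stops.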

1. **Profile.** Static AST analysis pulls the signature, type hints, docstring, branch structure, and external calls. For class methods (v2.0) it also produces a `ClassProfile` with the constructor signature and instance attributes.
2. **Synthesize.** The profile (not the source) goes to the LLM, together with any missing-branch hints from earlier rounds. The LLM returns a JSON list of input cases; for methods, each case includes a `setup` block describing how to construct the instance and which helper methods to invoke first. The output is validated against a Pydantic schema before it is trusted.
3. **Capture.** Each input is executed in a fresh subprocess with CPU, memory, file-descriptor, environment, and network limits in place. Return values, exceptions, and stdout/stderr tails are captured. Since v2.0.1, each exception is attributed to the phase that raised it (construction / post-init / target-method).
4. **Validate.** `coverage.py` measures line and branch coverage. If coverage is below the threshold (default 0.8), the synthesizer is asked for more cases targeting the missing branches, for up to 3 rounds.
5. **Emit.** A clean, reviewable `pytest` file is produced — for methods, with `@pytest.fixture` per unique setup hash so cases that share a setup also share a fixture.

---

## Capabilities and limitations

Pinion is upfront about its limits: when it cannot handle a target, it refuses rather than silently degrading.

**What works today (v1 + v2.0)**

- ✅ Top-level pure functions
- ✅ Class methods on plain classes, `@dataclass`, and `pydantic.BaseModel`
- ✅ Five LLM providers via env-var-only switching
- ✅ Provider-and-model-aware retry on truncated JSON
- ✅ `@pytest.fixture` sharing for class methods
- ✅ macOS and Linux

**What v1/v2.0 deliberately refuse**

- **Pure functions only by default.** Functions touching the filesystem, network, databases, or `subprocess` are refused unless `--allow-impure` is set, in which case there is no correctness guarantee.
- **No abstract base classes, metaclass-heavy classes, or `__init_subclass__` users.** v2.0 refuses these because the construction path is not safe to drive automatically.
- **JSON-friendly arguments only.** Constructors and method calls take JSON-serialisable values. Arguments that are instances of user-defined classes are not supported until v2.2 (mock adapters) ships.
- **Process-level sandbox, not a security boundary.** Run Pinion only on code you have read, on disposable workstations or CI runners. The sandbox protects you from runaway loops and accidental I/O, not from a determined adversary.
- **No async functions yet.** v2.1 adds those.
- **Windows is best-effort.** No `resource.setrlimit`.

These boundaries are explicit in [`docs/SPEC.md`](docs/SPEC.md) §10 and in the code paths themselves.
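To make "process-level sandbox" concrete, here is a rough POSIX-only sketch of running one case in a fresh subprocess with a wall-clock timeout plus `RLIMIT_AS` and `RLIMIT_CPU`. It is not Pinion's sandbox: `run_limited` is a hypothetical name, and file-descriptor, environment, and network restrictions are left out.

```python
import resource  # POSIX only; this module does not exist on Windows
import subprocess
import sys


def run_limited(code: str, timeout: float = 5.0, memory_mb: int = 256,
                cpu_seconds: int = 5) -> dict:
    """Run a Python snippet in a child process under resource limits."""

    def apply_limits() -> None:
        # Runs in the child between fork and exec.
        wanted = (
            (resource.RLIMIT_AS, memory_mb * 1024 * 1024),
            (resource.RLIMIT_CPU, cpu_seconds),
        )
        for limit, value in wanted:
            try:
                resource.setrlimit(limit, (value, value))
            except (ValueError, OSError):
                pass  # the platform refused this limit; carry on without it

    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout,
            preexec_fn=apply_limits,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if proc.returncode == 0 else "error"
    # Keep only the tail of stderr, as a captured trace would.
    return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr[-500:]}


print(run_limited("print(1 + 1)")["status"])
```

Limits like these stop runaway loops and oversized allocations. They do not stop hostile code, which is why the sandbox is described above as a safety net and not a security boundary.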

---

## LLM Providers

| Provider | `PINION_LLM_PROVIDER` | Default model | Notes |
|---|---|---|---|
| Anthropic Claude (default) | `anthropic` | `claude-sonnet-4-5` | `ANTHROPIC_API_KEY` required |
| OpenAI / ChatGPT | `openai` | `gpt-4o-mini` | `OPENAI_API_KEY` required |
| Google Gemini | `gemini` | `gemini-2.5-flash` | `GEMINI_API_KEY` (or `GOOGLE_API_KEY`). Free tier at [aistudio.google.com/apikey](https://aistudio.google.com/apikey) |
| Local Ollama | `ollama` | `qwen2.5-coder` | `PINION_OLLAMA_URL` (default `http://localhost:11434`) |
| Internal Enterprise Gateway | `enterprise-gateway` | _(set explicitly)_ | OpenAI-compatible endpoint, see below |

Override the default model any time with `PINION_LLM_MODEL=<model-name>`.
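The table and the override rule amount to a small lookup. The sketch below mirrors the documented defaults; `resolve_provider` is a hypothetical helper written for illustration and is not the function in `pinion/providers.py`.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table above; the gateway has no default model.
DEFAULT_MODELS: dict[str, Optional[str]] = {
    "anthropic": "claude-sonnet-4-5",
    "openai": "gpt-4o-mini",
    "gemini": "gemini-2.5-flash",
    "ollama": "qwen2.5-coder",
    "enterprise-gateway": None,
}


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Pick provider and model from environment variables (illustrative)."""
    env = os.environ if env is None else env
    provider = env.get("PINION_LLM_PROVIDER", "anthropic")
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider: {provider!r}")
    # An explicit PINION_LLM_MODEL always wins over the provider default.
    model = env.get("PINION_LLM_MODEL") or DEFAULT_MODELS[provider]
    if model is None:
        raise ValueError(f"{provider} needs PINION_LLM_MODEL to be set")
    return provider, model


print(resolve_provider({}))
print(resolve_provider({"PINION_LLM_PROVIDER": "gemini"}))
```

Passing a plain dict in place of `os.environ` keeps the lookup easy to test without touching the real environment.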

### Internal Enterprise Gateway

The `enterprise-gateway` slot is wired but **inactive by default**. To use a private internal LLM gateway (assuming it exposes an OpenAI-compatible API), set:

```bash
export PINION_LLM_PROVIDER=enterprise-gateway
export PINION_LLM_MODEL=<gateway-model-name>
export PINION_GATEWAY_URL=https://internal-llm.example.com/v1
export PINION_GATEWAY_API_KEY=<token>
```

No code change required. `pinion-mcp` exposes a `health_check(probe=true)` tool to verify connectivity. If your internal gateway is not OpenAI-compatible, add a thin adapter — the abstraction lives in `pinion/providers.py`.

---

## Configuration

All configuration is via environment variables. See [`docs/SPEC.md` §8](docs/SPEC.md) for the complete list. Key ones:

```bash
PINION_LLM_PROVIDER=anthropic            # anthropic | openai | gemini | ollama | enterprise-gateway
PINION_LLM_MODEL=claude-sonnet-4-5       # provider-specific
PINION_DEFAULT_THRESHOLD=0.8             # coverage gate
PINION_MAX_ROUNDS=3                      # max LLM re-synthesis rounds
PINION_SANDBOX_TIMEOUT=5.0               # seconds per case
PINION_SANDBOX_MEMORY_MB=256             # RLIMIT_AS per case
```
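As a rough picture of how the tuning variables might be read, the sketch below parses them into a typed settings object with the documented defaults. `Settings` and `load_settings` are hypothetical names; Pinion's own loader may validate differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    threshold: float = 0.8          # PINION_DEFAULT_THRESHOLD
    max_rounds: int = 3             # PINION_MAX_ROUNDS
    sandbox_timeout: float = 5.0    # PINION_SANDBOX_TIMEOUT, seconds per case
    sandbox_memory_mb: int = 256    # PINION_SANDBOX_MEMORY_MB


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the tuning variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    settings = Settings(
        threshold=float(env.get("PINION_DEFAULT_THRESHOLD", "0.8")),
        max_rounds=int(env.get("PINION_MAX_ROUNDS", "3")),
        sandbox_timeout=float(env.get("PINION_SANDBOX_TIMEOUT", "5.0")),
        sandbox_memory_mb=int(env.get("PINION_SANDBOX_MEMORY_MB", "256")),
    )
    if not 0.0 <= settings.threshold <= 1.0:
        raise ValueError("PINION_DEFAULT_THRESHOLD must be between 0 and 1")
    if settings.max_rounds < 1:
        raise ValueError("PINION_MAX_ROUNDS must be at least 1")
    return settings


print(load_settings({"PINION_DEFAULT_THRESHOLD": "0.9"}))
```

Failing fast on an out-of-range threshold is a design choice in this sketch: a coverage gate above 1.0 could never be met and would use up every re-synthesis round.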

---

## Dogfooding

We point Pinion at Pinion. The full report — including two real limitations the run surfaced and the fix we shipped because of them — lives at [`examples/dogfooding/README.md`](examples/dogfooding/README.md).

| Run | Mode | Target | Outcome |
|---|---|---|---|
| 1 | v1 (function) | `pinion.providers.resolve_litellm_model` | Tests passed, but exposed the **JSON-only input contract** limitation when the function takes a typed-class argument (motivates v2.2) |
| 2 | v2 (method)   | `examples.demo_legacy_class.Cart.total` | 100% coverage in 1 LLM round; initially 6/8 emitted tests passed — exposed a v2.0 **setup-vs-method exception attribution** bug we then fixed in v2.0.1 (now 8/8) |

The dogfooding run also drove one user-visible default change: `DEFAULT_MAX_TOKENS` was raised from 4096 to 8192 after Gemini truncated long routing-function responses.

The point of dogfooding is not "the tool worked perfectly." It is "the tool worked, and here is exactly where it does not." Both runs reproduce on the Gemini free tier at $0 total.

---

## Roadmap

**Shipped**

- ✅ **v1** — top-level pure functions, five LLM providers, MCP server, CLI, demo, full test suite (80 tests)
- ✅ **v2.0** — class methods on plain classes / `@dataclass` / pydantic models; per-setup `@pytest.fixture` sharing; new `characterize_method` MCP tool (110 tests total)
- ✅ **v2.0.1** — setup-phase vs method-phase exception attribution fix (112 tests total)

**Next (designed in [`docs/V2_ROADMAP.md`](docs/V2_ROADMAP.md))**

- **v2.1** — async functions (`async def`) with isolated event loops
- **v2.2** — user-supplied mock adapters (replay / stub / route) for I/O-heavy functions
- **v2.3** — `pinion diff orig.py --against new.py` golden-master diff mode for refactor reviews
- **v2.4** — source-hash cache so unchanged code skips LLM re-synthesis

**Further out (v3)**

- TypeScript via tree-sitter (vitest emitter)
- Java + JUnit emitter
- Property-based test synthesis (Hypothesis strategies)
- VS Code extension

The full roadmap, with design notes and DoDs, is in [`docs/V2_ROADMAP.md`](docs/V2_ROADMAP.md).

---

## Contributing

Pinion is Apache 2.0 licensed and welcomes contributions. The design contract is frozen in [`docs/SPEC.md`](docs/SPEC.md); please read it before opening a PR that changes interfaces. For bug fixes and additional fixtures, just open an issue or PR.

---

## License

Apache License 2.0. See [LICENSE](LICENSE).
