# pcq Agent Guide

pcq is the contract for agent-run ML experiments. The contract specification
(cq.yaml format, JSON contracts, MCP tool surface, strictness, conformance,
schema versioning) lives under
https://github.com/playidea-lab/pcq/tree/main/spec — that directory is the
single source of truth. This guide describes the contract from the agent's
perspective.

The Apache-2.0 Python package distributed via PyPI (`uv add pcq`) is the
reference implementation. The CQ Go service worker is a second implementation
targeting the same contract; future Go/JS clients are welcome. pcq is not a
training framework, model zoo, framework adapter matrix, or experiment
tracking SaaS — it is the contract layer that makes arbitrary ML code
inspectable, reproducible, verifiable, comparable, and repeatable through
standard files and JSON/MCP surfaces.

## Identity

- Package: pcq
- Import: `import pcq`
- CLI: `pcq`
- License: Apache-2.0
- Repository: https://github.com/playidea-lab/pcq
- PyPI: https://pypi.org/project/pcq/
- Website: https://playidea-lab.github.io/pcq/

Core sentence:

```text
pcq does not operate the model.
pcq operates the experiment boundary.
```

Runtime contract names:

- `cq.yaml`
- `CQ_CONFIG_JSON`
- `cq://`

These names share the CQ prefix, but they do not tie pcq to the managed CQ
service.

## What pcq Standardizes

An experiment project has a `cq.yaml` file:

```yaml
name: mnist-mlp
cmd: uv run python train.py
configs:
  seed: 42
  epochs: 3
  output_dir: output
  monitor: eval_acc
  mode: max
metrics:
  - epoch
  - eval_acc
artifacts:
  - output/
```

The training code can use any ML stack:

```python
import pcq

cfg = pcq.config()
out = pcq.output_dir()

# Run any training code here.
score = 0.82

pcq.log(epoch=0, eval_acc=score)
pcq.save_all(
    history=[{"epoch": 0, "eval_acc": score}],
    status="completed",
    artifacts={"model": "model.pt"},
)
```

## Read Path For Agents

Use these commands before editing or running:

```bash
pcq resolve --json
pcq inspect . --json
pcq validate . --strictness 2 --json
```

From those responses, the agent should identify (see the sketch after this
list):

- project root
- selected `cq.yaml`
- command to run
- declared metrics
- output directory
- existing artifacts
- previous run records
- validation warnings or blocking failures
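
A minimal parsing sketch, assuming each command prints one JSON object to
stdout; the field names read at the end (`root`, `cmd`, `metrics`, `issues`)
are illustrative, not the documented payload, so confirm them against the
`spec/` schemas:

```python
import json
import subprocess

def pcq_json(*args: str) -> dict:
    """Run a pcq subcommand with --json and parse stdout.

    Assumes the command prints a single JSON object even on nonzero exit.
    """
    out = subprocess.run(["pcq", *args, "--json"], capture_output=True, text=True)
    return json.loads(out.stdout)

resolved = pcq_json("resolve")
inspected = pcq_json("inspect", ".")
report = pcq_json("validate", ".", "--strictness", "2")

# Illustrative field names; the authoritative payloads live in spec/schemas.
print(resolved.get("root"), resolved.get("cmd"))
print(inspected.get("metrics"))
print(report.get("issues"))
```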

## Run Path For Agents

For a final result object only:

```bash
pcq run --path . --json
```

For live events:

```bash
pcq run --path . --jsonl
```

For a final JSON object plus event evidence in a file:

```bash
pcq run --path . --events output/events.jsonl --json
```

JSONL events are newline-delimited JSON objects. Each event includes at least:

- `schema_version`
- `seq`
- `time`
- `event`

Important event types:

- `run.started`
- `stdout`
- `stderr`
- `metric`
- `run.completed`
- `run.failed`
- `run.error`

Metric events are derived from `pcq.log(...)` stdout lines in `@key=value`
format.
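
A minimal consumer sketch for the `--jsonl` stream; it relies only on the
four guaranteed keys above and treats the rest of each `metric` payload as
opaque:

```python
import json
import subprocess

# Stream live events; each stdout line is one JSON event object.
proc = subprocess.Popen(
    ["pcq", "run", "--path", ".", "--jsonl"],
    stdout=subprocess.PIPE,
    text=True,
)
for line in proc.stdout:
    event = json.loads(line)
    if event["event"] == "metric":
        # Only schema_version/seq/time/event are guaranteed; print the
        # rest of the payload as-is.
        print(event["seq"], event)
    elif event["event"] in ("run.completed", "run.failed", "run.error"):
        break
proc.wait()
```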

## Post-Run Path For Agents

After the process exits:

```bash
pcq validate-run output --strictness 3 --json
pcq describe-run output --json
```

For comparing two iterations:

```bash
pcq compare-runs old_output new_output --json
pcq lineage new_output --json
```

`describe-run` and `compare-runs` expose decision facts. They intentionally do
not decide whether to continue, roll back, or accept a run.
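
As an example, an accept/rollback policy can be layered on top of the compare
output. This is a sketch: `metric_deltas` and `eval_acc` below are
hypothetical field names standing in for whatever diff structure
`compare-runs` actually emits; check the `spec/` schemas first:

```python
import json
import subprocess

out = subprocess.run(
    ["pcq", "compare-runs", "old_output", "new_output", "--json"],
    capture_output=True,
    text=True,
    check=True,
)
diff = json.loads(out.stdout)

# "metric_deltas" is a hypothetical key; the real payload shape is
# defined by the compare-runs contract in spec/schemas.
delta = diff.get("metric_deltas", {}).get("eval_acc", 0.0)
print("accept" if delta >= 0 else "rollback")
```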

## Standard Artifacts

A valid run should produce:

- `config.json`
- `metrics.json`
- `manifest.json`
- `run_summary.json`
- `run_record.json`
- `validation_report.json`

`run_record.json` is the canonical completion record. It should contain source,
environment, input, metric, artifact, validation, lineage, and summary evidence.
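
A sketch of a post-run evidence check; the top-level keys below are inferred
from the evidence categories just listed, not taken from the schema, so
confirm them against `spec/schemas`:

```python
import json
from pathlib import Path

record = json.loads(Path("output/run_record.json").read_text())

# Section names inferred from the evidence categories above; the
# authoritative key set lives in the run_record schema.
for section in ("source", "environment", "input", "metric",
                "artifact", "validation", "lineage", "summary"):
    print(f"{section}: {'present' if section in record else 'missing'}")
```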

## Agent Behavior

Do:

- Prefer JSON/JSONL surfaces over scraping terminal prose.
- Keep project-specific model, dataset, loss, optimizer, scheduler, and
  framework code in the user's project.
- Declare metrics in `cq.yaml` before emitting them with `pcq.log(...)`.
- Use `pcq.output_dir()` rather than hard-coded `output/` paths.
- Treat failed runs as evidence when partial artifacts can be preserved.

Do not:

- Treat process exit code alone as experiment success.
- Assume pcq owns the training loop.
- Assume CQ service is required.
- Add framework adapters when direct contract code is enough.
- Edit pcq internals for one project-specific experiment.

## MCP Integration (v4.1.0)

pcq ships an optional Model Context Protocol server so agent runtimes
(Claude Code, Codex, custom LLM clients) can call pcq with structured
JSON instead of shelling out and parsing stdout.

Install:

```bash
uv add 'pcq[mcp]'
```

Wire the project:

```bash
pcq init-experiment --output ./my-exp --agent claude
pcq agent install --target claude --path ./my-exp --mcp
```

The `--mcp` flag merges the following into `.mcp.json` (existing entries
preserved):

```json
{
  "mcpServers": {
    "pcq": {
      "command": "pcq",
      "args": ["mcp", "serve"]
    }
  }
}
```

Serve:

```bash
pcq mcp serve                                            # stdio (default)
pcq mcp serve --transport sse --host 127.0.0.1 --port 8765
```

There are 14 MCP tools; read-only tools never create directories, never
mutate `cq.yaml`, and never spawn subprocesses:

| Tool | Read-only | Maps to |
|------|-----------|---------|
| `resolve_project` | yes | `pcq resolve` |
| `inspect_project` | yes | `pcq inspect` |
| `validate_project` | yes | `pcq validate` |
| `validate_run` | yes | `pcq validate-run` |
| `describe_run` | yes | `pcq describe-run` |
| `compare_runs` | yes | `pcq compare-runs` |
| `lineage_chain` | yes | `pcq lineage` |
| `apply_plan` | no | `pcq apply-plan` |
| `apply_planset` | no | `pcq apply-planset` |
| `init_experiment` | no | `pcq init-experiment` |
| `finalize_run` | no | `pcq finalize` |
| `agent_install` | no | `pcq agent install` |
| `agent_status` | yes | `pcq agent status` |
| `run_experiment` | no | `pcq run` |

Every tool's input/output is anchored in the
`pcq.agent.json_contracts.JSON_CONTRACTS` registry, which has been frozen
since v2.13.
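
As a sketch, the registry can be enumerated directly; this assumes
`JSON_CONTRACTS` behaves as a mapping from contract name to contract
definition:

```python
# Assumes JSON_CONTRACTS is a name -> contract mapping; iterate the names.
from pcq.agent.json_contracts import JSON_CONTRACTS

for name in sorted(JSON_CONTRACTS):
    print(name)
```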

Multi-hour GPU training should not block the in-process `run_experiment`
tool. For that workload, prefer the CQ service queue, which consumes the
same contract.

To embed the tools directly, without the MCP server wrapper:

```python
import asyncio

from pcq.mcp.tools import build_tools

tools = build_tools()
resolve = next(t for t in tools if t.name == "resolve_project")
result = asyncio.run(resolve.handler({"path": "."}))
```

## Examples

### sklearn — RandomForest on Iris

```yaml
# cq.yaml
name: sklearn-baseline
cmd: uv run python train.py
configs:
  output_dir: output
  seed: 42
  n_estimators: 100
  monitor: eval_acc
  mode: max
metrics:
  - epoch
  - eval_acc
artifacts:
  - output/
```

```python
# train.py
import pcq, joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cfg = pcq.config()
out = pcq.output_dir()
seed = cfg.get("seed", 42)
pcq.seed_everything(seed)

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=seed
)
model = RandomForestClassifier(n_estimators=cfg.get("n_estimators", 100))
model.fit(X_tr, y_tr)
acc = float(model.score(X_te, y_te))

pcq.log(epoch=0, eval_acc=acc)
joblib.dump(model, out / "model.pkl")
pcq.save_all(history=[{"epoch": 0, "eval_acc": acc}],
             artifacts={"model": "model.pkl"})
```

### PyTorch — user-owned training loop

```python
import pcq, torch
from torch import nn
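
# Assumes cq.yaml for this project declares epoch, train_loss, and val_acc
# under metrics, as required before pcq.log(...) emits them.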

cfg = pcq.config()
out = pcq.output_dir()
pcq.seed_everything(cfg.get("seed", 42))

model = nn.Linear(cfg["in_dim"], cfg["out_dim"])
opt = torch.optim.Adam(model.parameters(), lr=cfg["lr"])

history = []
for epoch in range(cfg["epochs"]):
    train_loss = train_one_epoch(model, opt)   # user code
    val_acc = evaluate(model)                   # user code
    pcq.log(epoch=epoch, train_loss=train_loss, val_acc=val_acc)
    history.append({"epoch": epoch, "train_loss": train_loss, "val_acc": val_acc})

torch.save(model.state_dict(), out / "model.pt")
pcq.save_all(history=history, artifacts={"model": "model.pt"})
```

## Tool Response Samples

Real captured responses for four canonical MCP tools are inlined at
https://playidea-lab.github.io/pcq/#tools-catalog. Each sample comes from
a real run of `examples/contract_sklearn`; only volatile fields
(timestamps, git_sha, sha256 hashes, absolute paths) are elided as
`"..."`. Anchors:

- `#tools-catalog-describe_run`  — compact RunRecord summary
- `#tools-catalog-compare_runs`  — diff between two RunRecords
- `#tools-catalog-validate_run`  — strictness=3 pass/warn/fail report
- `#tools-catalog-lineage_chain` — parent chain walk

`agent-manifest.json` mirrors these under its `tool_response_samples` array
(one entry per sample with `name`, `command`, `mcp_tool`, and
`sample_anchor`), so agents that prefer JSON over HTML can discover the same
evidence in machine-readable form.
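
A discovery sketch; the manifest URL below assumes `agent-manifest.json` is
published at the site root, which may not hold:

```python
import json
from urllib.request import urlopen

# Assumed location; adjust if the manifest is served elsewhere.
MANIFEST_URL = "https://playidea-lab.github.io/pcq/agent-manifest.json"

with urlopen(MANIFEST_URL) as resp:
    manifest = json.load(resp)

for sample in manifest.get("tool_response_samples", []):
    # name, command, mcp_tool, sample_anchor per entry.
    print(sample["mcp_tool"], "->", sample["sample_anchor"])
```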

## Case Studies

Four production dogfoods (pcq's own validation cycle on real ML
workloads) are documented in `docs/case-studies/` and surfaced on the
site at `#case-studies`:

- **MNIST Dogfood (2026-05-08)** — pcq v2.11, MNIST digits, 9 fresh
  agent generations, eval_acc 0.9583 → 1.0. First end-to-end dogfood;
  drove the v2.12 round of fixes.
- **Tabular Dogfood (2026-05-09)** — pcq 3.0.1, breast-cancer dataset,
  TabPFN/PyCaret/FLAML/XGBoost/sklearn diversity. First post-PyPI
  install path validation.
- **MCP Dogfood (2026-05-10)** — pcq[mcp] 4.1.0, Claude Code MCP. First
  v4.1.0 MCP loop end-to-end via `mcp__pcq__*` tools (no subprocess
  CLI). 3 sequential generations.
- **CQ Worker Dogfood (2026-05-10)** — pcq[mcp] 4.2.0, CQ Go service
  worker on RTX 5080. First production CQ Go worker dispatch end-to-end;
  verified `cq.yaml` + `CQ_CONFIG_JSON` + 6-artifact protocol.

## Spec

The contract surface (cq.yaml format, JSON contracts, MCP tools,
strictness, schema versioning, conformance) lives at the repository
root under `spec/`, separately from the Python implementation in
`src/pcq/`. Other languages and runtimes can target the contract
without depending on Python.

- Index: https://github.com/playidea-lab/pcq/blob/main/spec/INDEX.md
- JSON Schemas (auto-exported from `pcq.agent.json_contracts.JSON_CONTRACTS`
  via `scripts/export_schemas.py`):
  https://github.com/playidea-lab/pcq/tree/main/spec/schemas
- Versioning policy: https://github.com/playidea-lab/pcq/blob/main/spec/VERSIONING.md
- Conformance suite: https://github.com/playidea-lab/pcq/blob/main/spec/CONFORMANCE.md
  (golden input/output pairs at `tests/conformance/<contract>/<case>/`)

CI guards drift between the registry and the on-disk schemas via
`uv run python scripts/export_schemas.py --check`.
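
A sketch for walking the golden pairs locally; only the directory layout
stated above is assumed, so the per-case file names are listed rather than
hard-coded:

```python
from pathlib import Path

# Layout per the conformance suite: tests/conformance/<contract>/<case>/
for case in sorted(Path("tests/conformance").glob("*/*")):
    if case.is_dir():
        files = sorted(p.name for p in case.iterdir())
        print(case.parent.name, case.name, files)
```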

## Roadmap

The thesis stays the same: pcq does not compete with the means of
training. The work ahead strengthens the framework-neutral evidence and
control layer — broader real-world contract coverage, deeper
validation/lineage facts, more machine-readable surfaces for agent
runtimes, tighter integration with the CQ managed consumer. Built-in
models, losses, datasets, and per-framework adapter matrices remain
deliberately out of scope.

- v4 Direction: https://github.com/playidea-lab/pcq/blob/main/docs/V4_DIRECTION.md
- Completion Roadmap: https://github.com/playidea-lab/pcq/blob/main/docs/PCQ_COMPLETION_ROADMAP.md
- Releases: https://github.com/playidea-lab/pcq/releases

## Related Docs

- v4 direction: https://github.com/playidea-lab/pcq/blob/main/docs/V4_DIRECTION.md
- Introduction: https://github.com/playidea-lab/pcq/blob/main/docs/INTRODUCTION.md
- JSON contracts: https://github.com/playidea-lab/pcq/blob/main/docs/JSON_CONTRACTS.md
- Agent guide: https://github.com/playidea-lab/pcq/blob/main/docs/AGENT_OPERATING_GUIDE.md
- Strictness: https://github.com/playidea-lab/pcq/blob/main/docs/STRICTNESS.md
- Runtime contract: https://github.com/playidea-lab/pcq/blob/main/docs/CQ_YAML_RUNTIME_CONTRACT.md
- MCP integration: https://github.com/playidea-lab/pcq/blob/main/docs/MCP_INTEGRATION.md
