pcq experiment contracts

Apache-2.0 · Python · MCP-ready

Run anything. Standardize the evidence.

pcq turns ML experiments into agent-operable, reproducible units. Keep your existing training stack; pcq standardizes the run boundary — contract, live evidence, validation, comparison, lineage, final run record — and exposes 14 Model Context Protocol tools for Claude Code, Codex, and custom agents.

uv add 'pcq[mcp]'

Why pcq

Not another trainer. Control around the run.

Framework-neutral

Use PyTorch, Hugging Face Trainer, Lightning, sklearn, TabPFN, PyCaret, XGBoost, shell scripts, or custom Python. pcq standardizes the surrounding evidence instead of replacing the training loop.

Agent-readable

JSON/JSONL CLI surfaces, strictness gates, manifests, lineage, and run summaries are designed for coding agents, CI jobs, and services that need facts rather than prose.

Reproducible boundary

Config, metrics, source identity, environment, inputs, artifacts, validation, and best/last results converge into one run_record.json.

MCP-ready

pcq[mcp] exposes 14 tools to Claude Code, Codex, and any MCP-aware runtime. No subprocess, no stdout scraping — structured input/output anchored in frozen JSON contracts.

CI-friendly

--json for one final object; --jsonl for live event streams; --events to persist evidence alongside the run for post-hoc audits.

Service-ready

The CQ managed service consumes the same contract for queueing, dashboards, and agent loops. pcq stays useful without it.

Core contract

A small set of files that every run can explain.

pcq keeps the model code free-form and makes the run boundary explicit. The project declares intent in cq.yaml, the script emits metrics, and pcq finalizes the standard artifacts.

  • cq.yaml declares command, config, metrics, inputs, and artifacts.
  • pcq.log() emits structured metric history.
  • pcq.save_all() writes the standard artifact set.
  • pcq finalize turns a run directory into evidence.
The standard artifact set: config.json, metrics.json, manifest.json, run_summary.json, run_record.json, and validation_report.json.
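
For orientation, a finalized run directory might look roughly like the sketch below: the six contract files sitting next to whatever artifacts the training script saved. The layout and the per-file glosses are illustrative, not part of the frozen contract; output/ and model.pkl are borrowed from the sklearn example further down.

# Illustrative layout of a finalized run directory (assumed, not normative)
output/
├── config.json              # resolved run configuration
├── metrics.json             # metric history emitted via pcq.log()
├── manifest.json            # manifest of the files the run produced
├── run_summary.json         # quick-look summary of the run
├── run_record.json          # canonical completion object
├── validation_report.json   # result of validating the run
└── model.pkl                # your own artifact(s), saved by the script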

Examples

Same contract, any framework.

The training code stays yours. pcq requires only three calls: pcq.config(), pcq.log(...), and pcq.save_all(...). Helpers such as pcq.output_dir() and pcq.seed_everything(), used in the examples below, are optional.

1. Minimal cq.yaml

The contract that turns a directory into a runnable, validatable experiment.

# cq.yaml
name: sklearn-baseline
cmd: uv run python train.py
configs:
  output_dir: output
  seed: 42
  n_estimators: 100
  monitor: eval_acc
  mode: max
metrics:
  - epoch
  - eval_acc
artifacts:
  - output/

2. sklearn — RandomForest on Iris

No adapter, no Trainer subclass. Three pcq calls.

# train.py
import pcq
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cfg = pcq.config()
out = pcq.output_dir()
pcq.seed_everything(cfg.get("seed", 42))

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=cfg["seed"])

model = RandomForestClassifier(n_estimators=cfg.get("n_estimators", 100))
model.fit(X_tr, y_tr)
acc = float(model.score(X_te, y_te))

pcq.log(epoch=0, eval_acc=acc)
joblib.dump(model, out / "model.pkl")
pcq.save_all(history=[{"epoch": 0, "eval_acc": acc}],
             artifacts={"model": "model.pkl"})

3. PyTorch — training loop

Keep your own model, optimizer, dataloader. pcq sits at the boundary.

# train.py
import pcq, torch
from torch import nn

cfg = pcq.config()
out = pcq.output_dir()
pcq.seed_everything(cfg.get("seed", 42))

model = nn.Linear(cfg["in_dim"], cfg["out_dim"])
opt = torch.optim.Adam(model.parameters(), lr=cfg["lr"])

history = []
for epoch in range(cfg["epochs"]):
    train_loss = train_one_epoch(model, opt)        # your code
    val_acc = evaluate(model)                       # your code
    pcq.log(epoch=epoch, train_loss=train_loss, val_acc=val_acc)
    history.append({"epoch": epoch,
                    "train_loss": train_loss,
                    "val_acc": val_acc})

torch.save(model.state_dict(), out / "model.pt")
pcq.save_all(history=history, artifacts={"model": "model.pt"})

4. NumPy — zero-dependency demo

Proof that pcq does not require a framework.

# train.py
import numpy as np, pcq

cfg = pcq.config()
out = pcq.output_dir()
rng = np.random.default_rng(cfg.get("seed", 42))

w = rng.normal(scale=0.1, size=4)
history = []
for epoch in range(cfg.get("epochs", 3)):
    loss = float(np.mean(w ** 2))
    w *= 0.9
    pcq.log(epoch=epoch, train_loss=loss)
    history.append({"epoch": epoch, "train_loss": loss})

np.savez(out / "model.npz", weights=w)
pcq.save_all(history=history, artifacts={"model": "model.npz"})

5. Run, validate, compare

The same five commands work for every example above.

# Final JSON envelope
pcq run --path . --json

# Live newline-delimited events (run.started, stdout, metric, run.completed)
pcq run --path . --jsonl

# Final JSON + persisted event log
pcq run --path . --events output/events.jsonl --json

# Post-run facts for an agent or CI
pcq validate-run output --strictness 3 --json
pcq describe-run output --json
pcq compare-runs prev_output output --json
pcq lineage output --json
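
For a CI job or a simple agent loop, a thin wrapper over these commands is often enough. The sketch below runs the validation command and gates on its JSON output; the "status" field it reads is an assumption made for illustration, so check the frozen JSON contracts for the real report schema.

# gate.py — minimal CI gate sketch; the "status" field read from the report
# is an assumption, not taken from the frozen pcq JSON contracts.
import json
import subprocess
import sys

proc = subprocess.run(
    ["pcq", "validate-run", "output", "--strictness", "3", "--json"],
    capture_output=True,
    text=True,
)
report = json.loads(proc.stdout) if proc.stdout.strip() else {}

# Gate on the exit code plus a (hypothetical) status field in the JSON report.
if proc.returncode != 0 or report.get("status") == "fail":
    print(json.dumps(report, indent=2), file=sys.stderr)
    sys.exit(1)

print("run validated at strictness 3")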

Model Context Protocol

Talk to pcq with structured tools, not stdout scraping.

Phase 6 (v4.1.0) ships an MCP server that exposes 14 of pcq's CLI surfaces as native tools for Claude Code, Codex, and any MCP-aware runtime. Read-only tools never mkdir, never mutate cq.yaml, and never spawn subprocesses.

1. Install

uv add 'pcq[mcp]'
# or:
pip install 'pcq[mcp]'

2. Wire into a project

pcq init-experiment --output ./my-exp --agent claude
pcq agent install --target claude --path ./my-exp --mcp

The --mcp flag merges a pcq entry into .mcp.json; existing entries are preserved.

3. Serve

# stdio (Claude Code, Codex)
pcq mcp serve

# HTTP/SSE (web services, remote clients)
pcq mcp serve --transport sse --host 127.0.0.1 --port 8765

.mcp.json

{
  "mcpServers": {
    "pcq": {
      "command": "pcq",
      "args": ["mcp", "serve"]
    }
  }
}

14 MCP tools

Tool                Read-only   Maps to
resolve_project     yes         pcq resolve
inspect_project     yes         pcq inspect
validate_project    yes         pcq validate
validate_run        yes         pcq validate-run
describe_run        yes         pcq describe-run
compare_runs        yes         pcq compare-runs
lineage_chain       yes         pcq lineage
apply_plan          no          pcq apply-plan
apply_planset       no          pcq apply-planset
init_experiment     no          pcq init-experiment
finalize_run        no          pcq finalize
agent_install       no          pcq agent install
agent_status        yes         pcq agent status
run_experiment      no          pcq run

Every tool's input/output is anchored in pcq.agent.json_contracts.JSON_CONTRACTS, frozen since v2.13. Long, multi-hour training runs should go through the CQ service queue rather than block run_experiment. Full details: docs/MCP_INTEGRATION.md.
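
As a sketch of what calling these tools looks like from code (outside Claude Code or Codex), the snippet below drives the stdio server with the Python MCP client SDK. The argument names passed to call_tool() ("run_dir", "strictness") are illustrative assumptions; the authoritative shapes live in JSON_CONTRACTS and docs/MCP_INTEGRATION.md.

# mcp_client_sketch.py — drive `pcq mcp serve` over stdio with the Python MCP SDK.
# The arguments passed to call_tool() are assumptions for illustration only.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    server = StdioServerParameters(command="pcq", args=["mcp", "serve"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])   # expect the 14 pcq tools
            result = await session.call_tool(
                "validate_run", {"run_dir": "output", "strictness": 3}
            )
            print(result.content)                        # structured validation facts


asyncio.run(main())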

Agent-operable by design

Enough structure for an agent to inspect, run, validate, and decide.

pcq gives agents stable machine-readable surfaces while leaving policy to the agent or service. The library reports facts: what ran, what changed, what passed, what failed, which artifacts exist, and how a candidate compares with its parent.

resolve · inspect · validate · run --json · run --jsonl · validate-run · describe-run · compare-runs · lineage · apply-plan · agent install · mcp serve

Quickstart

Create a contract project and produce a run record.

uv add 'pcq[mcp]'
pcq init-experiment --style script --output ./my-exp --with-pyproject --agent claude
cd ./my-exp
uv sync
pcq agent install --target claude --path . --mcp
pcq run --json
pcq validate-run output --json
pcq describe-run output --json

FAQ

Common questions.

Does pcq replace PyTorch / HF Trainer / Lightning / sklearn?

No. pcq does not own the training loop. Your project keeps any framework. pcq only standardizes the surrounding evidence: cq.yaml, metric emission, artifact layout, validation, comparison, lineage, and the final run_record.json.

Is the CQ managed service required?

No. pcq is useful standalone — locally, in CI, in notebooks, and inside third-party orchestrators. CQ is one managed consumer of the same contract.

How does pcq integrate with Claude Code or Codex?

Install pcq[mcp], run pcq agent install --target claude --path . --mcp to write .mcp.json, and start pcq mcp serve. The agent then sees 14 mcp__pcq__* tools and calls them with structured arguments instead of parsing stdout.

What artifacts does a run produce?

config.json, metrics.json, manifest.json, run_summary.json, run_record.json, and validation_report.json. run_record.json is the canonical completion object.

How does an agent decide whether a run passed?

Use pcq validate-run output --strictness 3 --json for pass/warn/fail facts and pcq describe-run output --json for decision facts. pcq deliberately reports facts; the agent or service chooses policy.

Can I emit live events from long training runs?

Yes. pcq run --path . --jsonl streams newline-delimited events (run.started, stdout, metric, run.completed, run.failed). Add --events output/events.jsonl to persist the same evidence to a file.
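
A consumer can be as small as the sketch below: read the stream line by line, decode each line as JSON, and react to the events it cares about. The key name "event" used for the event type is an assumption for illustration; the event names themselves come from the list above.

# tail_events.py — sketch of consuming the live JSONL stream from `pcq run`.
# The key used for the event type ("event") is an assumption, not the spec.
import json
import subprocess

proc = subprocess.Popen(
    ["pcq", "run", "--path", ".", "--jsonl"],
    stdout=subprocess.PIPE,
    text=True,
)
for line in proc.stdout:
    line = line.strip()
    if not line:
        continue
    event = json.loads(line)
    if event.get("event") == "metric":
        print(event)        # e.g. forward to a dashboard or progress bar
proc.wait()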

What strictness levels exist for validation?

Strictness 0–3. Lower levels warn, higher levels fail. Use 2 for project validation during authoring and 3 for run validation during gating. See docs/STRICTNESS.md.

Where is the pcq Python and JSON contract specified?

See docs/JSON_CONTRACTS.md and docs/CQ_YAML_RUNTIME_CONTRACT.md. Both have been frozen since v2.13.

Relationship with CQ

pcq is the open contract. CQ is one managed consumer.

pcq

Open-source library for authoring, validation, artifacts, and run evidence. Apache-2.0. Useful standalone.

CQ

Managed execution, queueing, artifact collection, dashboards, and agent loops. Consumes the same contract.