Metadata-Version: 2.4
Name: coding-cli-runtime
Version: 0.8.3
Summary: Reusable CLI runtime primitives for provider-backed automation workflows
Author-email: LLM Eval maintainers <llm-eval-maintainers@users.noreply.github.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/pj-ms/llm-eval/tree/main/packages/coding-cli-runtime
Project-URL: Repository, https://github.com/pj-ms/llm-eval
Project-URL: Issues, https://github.com/pj-ms/llm-eval/issues
Project-URL: Changelog, https://github.com/pj-ms/llm-eval/blob/main/packages/coding-cli-runtime/CHANGELOG.md
Keywords: cli,runtime,llm,automation,schema-validation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: build>=1.2.2; extra == "dev"
Requires-Dist: bump-my-version>=0.26.0; extra == "dev"
Requires-Dist: mypy>=1.13.0; extra == "dev"
Requires-Dist: pre-commit>=4.0.0; extra == "dev"
Requires-Dist: pytest==8.3.3; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.8.0; extra == "dev"
Requires-Dist: setuptools>=77.0.3; extra == "dev"
Requires-Dist: twine>=5.1.1; extra == "dev"
Requires-Dist: wheel; extra == "dev"
Dynamic: license-file

# coding-cli-runtime

[![PyPI](https://img.shields.io/pypi/v/coding-cli-runtime)](https://pypi.org/project/coding-cli-runtime/)
[![Python](https://img.shields.io/pypi/pyversions/coding-cli-runtime)](https://pypi.org/project/coding-cli-runtime/)
[![Build](https://github.com/pj-ms/llm-eval/actions/workflows/ci.yml/badge.svg)](https://github.com/pj-ms/llm-eval/actions/workflows/ci.yml)
[![License](https://img.shields.io/pypi/l/coding-cli-runtime)](LICENSE)

A Python library for orchestrating LLM coding agent CLIs — [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Codex](https://github.com/openai/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), and [GitHub Copilot](https://docs.github.com/en/copilot).

These CLIs each have different invocation patterns, output formats, error
shapes, and timeout behaviors. This library normalizes all of that behind
a common `CliRunRequest` → `CliRunResult` contract, so your automation
code doesn't need provider-specific subprocess handling.

The package now exposes a stable core API plus preview provider adapters:

- `coding_cli_runtime` — stable metadata, launch primitives, schema helpers,
  subprocess/session execution, and provider facts.
- `coding_cli_runtime.providers` — preview provider-aware adapters that own
  run/parse/session/recovery flows for Claude, Codex, Copilot, and Gemini.

If you are inspecting only the root package, note that the preview adapter
namespace is available as `coding_cli_runtime.providers`.

Maintainers changing provider/runtime contracts should start with:

- [`docs/PROVIDER_CONTRACT_API_PLAN.md`](https://github.com/pj-ms/llm-eval/blob/main/packages/coding-cli-runtime/docs/PROVIDER_CONTRACT_API_PLAN.md)
- [`docs/README.md`](https://github.com/pj-ms/llm-eval/blob/main/packages/coding-cli-runtime/docs/README.md)
- [`playground/README.md`](https://github.com/pj-ms/llm-eval/blob/main/packages/coding-cli-runtime/playground/README.md)

**What it does (and why not just `subprocess.run`):**

- Run any provider CLI with unified request/result types and timeout enforcement
- Query the model catalog (with user-override and live-cache fallback)
- Classify failures as retryable vs fatal per provider
- Look up provider auth, config dirs, and headless launch flags
- Build non-interactive launch commands without hardcoding provider flags
- Find session logs and preserved conversations after a run
- Run long-lived sessions with process-group cleanup and transcript mirroring
- Preview provider-aware launch/result flows with typed adapter events and
  normalized raw execution views
- No Python package dependencies — only requires the provider CLIs themselves

## Installation

```bash
pip install coding-cli-runtime
# or
uv add coding-cli-runtime
```

Requires Python 3.10+.

## Agent Companion Assets (Preview)

`coding-cli-runtime` now ships a small package-owned companion bundle for
agents and humans consuming the package from other repositories. The bundle is
package-generic: it explains which surface to use, how to interpret
`artifact`/`response`/`conversation_log`/`raw_output_artifact`, and where to
look next when provider behavior drifts.

Inspect the packaged manifest:

```bash
python -m coding_cli_runtime.companion_assets show-manifest
```

Export the package-neutral bundle:

```bash
python -m coding_cli_runtime.companion_assets export \
  --dest ./coding-cli-runtime-companion
```

Install the derived Codex skill:

```bash
python -m coding_cli_runtime.companion_assets install-codex-skill
```

By default, the Codex skill installs into `$CODEX_HOME/skills` or
`~/.codex/skills` when `CODEX_HOME` is unset. The package-neutral bundle
remains the source of truth; the Codex skill is a thin exported wrapper.

## Examples

### Execute a provider CLI

```python
import asyncio
from pathlib import Path
from coding_cli_runtime import CliRunRequest, run_cli_command

request = CliRunRequest(
    cmd_parts=("codex", "--model", "gpt-5.4", "--quiet", "exec", "fix the tests"),
    cwd=Path("/tmp/my-project"),
    timeout_seconds=120,
)
result = asyncio.run(run_cli_command(request))

print(result.returncode)        # 0
print(result.error_code)        # "none"
print(result.duration_seconds)  # 14.2
print(result.stdout_text[:200])
```

Swap `codex` for `claude`, `gemini`, or `copilot` — the request/result
shape stays the same. A synchronous variant `run_cli_command_sync` is also
available.

### Pick a model from the provider catalog

```python
from coding_cli_runtime import get_provider_spec

codex = get_provider_spec("codex")
print(codex.default_model)   # e.g. "gpt-5.4" from the current local catalog
print(codex.model_source)    # "codex_cli_cache", "override", or "code"

for model in codex.models:
    print(f"  {model.name}: {model.description}")
```

The catalog covers all four providers — each with model names, reasoning
levels, default settings, and visibility flags.

Model lists are resolved with a three-tier fallback:

1. **User override** — drop a JSON file at
   `~/.config/coding-cli-runtime/providers/<provider>.json` to use your own
   model list immediately, without waiting for a package update.
2. **Live CLI cache** — for Codex, the library reads
   `~/.codex/models_cache.json` (auto-refreshed by the Codex CLI) when
   present. Other providers fall through because their CLIs don't expose a
   machine-readable model list.
3. **Hardcoded fallback** — the model list shipped with the package.

Override file format:

```json
{
  "default_model": "claude-sonnet-4-7",
  "models": [
    "claude-sonnet-4-7",
    {
      "name": "claude-opus-5",
      "description": "Latest opus model",
      "controls": [
        { "name": "effort", "kind": "choice", "choices": ["low", "high"], "default": "low" }
      ]
    }
  ]
}
```

Set `CODING_CLI_RUNTIME_CONFIG_DIR` to change the config directory
(default: `~/.config/coding-cli-runtime`).

### Decide whether to retry a failed run

```python
from coding_cli_runtime import classify_provider_failure

classification = classify_provider_failure(
    provider="gemini",
    stderr_text="429 Resource exhausted: rate limit exceeded",
)

if classification.retryable:
    print(f"Retryable ({classification.category}) — will retry")
else:
    print(f"Fatal ({classification.category}) — giving up")
```

Works for all four providers. Recognizes auth failures, rate limits,
network transients, and other provider-specific error patterns.

### Use preview provider-aware adapters

```python
from pathlib import Path

from coding_cli_runtime.providers import claude

request = claude.ClaudeExecRequest(
    model="claude-sonnet-4-6",
    prompt="Summarize the repository status as JSON.",
    cwd=Path("/tmp/my-project"),
    output_format="json",
    transcript_path=Path("/tmp/claude-conversation.jsonl"),
)

preview = claude.prepare_launch(request)
print(preview.display_text)

result = claude.run_sync(request)
print(result.raw_execution.returncode)
print(result.parsed_output.structured_output)

session = claude.find_session(
    request.cwd,
    result.raw_execution.started_at,
    prompt_text=request.prompt,
)
conversation = claude.get_conversation(session)
print(conversation.line_count)
```

These preview adapters are provider-aware and provider-specific on purpose.
They are the right API when you want package-owned parsing, session lookup,
conversation retrieval, launch preview, adapter events, and provider-specific
recovery behavior.

These `coding_cli_runtime.providers.*` APIs are still preview surfaces and may
evolve faster than the stable core metadata/helpers.

The stable root package also exposes low-level `find_claude_session(...)` and
`find_codex_session(...)` helpers that return a best-effort session log path.
Use `coding_cli_runtime.providers.find_*_session(...)` when you want the
preview adapter layer's typed lookup results and provider-specific matching
behavior.

For a concrete Copilot local image-evaluation recipe, see
[`copilot_structured_multimodal.md`](https://github.com/pj-ms/llm-eval/blob/main/packages/coding-cli-runtime/docs/examples/copilot_structured_multimodal.md).

#### Continue a previous CLI session

```python
from pathlib import Path

from coding_cli_runtime.providers import ProviderContinuationMode, claude

workdir = Path("/tmp/my-project")

first = claude.run_sync(
    claude.ClaudeExecRequest(
        model="claude-sonnet-4-6",
        prompt="Reply with exactly FIRST.",
        cwd=workdir,
        output_format="text",
    )
)

resumed = claude.run_sync(
    claude.ClaudeExecRequest(
        model="claude-sonnet-4-6",
        prompt="Reply with exactly SECOND.",
        cwd=workdir,
        output_format="text",
        continuation_mode=ProviderContinuationMode.SESSION_ID,
        session_id=first.session_lookup.session_id,
    )
)

print(resumed.continuation_resolution.resolved_session_id)
```

Preview adapters support explicit multi-turn continuation via
`continue_latest`, `session_id`, or `session_path` where the underlying CLI
supports it. For Codex, fresh runs may use native `output_schema_path`
enforcement, while resumed runs intentionally reject `output_schema_path`
because `codex exec resume` does not support `--output-schema`.

### Common integration tasks

#### Check whether a provider CLI is installed

```python
from coding_cli_runtime import is_provider_installed

if not is_provider_installed("claude"):
    raise RuntimeError("Claude Code is not available on PATH")
```

This is intentionally minimal: it checks whether the provider binary exists on
PATH. Deeper CLI drift validation belongs in maintainer tooling, not the
runtime API.

#### Resolve workspace env vars and session search paths

```python
from coding_cli_runtime import (
    get_provider_contract,
    resolve_session_search_paths,
    resolve_workspace_env,
)

gemini = get_provider_contract("gemini")

# Derive provider-specific workspace env vars from contract metadata
env = resolve_workspace_env(gemini, "/tmp/run-dir")
# {"GEMINI_CLI_IDE_WORKSPACE_PATH": "/tmp/run-dir"}

# Expand concrete host paths for session log searches
paths = resolve_session_search_paths(gemini)
# (Path.home() / ".gemini" / "tmp",)
```

Use these helpers when you want the contract facts turned into concrete
filesystem/env values without rebuilding the same glue logic in your code.

### Look up provider contract metadata

```python
from coding_cli_runtime import get_provider_contract, build_env_overlay, resolve_config_paths, render_prompt

# Get structured metadata for any supported provider
contract = get_provider_contract("claude")
print(contract.binary)                        # "claude"
print(contract.auth.api_key_env_var)          # "CLAUDE_API_KEY"
print(contract.paths.config_dir)              # "~/.claude"
print(contract.headless.approval.flag)        # "--dangerously-skip-permissions"

# Build env var overlay for subprocess
env = build_env_overlay(contract, api_key="sk-...", base_url="https://custom.example.com")
# {"CLAUDE_API_KEY": "sk-...", "ANTHROPIC_BASE_URL": "https://custom.example.com"}

# Resolve config paths for container mounts
host_dir, container_dir = resolve_config_paths(contract, containerized=True)
# ("/home/user/.claude", "/root/.claude")

# Resolve prompt delivery (stdin vs flag vs activation)
payload = render_prompt(contract.headless.prompt, "Fix the bug")
# payload.args = ()            (stdin delivery for Claude)
# payload.stdin_text = "Fix the bug"
```

`ProviderContract` is structured as nested sub-contracts
(`AuthContract`, `PathContract`, `HeadlessContract`, `OutputContract`,
`IoContract`, `SessionDiscoveryContract`, `DiagnosticsContract`) so callers
can drill into whichever aspect they need. This is reference metadata,
not a command-construction control plane — callers keep their own
command assembly and adopt contract fields selectively.

### Query provider I/O conventions

```python
from coding_cli_runtime import get_provider_contract

gemini = get_provider_contract("gemini")

# Workspace env vars with value semantics
for wev in gemini.io.workspace_env_vars:
    print(f"{wev.name} = {wev.value_source}")
    # GEMINI_CLI_IDE_WORKSPACE_PATH = execution_dir

# Session discovery (where session logs live)
sd = gemini.session_discovery
print(sd.session_roots)  # ("tmp",)
print(sd.session_glob)   # "*/chats/session-*.json"

# Output format support
codex = get_provider_contract("codex")
print(codex.output.output_path_flag)    # "-o"
print(codex.output.schema_path_flag)    # "--output-schema"

# Diagnostics (Copilot only)
copilot = get_provider_contract("copilot")
if copilot.diagnostics:
    print(copilot.diagnostics.log_glob)  # "logs/process-*.log"
```

`WorkspaceEnvVar.value_source` uses a closed vocabulary:
`"execution_dir"` or `"workspace_root"`.

### Build headless launch commands

```python
from coding_cli_runtime import build_claude_headless_core, build_codex_headless_core

# Claude: binary + --print + --permission-mode + --dangerously-skip-permissions + --model
cmd = build_claude_headless_core("claude-sonnet-4-6")
cmd.extend(["--output-format", "text", "--disallowedTools", "Bash,Task"])

# Codex: binary + exec + --full-auto + --sandbox + --skip-git-repo-check + --model
cmd = build_codex_headless_core("gpt-5.4", sandbox_mode="read-only")
cmd.extend(["-C", str(workdir)])
```

Headless core helpers emit the standard flags for non-interactive runs.
Consumers append app-specific tails (tool restrictions, output paths, etc.).

### Find session logs after a run

```python
import time
from coding_cli_runtime import find_codex_session, find_claude_session

# Find the most recent Codex session log for a given working directory
session = find_codex_session("/path/to/project", since_ts=time.time() - 300)
if session:
    print(f"Session log: {session}")  # ~/.codex/sessions/.../conversation.jsonl
```

Works for Codex and Claude. Scans provider config directories for session
files matching the working directory and time window.

### Derive provider-agnostic transcript facts

```python
from coding_cli_runtime import TranscriptFactsError, derive_transcript_facts

try:
    facts = derive_transcript_facts(
        provider="copilot",
        transcript_path="/tmp/copilot_conversation.jsonl",
    )
except TranscriptFactsError as exc:
    print(f"Transcript parse failed: {exc}")
else:
    print(facts.assistant_messages)
    print(facts.content_messages)
    print(facts.tool_execution_start_count)
    print(facts.final_content)
```

This keeps raw transcripts provider-native while exposing a small shared facts
shape that callers can reuse without depending on provider event schemas
directly.

Copilot has two runtime artifact surfaces: stdout is the direct response
payload, and `transcript_path` is structured JSONL copied from the provider
session log. Use the structured JSONL for `derive_transcript_facts(...)`.
Legacy `--share` Markdown exports may exist in old runs, but current runtime
execution does not request or parse them as assistant content.

For fresh transcript samples and the maintainer probe harness, see the
[transcript probe playground](https://github.com/pj-ms/llm-eval/tree/main/packages/coding-cli-runtime/playground/transcript-probes)
in the project repository.

## Key types

| Type | Purpose |
|------|---------|
| `CliRunRequest` | Command spec: cmd, cwd, env, timeout, stream paths |
| `CliRunResult` | Result: returncode, stdout/stderr, duration, error code |
| `ErrorCode` | `none` · `spawn_failed` · `timed_out` · `non_zero_exit` |
| `ProviderSpec` | Provider catalog entry with models, controls, defaults |
| `ProviderContract` | Structured provider CLI metadata (auth, paths, headless, I/O, sessions) |
| `WorkspaceEnvVar` | Env var with value-source semantics (`execution_dir`, `workspace_root`) |
| `FailureClassification` | Classified error with retryable flag and category |

### Run long-lived CLI sessions

For CLI runs that take minutes (e.g., full app generation), use
`run_interactive_session()` instead of `run_cli_command()`. It adds:

- Process-group cleanup (kills orphaned child processes on timeout)
- Transcript mirroring (streams CLI output to a file while the process runs)
- Automatic retries on transient failures

```python
from coding_cli_runtime import run_interactive_session

result = await run_interactive_session(
    cmd_parts=("claude", "--print", "--model", "claude-sonnet-4-6"),
    cwd=workdir,
    stdin_text=prompt,
    logger=logger,
    timeout_seconds=600,
)
```

Only `cmd_parts`, `cwd`, `stdin_text`, and `logger` are required.
Other parameters have sensible defaults.

## API summary

The full public API is listed in [`__init__.py`](src/coding_cli_runtime/__init__.py).
Key function groups:

| Group | Functions |
|-------|-----------|
| Execution | `run_cli_command`, `run_cli_command_sync`, `run_interactive_session` |
| Provider metadata | `get_provider_contract`, `get_provider_spec`, `list_provider_specs` |
| Contract helpers | `build_env_overlay`, `resolve_config_paths`, `render_prompt`, `resolve_auth`, `resolve_workspace_env`, `resolve_session_search_paths` |
| Headless launch | `build_claude_headless_core`, `build_codex_headless_core`, `build_copilot_headless_core`, `build_gemini_headless_core` |
| Codex batch | `build_codex_exec_spec` |
| Failure handling | `classify_provider_failure` |
| Installation check | `is_provider_installed` |
| Session logs | `find_codex_session`, `find_claude_session` |
| Schema | `load_schema`, `validate_payload` |
| Utilities | `redact_text`, `build_model_id`, `normalize_path_str` |

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and quality checks.

## Prerequisites

This package does **not** bundle any CLI binaries or credentials. You must
install and authenticate the relevant provider CLI yourself before using the
execution helpers.

## Status

Pre-1.0. API may change between minor versions.

## License

MIT
