Metadata-Version: 2.4
Name: rawctx
Version: 0.3.28
Summary: rawctx CLI and SDK for semantic packages, answer audit evidence, OTel trace ingest, and trust proofs
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: click>=8.1.7
Requires-Dist: httpx>=0.28.1
Requires-Dist: pydantic>=2.11.0
Requires-Dist: PyYAML>=6.0.2
Requires-Dist: jsonschema>=4.23.0
Requires-Dist: ruamel.yaml>=0.18.6

# rawctx CLI

Python CLI and SDK for rawctx Hub. rawctx records AI answer evidence with
approved package refs, context hashes, evidence `source_ref`s, correction
history, OpenTelemetry (OTel) trace-bundle ingest, and trust proof status.
Teams can show what an answer used, whether the proof is externally anchored or
still pending, and how the evidence changed over time without replacing their
existing observability stack.

Guides:

- `../docs/guides/package-workflow.md`
- `../docs/guides/metricflow-native-workflow.md`
- `../docs/diff.md`
- Answer Audit docs: https://hub.rawctx.dev/docs/answer-audit
- Trust proof foundation: https://github.com/pasar6987-create/rawctx/blob/main/docs/spec/trust-proof-foundation.md

OTel support:

- `rawctx.ingest_otel_trace_bundle()` records submitted OpenTelemetry GenAI
  trace bundles and ties them to approved semantic package definitions.
- OTel is used as an evidence input on top of your runtime logs; rawctx does not
  claim the upstream trace is ground truth by itself.

## Commands

User:

- `rawctx login [--registry URL] [--id-token JWT] [--token-name NAME] [--expires-in-days N] [--no-browser] [--json]`
- `rawctx logout [--local-only] [--json]`
- `rawctx search [QUERY] [--format F] [--source-format F] [--origin all|native|indexed] [--domain D] [--source S] [--tags CSV] [--sort similarity|recent|name] [--page N] [--size N] [--json] [--offline] [--registry URL]`
- `rawctx info PACKAGE_REF [--json] [--offline] [--registry URL]`
- `rawctx download PACKAGE_REF MODEL_PATH [--local-dir DIR] [--stdout] [--offline] [--force] [--json] [--registry URL]`
- `rawctx snapshot-download PACKAGE_REF [--local-dir DIR] [--offline] [--force] [--json] [--registry URL]`
- `rawctx to-prompt PACKAGE_REF [--datasets CSV] [--max-tokens N] [--offline] [--mode agent_context|strict_metric] [--metrics CSV] [--question TEXT] [--budget-policy compact|error_if_required_missing] [--render-format text|xml] [--json] [--registry URL]`
- `rawctx validate [TARGET] [--format auto|manifest|osi] [--show-dataset-measures] [--json]`
- `rawctx pack [TARGET_DIR] [--output-dir DIR] [--json]`
- `rawctx convert --from metricflow --to osi INPUT_PATH --output DIR [--package-name @scope/name] [--package-version X.Y.Z] [--overwrite] [--json]`
- `rawctx publish [TARGET_DIR] [--private] [--org ORG] [--registry URL]`
- `rawctx publish --from-dbt DBT_PROJECT_DIR [--native] [--emit-package DIR] [--package-name @scope/name] [--package-version X.Y.Z] [--private] [--org ORG] [--registry URL]`
- `rawctx diff A B [--format text|json|github|markdown|junit] [--consumer sql|python|llm|all] [--severity breaking|behavioral|cosmetic|all] [--exit-code-on breaking|behavioral|none] [--max-tokens N] [--output PATH]`
- `rawctx diff semantic A B [--format text|json|github|markdown|junit] [--consumer sql|python|llm|all] [--severity breaking|behavioral|cosmetic|all]`
- `rawctx diff prompt A B [--format text|json|github|markdown|junit] [--max-tokens N]`
- `rawctx diff eval A B --questions FILE [--format text|json|github|markdown|junit] [--runs N] [--model NAME]`
- `rawctx trust status [--json] [--registry URL]`
- `rawctx trust policy [--json] [--registry URL]`
- `rawctx trust anchor run [--force] [--json] [--registry URL]`
- `rawctx trust proof answer LOG_ID [--output proof.json] [--json] [--registry URL]`
- `rawctx trust verify proof.json [--online] [--json] [--registry URL]`

Maintainer:

- `rawctx claim PACKAGE_REF [--json] [--registry URL]`

Ops:

- `rawctx index dbt --seed-file PATH [--only owner/name] [--limit N] [--dry-run] [--json] [--registry URL]`
- `rawctx index git --repo owner/name --source-ref REF --package-version X.Y.Z [--package-name NAME] [--scope SCOPE] [--model-glob GLOB ...] [--dry-run] [--json] [--registry URL]`

## Supported Package Lanes

rawctx currently supports two published package formats:

- `format=osi`: packaged OSI YAML files
- `format=metricflow`: native MetricFlow/dbt snapshot packages

Both lanes support:

- `rawctx info`
- `rawctx snapshot-download`
- `rawctx.load()`
- `rawctx.to_prompt()`
- `rawctx diff`

`download PACKAGE_REF MODEL_PATH` also works for both lanes, but only for files listed in `manifest.models`.

`rawctx diff` accepts three artifact inputs:

- `@scope/name@version`
- local package directories
- `.rawctx.tar.gz` archives

It compares artifacts only. It never queries a warehouse.

## Answer Audit Evidence

The Python SDK can register reusable evidence first, then record one audit shell
per application answer. Each shell cites approved semantic references, external
trace IDs, and evidence `source_ref`s, and can later receive correction, void,
or redaction events.

By default, answer logs store only hashes of the raw question and answer text.
Tenant settings can opt in to raw text storage, but `question_hash` and
`answer_hash` remain available either way. P3 evidence APIs add a separate
evidence path for:

- text, audio, and video assets
- sanitized segments
- runtime media stream retrieval
- short-lived auditor downloads
- Text Gate alpha retrieval
- tamper-evident ledger verification
- OpenTelemetry (OTel) trace-bundle ingest

Register reference audio and video evidence before the answer log, retrieve it
at answer time, and cite it through the returned `source_ref`.
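Because raw text is hash-only by default, an application may want its own local fingerprint of each question and answer for later cross-checking, plus a stable `idempotency_key` so a retried `log_answer()` call does not mint a new key. The sketch below assumes plain SHA-256 over UTF-8 bytes with a `sha256:` prefix (mirroring the `context_hash` shape in the example below) and the `application:trace:message` key layout copied from that example. Both helpers are hypothetical; rawctx's own text normalization before hashing is not documented here, so do not assume a byte-for-byte match with `question_hash` or `answer_hash`.

```python
import hashlib


def text_fingerprint(text: str) -> str:
    """Hypothetical helper: SHA-256 over UTF-8 bytes, prefixed like `context_hash`.

    Assumption: rawctx may normalize text differently before hashing, so this is
    for local bookkeeping, not a guaranteed match for question_hash/answer_hash.
    """
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def idempotency_key(application_key: str, trace_id: str, message_id: str) -> str:
    """Hypothetical helper: one key per (application, trace, message).

    Reusing the same inputs on retry yields the same key. Parts must not contain
    ":" so the joined key stays unambiguous.
    """
    parts = (application_key, trace_id, message_id)
    if any(not part or ":" in part for part in parts):
        raise ValueError("key parts must be non-empty and must not contain ':'")
    return ":".join(parts)


print(idempotency_key("analytics_bot", "req_123", "msg_456"))  # analytics_bot:req_123:msg_456
```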

Hub web follows the same reference-first model. Tenant managers register
audio/video in private workspace settings, copy the `source_ref`, and pass it in
`source_refs` when creating the answer log. The Media evidence vault is not
exposed from the Public Hub navigation or public settings routes. Web uploads
are capped at 5 MB; direct SDK/API registration can use the backend evidence
limits.

Runtime media retrieval always requires a `purpose`, records an access
event, and returns a short-lived `stream_url` for the agent to read before
logging the answer. Manual auditor retrieval remains available through a
separate download API. Download filenames are emitted with a safe ASCII fallback
plus UTF-8 `filename*` so non-ASCII filenames work with S3 presigned downloads.
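That filename behavior follows the RFC 6266 / RFC 8187 pattern for `Content-Disposition`: a quoted ASCII `filename` for older clients plus a percent-encoded UTF-8 `filename*`. A minimal sketch of building such a header with the standard library follows; the helper name and the `_` replacement character are assumptions, not rawctx's exact output.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Hypothetical helper: ASCII fallback plus an RFC 8187 UTF-8 `filename*`."""
    # Replace non-ASCII characters, control characters, and the characters that
    # would break a quoted-string (`"` and `\`) with "_".
    fallback = "".join(
        ch if 32 <= ord(ch) < 127 and ch not in '"\\' else "_" for ch in filename
    )
    # quote() percent-encodes the UTF-8 bytes; safe="" also encodes "/".
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


print(content_disposition("support-call.wav"))
# attachment; filename="support-call.wav"; filename*=UTF-8''support-call.wav
```

Clients that understand `filename*` use the UTF-8 name; older clients fall back to the ASCII `filename`.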

```python
from pathlib import Path

import rawctx

media = rawctx.register_media_evidence_asset(
    filename="support-call.wav",
    mime_type="audio/wav",
    asset_type="audio",
    content=Path("support-call.wav").read_bytes(),
    metadata={"case_id": "case-123"},
    registry="https://api.rawctx.dev",
)

retrieval = rawctx.retrieve_media_evidence(
    media["evidence_asset_id"],
    purpose="answer_generation",
    external_trace_id="req_123",
    external_message_id="msg_456",
    registry="https://api.rawctx.dev",
)
# Stream retrieval["stream_url"] into the agent before it expires.

log = rawctx.log_answer(
    application_key="analytics_bot",
    environment="production",
    idempotency_key="analytics_bot:req_123:msg_456",
    external_trace_id="req_123",
    question_text="Which plan drove expansion MRR?",
    answer_text="The Team plan drove the largest expansion.",
    semantic_refs=[
        {
            "package_ref": "@acme/revenue-metrics",
            "package_version": "1.2.0",
            "context_hash": "sha256:...",
            "metrics": ["mrr"],
        }
    ],
    source_refs=[retrieval["source_ref"]],
    evidence_access_event_ids=[retrieval["access_event_id"]],
    policy_flags={"approved_definition_only": True},
    registry="https://api.rawctx.dev",
)

media_assets = rawctx.list_media_evidence_assets(asset_type="audio", registry="https://api.rawctx.dev")
download = rawctx.request_media_evidence_asset_download(
    media["evidence_asset_id"],
    purpose="auditor_media_review",
    registry="https://api.rawctx.dev",
)
# Use download["download_url"] before it expires; rawctx records the access event.

supplemental = rawctx.list_answer_evidence_assets(log["id"], registry="https://api.rawctx.dev")
segments = rawctx.list_answer_segments(log["id"], registry="https://api.rawctx.dev")
retrieved = rawctx.retrieve_text_gate_alpha(
    "expansion MRR evidence",
    application_key="analytics_bot",
    include_hash_only=True,
    registry="https://api.rawctx.dev",
)
otel_log = rawctx.ingest_otel_trace_bundle(
    application_key="analytics_bot",
    external_trace_id="req_123",
    trace_bundle={"resourceSpans": []},
    semantic_refs=[{"package_ref": "@acme/revenue-metrics", "package_version": "1.2.0"}],
    registry="https://api.rawctx.dev",
)
```

Use `RawctxClient` or `AsyncRawctxClient` when a service should share one
registry, token, and timeout across answer audit calls:

- `create_answer_log()`, also exposed as top-level `log_answer()`
- `register_media_evidence_asset()`
- `list_media_evidence_assets()`
- `retrieve_media_evidence()` for runtime agent stream access
- `request_media_evidence_asset_download()`
- `request_answer_evidence_asset_upload()` for log-scoped supplemental assets
- `register_answer_evidence_asset()` for log-scoped supplemental assets
- `list_answer_evidence_assets()` for sanitized answer detail evidence
- `request_answer_evidence_asset_download()` for supplemental original retrieval
- `list_answer_segments()`
- `ingest_otel_trace_bundle()`
- `retrieve_text_gate_alpha()`
- `append_answer_log_event()`
- `export_answer_logs()`
- `trust_status()`
- `trust_policy()`
- `run_trust_anchor()`
- `proof_answer()`
- `verify_proof_bundle_online()`

OpenTelemetry support is intentionally layered on top of your existing tracing:
rawctx records the submitted trace bundle and binds it to approved definitions
without claiming that the upstream runtime trace itself is ground truth.
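The answer audit example above passes `trace_bundle={"resourceSpans": []}`, which is the top-level shape of an OTLP/JSON trace export. For illustration, a minimal sketch that builds one GenAI span in that shape with the standard library follows. The attribute keys come from the OpenTelemetry GenAI semantic conventions, but which spans and attributes rawctx actually reads is not documented here, so the helper name and chosen attributes are assumptions. A real integration would export the spans your runtime already produced rather than synthesize them.

```python
import secrets
import time


def genai_trace_bundle(operation: str, model: str, service_name: str) -> dict:
    """Hypothetical helper: a one-span GenAI trace in OTLP/JSON shape."""

    def attr(key: str, value: str) -> dict:
        # OTLP/JSON attributes are key/value pairs with a typed value wrapper.
        return {"key": key, "value": {"stringValue": value}}

    end_ns = time.time_ns()
    start_ns = end_ns - 1_500_000_000  # placeholder: pretend the call took 1.5 s
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes -> 32 hex chars
        "spanId": secrets.token_hex(8),  # 8 random bytes -> 16 hex chars
        "name": f"{operation} {model}",  # GenAI span naming: "{operation} {model}"
        "kind": 3,  # SPAN_KIND_CLIENT
        "startTimeUnixNano": str(start_ns),  # 64-bit integers encoded as strings
        "endTimeUnixNano": str(end_ns),
        "attributes": [
            attr("gen_ai.operation.name", operation),
            attr("gen_ai.request.model", model),
        ],
    }
    return {
        "resourceSpans": [
            {
                "resource": {"attributes": [attr("service.name", service_name)]},
                "scopeSpans": [{"scope": {"name": service_name}, "spans": [span]}],
            }
        ]
    }


bundle = genai_trace_bundle("chat", "example-model", "analytics_bot")
print(bundle["resourceSpans"][0]["scopeSpans"][0]["spans"][0]["name"])  # chat example-model
```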

## Trust Proofs

rawctx trust proofs are separate from the answer payload. They show whether the
answer audit record is only locally signed, waiting for external confirmation,
or externally anchored.

- `LOCAL_ONLY`: the record has a Merkle leaf, signed tree head, and local
  receipt, but no independent external anchor yet.
- `PENDING`: the proof structure and signatures are valid, but OpenTimestamps
  Bitcoin attestation or witness policy completion is still outstanding.
- `BITCOIN_OBSERVED_PENDING_CONFIRMATION`: the OpenTimestamps proof contains a
  Bitcoin block-header attestation, but rawctx is not yet calling it finalized.
- `ANCHORED`: a confirmed external anchor is present and the required witness
  condition is satisfied.
- `INVALID`: a structural, hash, signature, anchor, or witness check failed.
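Taken together, these descriptions imply that only `INVALID` is a failure and only `ANCHORED` is final. A minimal sketch of a consumer-side gate built on that reading follows, assuming the status arrives as one of the strings listed (for example from `online["trust_status"]` in the Python example further down). The enum and helper names are hypothetical.

```python
from enum import Enum


class TrustStatus(str, Enum):
    LOCAL_ONLY = "LOCAL_ONLY"
    PENDING = "PENDING"
    BITCOIN_OBSERVED_PENDING_CONFIRMATION = "BITCOIN_OBSERVED_PENDING_CONFIRMATION"
    ANCHORED = "ANCHORED"
    INVALID = "INVALID"


def classify(status: str) -> str:
    """Hypothetical gate: returns 'fail', 'final', or 'wait' for a trust status."""
    s = TrustStatus(status)  # raises ValueError on an unknown status string
    if s is TrustStatus.INVALID:
        return "fail"  # a structural, hash, signature, anchor, or witness check failed
    if s is TrustStatus.ANCHORED:
        return "final"  # confirmed external anchor and witness condition satisfied
    # LOCAL_ONLY, PENDING, and BITCOIN_OBSERVED_PENDING_CONFIRMATION are not
    # failures; they simply lack final external corroboration so far.
    return "wait"


print(classify("PENDING"))  # wait
```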

Production rawctx can combine:

- Merkle inclusion proof for the answer audit leaf
- AWS KMS-signed STH/checkpoint
- OpenTimestamps hash-only Bitcoin anchoring
- Sigstore Rekor witness receipts after SET, inclusion proof, signed checkpoint,
  log ID, and entry body verification
- receipt bundle retention in S3 Object Lock when configured
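The Merkle inclusion proof in that list is the standard transparency-log technique. The sketch below verifies an inclusion path with the RFC 9162 (Certificate Transparency v2) algorithm and RFC 6962 domain-separated hashing (`0x00` prefix for leaves, `0x01` for interior nodes). It is a generic illustration: rawctx's leaf encoding, hash function, and proof field names are not documented here and may differ, so use `rawctx trust verify` for real proof bundles.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     path: list[bytes], root: bytes) -> bool:
    """RFC 9162 inclusion-proof check (generic, not rawctx-specific)."""
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in path:
        if sn == 0:
            return False  # the path is longer than this tree allows
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling is on the left
            if not fn & 1:
                # Skip levels where this node is the last one and has no sibling.
                while not fn & 1 and fn != 0:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


# Three-leaf example tree: root = H(0x01 || H(0x01 || a || b) || c).
la, lb, lc = leaf_hash(b"a"), leaf_hash(b"b"), leaf_hash(b"c")
root = node_hash(node_hash(la, lb), lc)
print(verify_inclusion(b"c", 2, 3, [node_hash(la, lb)], root))  # True
```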

OpenTimestamps and Rekor do not require a customer account. OpenTimestamps may
remain pending until a Bitcoin attestation is available. A Bitcoin-observed
proof is stronger than a calendar-pending proof, but it is reported separately
until the final confirmation policy is satisfied. A Rekor receipt is a witness
observation from a public transparency log; it is not a Bitcoin anchor. S3 Object Lock is a
preservation layer for receipt bundles, not the source of independent
corroboration.

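Summary:

- A trust proof is a bundle for verifying the answer audit log's leaf, STH
  signature, anchor receipt, and witness receipt, not the raw answer text.
- `PENDING` is not a failure. The structure and signatures can be verified,
  but the proof is still waiting for an OpenTimestamps Bitcoin attestation or
  for the witness policy to complete.
- `BITCOIN_OBSERVED_PENDING_CONFIRMATION` is an intermediate state: a Bitcoin
  block header has been observed, but the proof is not yet reported as final.
- Rekor provides a witness receipt observed in a public transparency log;
  OpenTimestamps is the path for anchoring the checkpoint hash to Bitcoin.
- S3 Object Lock is the preservation layer for receipt bundles. Independent
  corroboration comes from the external anchor and the witness.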

```bash
rawctx trust status --json
rawctx trust proof answer "$ANSWER_LOG_ID" --output proof.json
rawctx trust verify proof.json --online --json
```

```python
import rawctx

answer_log_id = "anslog_..."

with rawctx.RawctxClient(registry="https://api.rawctx.dev") as client:
    proof = client.proof_answer(answer_log_id)
    online = client.verify_proof_bundle_online(proof)
    latest = client.trust_status()

print(online["trust_status"])
print(latest["external_anchor_status"])
```

## Package Refs and `latest`

Package refs can be exact or pointer-based:

- `@scope/name@1.2.3` pins one immutable published version
- `@scope/name@latest` asks the registry for the workspace-approved latest version
- `@scope/name` behaves like `@scope/name@latest` for download, load, and prompt workflows

When the registry returns resolution metadata, rawctx preserves the requested ref, resolved concrete version, and
snapshot SHA-256 in the JSON-shaped response or prompt context. Use exact pins in CI or release automation when a job
must be independent of future latest promotions.
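The ref rules above can be sketched as a small parser that splits a ref into scope, name, and version and treats a missing version as `latest`. This is a hypothetical helper for illustration only: it does not resolve `latest` (only the registry can), and the allowed characters in scope and name are an assumption that may not match rawctx's own parser.

```python
import re
from typing import NamedTuple

# Assumed grammar: @scope/name with an optional @version suffix.
_REF = re.compile(
    r"^@(?P<scope>[A-Za-z0-9][A-Za-z0-9._-]*)"
    r"/(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)"
    r"(?:@(?P<version>[^@\s/]+))?$"
)


class PackageRef(NamedTuple):
    scope: str
    name: str
    version: str  # "latest" or a concrete version such as "1.2.3"

    @property
    def is_pinned(self) -> bool:
        return self.version != "latest"


def parse_package_ref(ref: str) -> PackageRef:
    m = _REF.match(ref)
    if m is None:
        raise ValueError(f"not a package ref: {ref!r}")
    # "@scope/name" behaves like "@scope/name@latest".
    return PackageRef(m["scope"], m["name"], m["version"] or "latest")


print(parse_package_ref("@acme/revenue-metrics"))
# PackageRef(scope='acme', name='revenue-metrics', version='latest')
```

In CI, failing the job when `is_pinned` is false is one way to enforce the exact-pin advice above.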

## Compare Packages

Use `rawctx diff` when you need semantic-level change review instead of raw file diffs.

```bash
rawctx diff ./pkg-v1 ./pkg-v2
rawctx diff semantic ./pkg-v1 ./pkg-v2 --format json
rawctx diff prompt ./pkg-v1 ./pkg-v2 --max-tokens 2000
rawctx diff eval ./pkg-v1 ./pkg-v2 --questions questions.jsonl --runs 5 --model mock
```

The top-level `rawctx diff` runs both the `semantic` and `prompt` comparisons. `eval` stays opt-in because it measures model behavior rather than deterministic package structure.

## Notebook / Code

Search uses the public Hub index first so CLI and SDK results match the logged-out web experience. If a search returns no public matches and you have a token configured, rawctx retries with authenticated search.

Notebook shell style:

```python
!rawctx search "semantic model" --sort similarity --json
!rawctx info @scope/name --json
!rawctx snapshot-download @scope/name --json
!rawctx download @scope/name models/customers.yml --json
!rawctx to-prompt @scope/name --datasets customers,order_item --max-tokens 2000
!rawctx validate ./my-package --json
```

Python API:

```python
import rawctx

result = rawctx.search("semantic model", registry="https://api.rawctx.dev", sort="similarity")
pkg = rawctx.info("@scope/name", registry="https://api.rawctx.dev")
model = rawctx.load("@scope/name", registry="https://api.rawctx.dev")
prompt = rawctx.to_prompt(
    "@scope/name",
    datasets=["customers", "order_item"],
    max_tokens=2000,
    registry="https://api.rawctx.dev",
)

print(model.format_name)    # "osi" or "metricflow"
print(model.datasets)       # normalized dataset names
print(model.measures)       # [Measure(name="...", ...)]
print(model.dimensions)     # [Dimension(name="...", ...)]
print(model.relationships)  # [Relationship(name="...", ...)]
print(prompt)
print(pkg["model_paths"])

snapshot_dir = rawctx.snapshot_download("@scope/name", registry="https://api.rawctx.dev")
model_path = rawctx.download("@scope/name", "models/customers.yml", registry="https://api.rawctx.dev")
validation = rawctx.validate("./my-package")
semantic = rawctx.semantic_diff("./pkg-v1", "./pkg-v2")
prompt_diff = rawctx.prompt_diff("./pkg-v1", "./pkg-v2", max_tokens=2000)
combined = rawctx.diff_artifacts("./pkg-v1", "./pkg-v2")
```

Async Python API:

```python
import asyncio
import rawctx

async def main():
    async with rawctx.AsyncRawctxClient(registry="https://api.rawctx.dev") as client:
        result = await client.search("semantic model", sort="similarity")
        model = await client.load("@scope/name")
        prompt = await client.to_prompt("@scope/name", datasets=["customers", "order_item"])
        snapshot_dir = await client.snapshot_download("@scope/name")
        diff_report = await client.diff("./pkg-v1", "./pkg-v2")
        return result, model, prompt, snapshot_dir, diff_report

asyncio.run(main())
```

## `to_prompt()` Behavior

`rawctx.to_prompt()` turns a package snapshot into compact LLM context. It uses the same normalized semantic objects as `load()`, then applies package metadata, dataset filters, and prompt budget settings to render agent-ready text.

The same prompt compiler is available from the CLI:

```bash
rawctx to-prompt @scope/name --datasets customers,order_item --max-tokens 2000
rawctx to-prompt @scope/name --mode strict_metric --metrics mrr --render-format xml
rawctx to-prompt @scope/name --json
```

The rendered prompt keeps a predictable section shape:

```text
Domain: {domain} ({package_name})

Models:
...

Datasets:
...

Metrics:
...

Relationships:
...
```

Dataset filters preserve the requested order, drop duplicates, and fail with `UsageError: Unknown dataset(s): ...` when a requested dataset is not present. Selecting a subset keeps context focused on the requested datasets and their relevant relationships.
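A minimal sketch of those selection rules follows, assuming `available` holds the package's normalized dataset names (as in `model.datasets`). The `UsageError` class is a local stand-in and the helper name is hypothetical; beyond the `Unknown dataset(s):` prefix, rawctx's exact error text is not specified.

```python
class UsageError(Exception):
    """Local stand-in for the CLI's usage error."""


def select_datasets(available: list[str], requested: list[str]) -> list[str]:
    known = set(available)
    # dict.fromkeys keeps first-seen order and drops duplicates.
    ordered = list(dict.fromkeys(requested))
    unknown = [name for name in ordered if name not in known]
    if unknown:
        raise UsageError(f"Unknown dataset(s): {', '.join(unknown)}")
    return ordered


print(select_datasets(["customers", "orders", "order_item"],
                      ["order_item", "customers", "order_item"]))
# ['order_item', 'customers']
```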

`max_tokens` is a practical size target, not a model-specific tokenizer guarantee. When the budget is tight, rawctx prioritizes high-signal semantic context and compacts lower-priority detail. Use `return_context=True` when you need the selected objects, estimated size, render hash, omissions, and warnings for logging or review.

## Download Behavior

- `download` fetches one file listed in `manifest.models`
- `snapshot-download` materializes the full extracted package tree
- for native MetricFlow packages, `snapshot-download` is the primary handoff because it restores the full dbt-style snapshot
- `load()` and `to_prompt()` normalize both OSI and native MetricFlow packages into the same typed Python structures
- when using `snapshot-download --local-dir`, prefer a new or empty directory. `--force` only replaces an existing rawctx snapshot directory and refuses to wipe the current working directory or unrelated folders
- `indexed` packages remain preview-only and cannot be downloaded directly

## Validate / Pack / Publish

`validate`, `pack`, and `publish` all start from a local package directory.

- `validate`: checks the manifest and validates the package according to `manifest.format`
- `pack`: builds a deterministic local `.rawctx.tar.gz`
- `publish`: validates again, rebuilds a temporary archive, calculates the checksum, uploads bytes, and completes the version
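A deterministic archive means the same package tree always produces byte-identical output, so its checksum is stable across machines and runs. A common way to get that with the standard library is sketched below: sorted entries, normalized file metadata, and a gzip header with a fixed mtime. This is a generic technique shown for illustration, not rawctx's implementation; the helper names, the normalized fields, and the use of SHA-256 for the checksum are assumptions.

```python
import gzip
import hashlib
import io
import tarfile
from pathlib import Path


def deterministic_targz(package_dir: str) -> bytes:
    """Hypothetical helper: byte-stable .tar.gz of every file under package_dir."""
    root = Path(package_dir)
    buf = io.BytesIO()
    # mtime=0 keeps the gzip header stable; BytesIO has no name, so none is stored.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.PAX_FORMAT) as tar:
            # Sorted traversal makes entry order independent of the filesystem.
            for path in sorted(p for p in root.rglob("*") if p.is_file()):
                data = path.read_bytes()
                info = tarfile.TarInfo(path.relative_to(root).as_posix())
                info.size = len(data)
                info.mtime = 0  # ignore on-disk timestamps
                info.mode = 0o644
                info.uid = info.gid = 0
                info.uname = info.gname = ""
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()
```

Packing the same directory twice, even after touching file timestamps, yields the same `sha256_hex()` value.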

Published versions are immutable release artifacts. Private workspaces can optionally require approval before a
published version is promoted to `latest`. When that governance policy is enabled, `publish` still creates the version,
but `latest` moves only after the request is reviewed and approved in rawctx Hub. When governance is disabled, `latest`
can be promoted directly, without a review step.

A package directory can hold either an OSI package or a native MetricFlow package.

OSI package example:

```text
my-osi-package/
  rawctx.yaml
  README.md
  models/
    sales_summary.osi.yaml
    customers.osi.yaml
```

Native MetricFlow package example:

```text
my-metricflow-package/
  rawctx.yaml
  README.md
  dbt_project.yml
  models/
    customers.yml
    orders.yml
```

Native MetricFlow manifest example:

```yaml
name: "@demo/jaffle-metrics"
version: "1.0.0"
format: "metricflow"
source_format: "metricflow"
description: "Native MetricFlow package"
models:
  - models/customers.yml
  - models/orders.yml
include:
  - dbt_project.yml
repository: "https://github.com/dbt-labs/jaffle-sl-template"
```

Notes:

- `format` supports `osi` and `metricflow`
- `models` must stay relative and must resolve inside the package directory
- `include` is optional and is mainly useful for native packages that need extra project files such as `dbt_project.yml`
- standalone file validation is still limited to manifest files and OSI files, so `rawctx validate models/customers.yml` is not a native MetricFlow file validator by itself
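The `format` and `models` rules above can be checked locally before running `rawctx validate`. A minimal sketch follows, assuming the manifest has already been loaded into a dict (for example with PyYAML). The function name is hypothetical, and it covers only these two rules, not rawctx's full validation.

```python
from pathlib import Path

SUPPORTED_FORMATS = {"osi", "metricflow"}


def check_manifest(manifest: dict, package_dir: str) -> list[str]:
    """Hypothetical pre-check for the `format` and `models` rules only."""
    errors = []
    if manifest.get("format") not in SUPPORTED_FORMATS:
        errors.append(f"format must be one of {sorted(SUPPORTED_FORMATS)}")
    root = Path(package_dir).resolve()
    for entry in manifest.get("models", []):
        if Path(entry).is_absolute():
            errors.append(f"{entry}: model paths must be relative")
            continue
        # resolve() collapses ".." segments and follows symlinks before the check.
        if not (root / entry).resolve().is_relative_to(root):
            errors.append(f"{entry}: resolves outside the package directory")
    return errors
```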

## Convert Workflow

Inspect-first OSI flow (convert, validate, and pack locally before publishing):

```bash
rawctx convert --from metricflow --to osi ./my-dbt-project --output ./dist/pkg
rawctx validate ./dist/pkg --json
rawctx pack ./dist/pkg --output-dir ./dist --json
```

## Publish Directly From dbt

Convert to OSI and publish:

```bash
rawctx login
rawctx publish --from-dbt ./my-dbt-project --emit-package ./dist/pkg
```

Publish a native MetricFlow package:

```bash
rawctx login
rawctx publish --from-dbt ./my-dbt-project --native --emit-package ./dist/native-pkg
rawctx publish --from-dbt ./my-dbt-project --native --package-name @your-scope/jaffle-shop --package-version 1.2.3
```

Use `--emit-package` when you want the generated package directory to remain on disk after the publish run.

## Latest Promotion Governance

Governance is about changing the official pointer, not editing the artifact:

```text
published immutable version
        |
request latest promotion in rawctx Hub
        |
review diff or prompt preview when required
        |
approval threshold reached
        |
latest resolves to that concrete version
```

Workspace admins can enable approval before latest promotion, set the required approval count, choose whether requesters
may self-approve, and require semantic diff review. The first governance surface is in the authenticated Hub UI:
workspace settings configure the policy, and package version pages create, approve, reject, or cancel latest promotion
requests.

CLI and Python consumers do not need a separate governance command to use the result. They keep using exact refs or the
approved latest pointer:

```bash
rawctx snapshot-download @scope/name@1.2.3
rawctx to-prompt @scope/name@latest --max-tokens 1200
```

If current latest changes while a request is pending, rawctx marks that request stale instead of moving latest from an
unexpected base version. Existing pending requests keep the approval threshold captured when the request was created.
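Those two rules, stale detection and a threshold captured at creation, can be modeled as a small state check. The sketch below is a hypothetical, simplified model of the behavior described above (it ignores the self-approval and diff-review settings); the class and field names are assumptions, not rawctx Hub's data model.

```python
from dataclasses import dataclass, field


@dataclass
class PromotionRequest:
    target_version: str
    base_version: str  # value of `latest` when the request was created
    required_approvals: int  # threshold captured at creation time
    approvers: set[str] = field(default_factory=set)

    def status(self, current_latest: str) -> str:
        if current_latest != self.base_version:
            # latest moved while pending: never promote from an unexpected base
            return "stale"
        if len(self.approvers) >= self.required_approvals:
            return "approved"
        return "pending"


req = PromotionRequest("1.3.0", base_version="1.2.0", required_approvals=2)
req.approvers.update({"alice", "bob"})
print(req.status(current_latest="1.2.0"))  # approved
print(req.status(current_latest="1.2.5"))  # stale
```

Because `required_approvals` is stored on the request, a later change to the workspace policy does not alter the threshold of a request that is already pending.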

## Auth Flow (Auto + Fallback)

1. Run `rawctx login`.
2. CLI opens or prints the OAuth URL from `POST /api/auth/login` and falls back to the legacy GitHub endpoint if needed.
3. Complete login in the browser.
4. CLI automatically polls OAuth session status and captures `id_token` when the registry supports it.
5. CLI calls `POST /api/auth/token` and stores the API token in `~/.rawctx/config.yaml`.

Manual fallback:

- `rawctx login --id-token '<JWT>'`

## Config and Environment

Config file (default): `~/.rawctx/config.yaml`

```yaml
registry: "https://api.rawctx.dev"
auth:
  token: "rxctx_..."
  token_id: "uuid"
  token_name: "rawctx-cli"
  issued_at: "2026-02-28T00:00:00+00:00"
profile:
  username: "owner"
```

Environment overrides:

- `RAWCTX_CONFIG` (config path)
- `RAWCTX_REGISTRY` (registry URL)
- `RAWCTX_TOKEN` (auth token)

Priority: CLI option > env var > config > default.
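The precedence rule can be sketched as a lookup that returns the first value that is set. This is a hypothetical helper for illustration: it assumes the config file is already loaded into a dict shaped like the example above, treats empty strings as unset, and assumes `https://api.rawctx.dev` as the built-in default. rawctx's own handling of these cases may differ.

```python
import os
from typing import Optional

# Assumption: the built-in default matches the registry used throughout this README.
DEFAULT_REGISTRY = "https://api.rawctx.dev"


def resolve_registry(cli_value: Optional[str], config: dict,
                     environ: Optional[dict] = None) -> str:
    """CLI option > RAWCTX_REGISTRY env var > config file > built-in default."""
    env = os.environ if environ is None else environ
    for candidate in (cli_value, env.get("RAWCTX_REGISTRY"), config.get("registry")):
        if candidate:  # None and "" fall through to the next source
            return candidate
    return DEFAULT_REGISTRY
```

The same pattern applies to the token, with `RAWCTX_TOKEN` and `config["auth"]["token"]` as the middle sources.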

## Offline Mode

`--offline` is supported for:

- `search`
- `info`
- `download`
- `snapshot-download`

Cache paths:

- index: `~/.rawctx/cache/packages.json`
- archives: `~/.rawctx/cache/archives/@scope/name/<version>.rawctx.tar.gz`
- snapshots: `~/.rawctx/packages/@scope/name/<version>/`
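Before running with `--offline`, you can check whether a package version is already cached by turning the layout above into paths. A minimal sketch follows, assuming the default `~/.rawctx` root and a concrete version, because the layout is keyed by version. Whether a custom `RAWCTX_CONFIG` location also moves the cache is not specified here, and the helper name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def cache_paths(scope: str, name: str, version: str,
                home: Optional[Path] = None) -> dict:
    """Hypothetical helper mirroring the documented cache layout."""
    root = (home or Path.home()) / ".rawctx"
    pkg = Path(f"@{scope}") / name
    return {
        "index": root / "cache" / "packages.json",
        "archive": root / "cache" / "archives" / pkg / f"{version}.rawctx.tar.gz",
        "snapshot": root / "packages" / pkg / version,
    }


paths = cache_paths("acme", "revenue-metrics", "1.2.0")
print(paths["snapshot"].is_dir())  # True if this snapshot was downloaded earlier
```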
