Metadata-Version: 2.4
Name: barx
Version: 1.0.0
Summary: Runtime intelligence layer for AI-era Python software: observe, verify, explain, and gate releases on evidence.
Author-email: Karthik Barma <golla.sat@northeastern.edu>
Maintainer-email: Karthik Barma <golla.sat@northeastern.edu>
License: MIT
Project-URL: Homepage, https://github.com/TheBarmaEffect/barx
Project-URL: Repository, https://github.com/TheBarmaEffect/barx
Project-URL: Documentation, https://github.com/TheBarmaEffect/barx/tree/main/docs
Project-URL: Changelog, https://github.com/TheBarmaEffect/barx/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/TheBarmaEffect/barx/issues
Keywords: runtime,observability,adaptive,explainability,testing,release-gate,ai-agents
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Debuggers
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tomli>=2.0; python_version < "3.11"
Provides-Extra: rich
Requires-Dist: rich>=13; extra == "rich"
Provides-Extra: api
Requires-Dist: httpx>=0.27; extra == "api"
Provides-Extra: test
Requires-Dist: pytest>=8; extra == "test"
Requires-Dist: pytest-cov>=5; extra == "test"
Requires-Dist: hypothesis>=6; extra == "test"
Requires-Dist: httpx>=0.27; extra == "test"
Requires-Dist: tomli>=2.0; python_version < "3.11" and extra == "test"
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: pytest-cov>=5; extra == "dev"
Requires-Dist: hypothesis>=6; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Requires-Dist: rich>=13; extra == "dev"
Requires-Dist: httpx>=0.27; extra == "dev"
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: twine>=5; extra == "dev"
Requires-Dist: tomli>=2.0; python_version < "3.11" and extra == "dev"
Provides-Extra: all
Requires-Dist: rich>=13; extra == "all"
Requires-Dist: httpx>=0.27; extra == "all"
Dynamic: license-file

# Barx — Runtime Intelligence Layer for AI-Era Python Software

**Everything your code did, explained through evidence.**

Barx observes Python code while it runs, verifies behavior, explains
every runtime decision with evidence, audits what AI coding agents
change, and turns a run into a GREEN/AMBER/RED release verdict you can
defend — locally, with zero telemetry.

AI is changing how code gets written. Barx focuses on what comes next:
**runtime trust** — can this run ship, and what proves it?

> **Barx 1.0.0 is the first stable release of the Barx Runtime
> Intelligence Layer.** Barx has been rebuilt and repositioned: the 2025
> PyPI release (0.1.0, "Fast, CPU-only AI framework") was a different
> product and is fully retired — see the [changelog](CHANGELOG.md).
> Everything in the [claims registry](#what-works-today) is implemented
> and tested; the limits are documented in
> [What Barx is not](#what-barx-is-not) and [docs/CLAIMS.md](docs/CLAIMS.md).

![Barx Studio — the local evidence workspace showing a GREEN release verdict](docs/assets/studio-run-light.png)

*Barx Studio: the local evidence workspace (`barx studio`). A real
recorded run — GREEN verdict, score and confidence, evidence categories,
and the Evidence Spine. [Graphite Dark](docs/assets/studio-run-dark.png)
ships too.*

## What Barx helps answer

- **Can this run ship?** → `barx release-check` (GREEN / AMBER / RED with evidence)
- **What changed between two runs?** → `barx drift`
- **What failed, and where?** → the Risks view in Studio and the report's exceptions section
- **What did the coding agent do to my repo?** → `barx.AgentAudit`
- **What evidence supports this release?** → the HTML report and Barx Studio

## Quickstart

```bash
pip install barx            # core; add barx[api] for API testing
```

> Until the 1.0.0 wheel lands on PyPI (publishing is a deliberate manual
> step — [docs/publishing.md](docs/publishing.md)), install from source:
> `pip install git+https://github.com/TheBarmaEffect/barx`. The 0.1.0
> currently on PyPI is the retired 2025 package, not this product.

```python
import barx

seen = barx.Collection()                      # starts as a list
seen.extend(f"url-{i}" for i in range(250))

for i in range(2000):                         # workload turns lookup-heavy...
    seen.contains(f"url-{i % 250}")

print(seen.backend())              # -> "set" (switched, with evidence)
print(seen.explain())              # why, evidence, alternatives, confidence, rollback
```

Every decision is a structured event in `.barx/runs/<run_id>/events.jsonl`.
Nothing leaves your machine — no telemetry, no network calls.

```bash
barx trace your_script.py          # record runtime spans
barx verify .                      # behavioral checks + static risk scan
barx release-check                 # GREEN/AMBER/RED verdict (RED exits 1)
barx report --html report.html     # one self-contained evidence artifact
barx studio                        # local-only visual workspace at 127.0.0.1
```
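
In a CI script, the exit code is the part to act on. The sketch below is a minimal illustration that relies only on the behavior noted above (a RED verdict exits with status 1). The helper name `run_gate` is hypothetical and not part of Barx, and the meaning of any other non-zero exit code is not covered here.

```python
import subprocess
import sys


def run_gate(command: list[str]) -> bool:
    """Run a gate command and report whether it exited with status 0.

    Hypothetical helper, not part of Barx. It treats any non-zero exit
    status as "do not ship", which includes the documented RED case (exit 1).
    """
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


# Usage (assumes the `barx` CLI is installed and on PATH):
#     sys.exit(0 if run_gate(["barx", "release-check"]) else 1)
```

Passing the command as a list avoids shell quoting problems, and `check=False` keeps the decision in your script rather than in an exception handler.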

## See it

| | |
|---|---|
| [![RED verdict with Fix First](docs/assets/studio-run-red.png)](docs/assets/studio-run-red.png) | [![Evidence Sheet in Graphite Dark](docs/assets/studio-evidence-sheet.png)](docs/assets/studio-evidence-sheet.png) |
| A RED run: the verdict, then **Fix first** — the exact blockers with recommendations. | The **Evidence Sheet**: explanation, limitations, raw JSON collapsed by default. |
| [![Release view with score dimensions](docs/assets/studio-release.png)](docs/assets/studio-release.png) | [![Self-contained HTML report](docs/assets/report.png)](docs/assets/report.png) |
| The Release view: score dimensions with stated weights and formulas. | The portable one-file HTML report — offline, no CDN, secret-redacted. |

Every image is a real screenshot of real recorded runs
(`scripts/make_showcase_runs.py` + `scripts/capture_screenshots.mjs`),
not a mockup. The "viewer python" shown in the capsule is the interpreter
running the Studio process (3.14 on the capture machine). It is labeled as a
viewer fact; the library itself is tested on 3.10–3.12.

## Architecture

Barx is strictly layered around one rule — **no event, no product**.
Every feature writes structured events to an append-only JSONL store;
explanations, reports, scores, verdicts, and Studio are renderings of
those events, never recomputations.

```mermaid
flowchart LR
    M[Instrumentation:<br/>Trace · Verify · API · Guard ·<br/>Adaptive · AI Runtime · AgentAudit] -->|events| S[(events.jsonl<br/>per run)]
    S --> R[build_report] --> H[HTML / JSON report]
    R --> U[Barx Studio]
    S --> G[Score → ReleaseGate] -->|GREEN / AMBER / RED| C[CI · PR comment · exit code]
```

The full map — layer contracts, the event schema, Guard's single-patch
seam model, the privacy table, and extension points — is in
[docs/architecture.md](docs/architecture.md).
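
The "no event, no product" rule can be pictured with a small stdlib-only sketch: events are appended as JSON lines, and a summary is derived from the stored events alone. This is an illustration of the pattern, not Barx's implementation. The function names and the `id` and `kind` fields are assumptions made for the example; the real event schema is described in the architecture doc.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def append_event(store: Path, event: dict) -> None:
    """Append one event as a single JSON line; earlier lines are never rewritten."""
    with store.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")


def read_events(store: Path) -> tuple[list[dict], int]:
    """Return (events, skipped), where skipped counts lines that fail to parse."""
    events, skipped = [], 0
    for line in store.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1  # tolerate a corrupt line instead of aborting the read
    return events, skipped


def render_summary(events: list[dict]) -> dict:
    """A 'report' computed only from stored events: a count per event kind."""
    return dict(Counter(event["kind"] for event in events))


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "events.jsonl"
    append_event(store, {"id": "e1", "kind": "trace.span"})
    append_event(store, {"id": "e2", "kind": "collection.switch"})
    with store.open("a", encoding="utf-8") as handle:
        handle.write("{truncated line\n")  # simulate a partially written line
    append_event(store, {"id": "e3", "kind": "trace.span"})
    events, skipped = read_events(store)
    summary = render_summary(events)

print(summary, "skipped:", skipped)
# {'trace.span': 2, 'collection.switch': 1} skipped: 1
```

Because the summary is a pure function of the event list, an empty store yields an empty summary; there is nothing to guess from.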

## Core concepts

- **Run** — one instrumented execution, stored under `.barx/runs/<id>/`.
- **Event** — a structured record with evidence; everything Barx shows is
  rendered from events, never invented.
- **Evidence** — the event ids behind every claim, score, and verdict.
- **Report** — `build_report` output, served as JSON or one self-contained
  HTML file. The portable artifact.
- **Studio** — a local-only viewer over that same report data.
- **ReleaseGate** — documented GREEN/AMBER/RED rules over the evidence.
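
To make the relationship between evidence and verdict concrete, here is a toy gate. It is a sketch under stated assumptions, not the documented rule set: the `Evidence` container and `verdict` function are hypothetical, and the mapping of warnings to AMBER is an assumption. The two rules taken from this README are that blockers produce RED and that insufficient evidence never produces GREEN.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical container: event ids grouped by how a gate might treat them."""

    event_ids: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def verdict(evidence: Evidence) -> str:
    """Return GREEN, AMBER, or RED for the given evidence (toy rules)."""
    if evidence.blockers:
        return "RED"
    if not evidence.event_ids:
        return "AMBER"  # insufficient evidence never yields GREEN
    if evidence.warnings:
        return "AMBER"  # assumption: warnings alone downgrade to AMBER
    return "GREEN"


print(verdict(Evidence()))                                  # AMBER
print(verdict(Evidence(event_ids=["e1"])))                  # GREEN
print(verdict(Evidence(event_ids=["e1"], blockers=["e1"])))  # RED
```

Each verdict here can be traced back to the event ids that triggered it, which is the property the real gate documents in [docs/release_gate.md](docs/release_gate.md).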

## Main capabilities

- **Trace** — function spans, nesting, boundary exception capture; no
  argument values captured. ([docs](docs/trace.md))
- **Verify** — behavioral verification over your cases + a 20-rule AST
  risk scan (no code execution). ([docs](docs/verify.md))
- **API** — API testing with runtime evidence; auth/tokens redacted.
  Optional `barx[api]` extra. ([docs](docs/api.md))
- **Policy / Guard** — runtime guardrails (observe/warn/strict) via
  reversible patch seams. Not a sandbox. ([guard](docs/guard.md))
- **Drift / Replay** — compare two runs (comparative, not causal); replay
  GET-only and dry-run by default. ([drift](docs/drift.md), [replay](docs/replay.md))
- **Score / ReleaseGate** — evidence-backed score and verdict with stated
  formulas. ([score](docs/score.md), [gate](docs/release_gate.md))
- **Adaptive runtime** — Collection, Cache, Router, Pipeline:
  evidence-backed, explainable, overridable switching. ([collection](docs/collection.md))
- **AI runtime** — LLMTrace (prompts/responses hashed by default),
  PromptGuard (heuristic), Cost (estimates from your price table).
  ([llm](docs/llm_trace.md))
- **AgentAudit** — observable evidence of what an AI coding agent did to a
  repo. ([docs](docs/agent_audit.md))
- **Evidence Testing** — Mock (recorded replay), Contract (schema-lite),
  AutoTest (generated skeletons). ([mock](docs/mock.md))
- **Studio** — local visual workspace. ([docs](docs/studio.md))
- **GitHub Action / VS Code MVP** — Barx in PRs, CI, and the editor.
  ([action](docs/github_action.md), [vscode](docs/vscode.md))

Full module index: [docs/README.md](docs/README.md).
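
The idea of a reversible patch seam, as used in the Policy / Guard entry above, can be sketched with the standard library alone. This is not Guard's code or API: `observe_calls` is a hypothetical helper that wraps one function, records that it was called, and restores the original in a `finally` block so the patch is undone even when the guarded code raises.

```python
import contextlib
import os


@contextlib.contextmanager
def observe_calls(module, name, log):
    """Temporarily wrap module.<name>, record each call, always restore it."""
    original = getattr(module, name)

    def wrapper(*args, **kwargs):
        log.append(name)  # record the call itself, not its arguments
        return original(*args, **kwargs)

    setattr(module, name, wrapper)
    try:
        yield
    finally:
        setattr(module, name, original)  # restored even if the body raises


calls = []
original_getcwd = os.getcwd
with observe_calls(os, "getcwd", calls):
    os.getcwd()

print(calls, os.getcwd is original_getcwd)
# ['getcwd'] True
```

A patch like this only sees calls that go through the patched name in the current process, which is one reason a seam-based guard is not a sandbox.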

## What Barx is not

- **Not a sandbox.** Guard patches documented seams; it is not isolation.
- **Not formal verification.** Verify runs real cases and flags risks with
  evidence; it does not prove the absence of bugs.
- **Not a cloud observability platform.** Barx is local-first; nothing is
  uploaded.
- **Not a Postman replacement.** API testing brings runtime evidence to
  Python; it is not a full API client.
- **Not an LLM provider or client.** LLMTrace wraps your callables; Barx
  makes no provider calls and ships no provider SDKs.
- **Not a guarantee of safety, correct billing, or coverage.** PromptGuard
  is heuristic, Cost is an estimate, AutoTest generates starting points,
  and a GREEN gate means "no configured blockers in the available
  evidence" — not proof.

## Privacy & security

- **Local-first.** No telemetry, no hidden network calls, local storage only.
- **Prompts/responses hashed by default** (SHA-256); raw capture is an
  explicit opt-in that still redacts secrets.
- **Secrets redacted** across events, reports, and fixtures (auth headers,
  tokens, cookies, api keys, passwords).
- **Studio binds `127.0.0.1`** by default with no telemetry and no external
  assets.
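
As an illustration of the two privacy techniques named above, the sketch below stores a SHA-256 digest in place of prompt text and masks values under sensitive keys. It is a minimal example, not Barx's redaction logic: `fingerprint`, `redact`, the key list, and the `[REDACTED]` marker are all assumptions chosen for the example.

```python
import hashlib

# Hypothetical key list, loosely following the categories named above.
SENSITIVE_KEYS = {"authorization", "cookie", "set-cookie", "x-api-key", "password", "token"}


def fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of the text, so the raw text need not be stored."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def redact(mapping: dict[str, str]) -> dict[str, str]:
    """Return a copy with values under sensitive keys replaced by a marker."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in mapping.items()
    }


headers = {"Authorization": "Bearer abc123", "Accept": "application/json"}
print(redact(headers))
# {'Authorization': '[REDACTED]', 'Accept': 'application/json'}
print(len(fingerprint("What is the capital of France?")))
# 64
```

A digest lets two runs be compared for "same prompt or not" without keeping the prompt. It does not hide short or guessable inputs from someone who can hash candidates, so it is a storage-minimization measure, not encryption.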

## What works today

Every row below is implemented and tested. This table is the claims
registry — Barx advertises nothing before it works. The detailed,
categorized version with allowed/forbidden wording lives in
[docs/CLAIMS.md](docs/CLAIMS.md).

| Area | Status |
|---|---|
| Structured event system (stable schema, JSONL store, corrupt-line recovery) | ✅ tested |
| Runtime manager (fail-soft by default, strict mode opt-in, `BARX_DISABLED`) | ✅ tested |
| `barx.Collection` — adaptive backends: `list`, `set`, `deque`, `heap`, `sorted` (value mode) and `dict` (key-value mode) | ✅ tested |
| Collection strategy engine — thresholds + hysteresis + cooldown, confidence with stated formula, alternatives with rejection reasons, conversion-cost estimates | ✅ tested |
| Collection safety — `lock_backend`, data-preserving `rollback` (refused with a recorded warning if it would lose data), duplicate/unhashable/uncomparable fallbacks | ✅ tested |
| `pop_min()` / `iter_sorted()` on every value backend (cost varies by backend) | ✅ tested |
| Explain engine — evidence-backed answers to what/why/evidence/alternatives/confidence/rollback/override, per collection instance | ✅ tested |
| `barx.Trace` — function spans (no args/values captured), nesting, exception capture at the trace boundary, include/exclude filters, sampling, max_depth/max_events, fail-soft | ✅ tested |
| Trace ↔ Collection linkage — adaptive decisions during a trace are counted, listed, and linked via `related_event_ids` | ✅ tested |
| JSON reports (with a trace section: summaries, slowest spans, exceptions) | ✅ tested |
| HTML report — one self-contained file (inline CSS, no JS frameworks, no CDN, offline); explain-style decision cards, trace summary, CSS span timeline, event feed, raw evidence anchors, honest empty states and caps; escaped + secret-redacted; JSON fallback on failure | ✅ tested |
| `barx.verify` — behavioral verification: real cases, expected/contract checks, exception capture, latency + stability checks, type-hint warnings, redacted evidence | ✅ tested |
| `barx.verify_file` / `verify_project` — AST risk scan (20 rules, critical→low), file:line evidence, no code execution | ✅ tested |
| Verification events + explain support + report sections (JSON and HTML) + stated confidence heuristic | ✅ tested |
| `barx.API` / `barx.APISuite` — API testing with runtime evidence (optional `barx[api]` extra): status/latency/header/JSON-path/schema-lite assertions, token capture + chaining, fail-fast or continue, declarative JSON specs | ✅ tested |
| API privacy — auth headers, cookies, and token-like values redacted in all stored evidence; no raw-secret flag exists | ✅ tested |
| `barx.Policy` / `barx.Guard` — runtime guardrails (not a sandbox): 10 active rules, observe/warn/strict modes, reversible patch seams always restored (incl. before strict violations propagate), allow_network/allow_file_delete approval contexts, latency budget | ✅ tested |
| Policy events + explain + report sections; evidence redacted; stdlib-internal eval/exec exempted (documented); barx.API runner never falsely flagged | ✅ tested |
| `barx.Graph` — project graph (best-effort AST structure: imports, classes, inheritance, local calls), runtime graph (evidence-backed from events; no invented links), failure graph (event-supported chains); JSON + Mermaid-text exports, caps with disclosure | ✅ tested |
| `barx.Drift` — compare two runs across 7 categories with stated thresholds, evidence event ids, improvement findings, and zero causal language (test-enforced) | ✅ tested |
| `barx.Replay` — dry-run by default, GET-only by default, status-parity assertions, disclosed skips, shell/eval/file/pickle/policy actions never replayed; evidence-based path reconstruction | ✅ tested |
| `barx.Score` — evidence-backed trust score (formula v1.0 stated in every result: weights, penalty table, evidence ids, limitations; no score without evidence) | ✅ tested |
| `barx.ReleaseGate` — documented GREEN/AMBER/RED rules (v1.0), release confidence with stated formula, blockers/warnings with evidence, insufficient evidence → AMBER never GREEN, PR-comment markdown | ✅ tested |
| `barx.Cache` — adaptive caching (lru/lfu/ttl/fifo/no_cache/auto) with evidence-backed strategy switches, decorator, bypass disclosure, injectable clock, RLock | ✅ tested |
| `barx.Router` — measured-evidence routing (fixed/round_robin/fastest/lowest_error/auto), fair warmup, disclosed fallback, exceptions never swallowed | ✅ tested |
| `barx.Pipeline` — environment detection via find_spec only (heavy frameworks never imported) + honest workflow recommendations with limitations | ✅ tested |
| `barx.LLMTrace` — callable wrapper (no provider SDKs, no provider calls): prompts/responses as SHA-256 hashes by default, tokens only when supplied, redacted opt-in capture | ✅ tested |
| `barx.PromptGuard` — heuristic output validation (JSON, schema-lite, unsafe commands, secret leakage, undeclared tools, injection markers); observe/warn/strict | ✅ tested |
| `barx.Cost` — estimates from user-supplied price tables only; missing prices/tokens disclosed, never assumed | ✅ tested |
| AI Runtime score dimension (only when LLM events exist) + gate rules + restrained report section with privacy note | ✅ tested |
| `barx.AgentAudit` — agent-session evidence: before/after snapshots (hash/metadata, contents never stored), dependency diffs, commands/network via Guard's seams, policy links, timeline | ✅ tested |
| `barx.Mock` — redacted replay fixtures from recorded evidence (X-Barx-Mock: recorded; misses disclosed, never invented; refuses without evidence) | ✅ tested |
| `barx.Contract` — schema-lite contracts from observed responses; drift = review finding, never a breakage claim | ✅ tested |
| `barx.AutoTest` — pytest skeletons generated from evidence (review-required banner, deterministic, nothing invented) | ✅ tested |
| Barx Studio — local-only run viewer (`barx studio`): 127.0.0.1, zero telemetry, no external assets, viewer-not-source-of-truth | ✅ tested |
| Local benchmarks (`benchmarks/`) incl. honest overhead numbers for Collection and Trace | ✅ tested |
| GitHub Action (`barx-release-check`) — composite action: verdict, report, fail-on, PR comment via token; shells out to the CLI, no duplicated gate logic | ✅ tested |
| CI workflow (`ci.yml`) — Python 3.10/3.11/3.12 matrix, ruff + format gates, coverage ≥90% gate | ✅ tested |
| VS Code MVP (`vscode/barx`) — status bar + commands, shells out to CLI only, no telemetry/cloud/chat | ✅ tested |
| CLI: `version`, `runs`, `latest`, `explain`, `report`, `trace`, `verify`, `api test`, `policy`, `guard`, `graph`, `drift`, `replay`, `score`, `release-check`, `pipeline`, `llm`, `cost`, `prompt-guard`, `agent-audit`, `mock`, `contract`, `autotest`, `studio`, `ci comment` — `--json` where applicable | ✅ tested |
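
The Collection strategy engine row mentions thresholds, hysteresis, and cooldown. The sketch below shows how those three ideas commonly fit together for a list/set choice. It is a generic illustration, not the engine Barx ships: the class name, the ratios, and the cooldown length are all made-up values for the example.

```python
from dataclasses import dataclass


@dataclass
class SwitchPolicy:
    """Hypothetical policy; every number here is illustrative, not a Barx default."""

    enter_ratio: float = 0.8  # switch list -> set at or above this lookup share
    exit_ratio: float = 0.5   # switch set -> list only at or below this share
    cooldown_ops: int = 100   # minimum operations between two switches
    backend: str = "list"
    ops_since_switch: int = 0

    def observe(self, lookup_ratio: float, ops: int) -> str:
        """Feed one window of `ops` operations; return the backend to use next."""
        self.ops_since_switch += ops
        if self.ops_since_switch < self.cooldown_ops:
            return self.backend  # cooldown: too soon after the last switch
        target = self.backend
        if self.backend == "list" and lookup_ratio >= self.enter_ratio:
            target = "set"
        elif self.backend == "set" and lookup_ratio <= self.exit_ratio:
            target = "list"
        if target != self.backend:
            self.backend = target
            self.ops_since_switch = 0  # restart the cooldown window
        return self.backend


policy = SwitchPolicy()
print(policy.observe(0.9, 50))   # list (still inside the initial cooldown)
print(policy.observe(0.9, 50))   # set  (threshold met, cooldown elapsed)
print(policy.observe(0.6, 200))  # set  (0.6 sits between exit and enter ratios)
print(policy.observe(0.4, 10))   # list (dropped to the exit ratio or below)
```

The gap between `enter_ratio` and `exit_ratio` is the hysteresis band: a workload hovering near one threshold does not flip the backend back and forth, and the cooldown bounds how often a conversion cost can be paid.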

## Limitations

Barx shows what the evidence holds; absent evidence renders as an empty
state, never a guess. Guard is not isolation. Drift is comparative, not
causal. PromptGuard, Score, and the gate are documented heuristics, not
proofs. Cost is an estimate from your price table. AgentAudit cannot see
inside child processes. AutoTest output requires human review. Supported
on Python 3.10–3.12 (3.13/3.14 are unverified). The full list lives in
[docs/AUDIT.md](docs/AUDIT.md) and each module's doc.

## Principles

- **No event, no product.** Explanations and reports are rendered from
  recorded events, never invented.
- **Fail soft.** Instrumentation failures never break your program unless
  you opt into strict mode.
- **No magic.** Every adaptive switch is loggable, explainable,
  rollback-able, and overridable.
- **Private by default.** No telemetry, no hidden network calls, local
  storage only.
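
The "fail soft" principle can be illustrated with a small wrapper: an instrumentation hook that raises is logged and ignored unless strict mode is requested. This is a sketch of the pattern only. The `fail_soft` name, its `strict` parameter, and the logger name are assumptions for the example, not Barx's runtime manager.

```python
import functools
import logging

logger = logging.getLogger("instrumentation")


def fail_soft(hook, *, strict: bool = False):
    """Wrap an instrumentation hook so its failures do not reach the caller."""

    @functools.wraps(hook)
    def guarded(*args, **kwargs):
        try:
            return hook(*args, **kwargs)
        except Exception:
            if strict:
                raise  # strict mode: surface the instrumentation failure
            logger.warning("hook %s failed; continuing", hook.__name__, exc_info=True)
            return None

    return guarded


def record_span(name: str) -> None:
    """A deliberately broken hook used to demonstrate the wrapper."""
    raise OSError("disk full")


safe_record = fail_soft(record_span)
print(safe_record("checkout"))  # None: the failure was logged, not raised
```

Only the hook is wrapped, never the user's own function, so exceptions from application code still propagate normally.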

## Development

```bash
python -m venv .venv && .venv/bin/pip install -e ".[dev]"
.venv/bin/pytest --cov=barx        # full suite, coverage ≥ 90%
.venv/bin/ruff check barx tests && .venv/bin/ruff format --check barx tests
.venv/bin/python scripts/launch_smoke.py     # end-to-end launch smoke
.venv/bin/python scripts/run_examples.py     # run all safe examples
```

Docs index: [docs/README.md](docs/README.md) ·
Architecture: [docs/architecture.md](docs/architecture.md) ·
Website: [docs/website.md](docs/website.md) ·
Claims: [docs/CLAIMS.md](docs/CLAIMS.md) ·
Changelog: [CHANGELOG.md](CHANGELOG.md) ·
Roadmap: [docs/ROADMAP.md](docs/ROADMAP.md)

## License

MIT © 2026 Karthik Barma. See [LICENSE](LICENSE). Built under the Aura
banner.
