Metadata-Version: 2.4
Name: auditweave
Version: 0.1.0
Summary: Tamper-evident, auditor-navigable evidence trails for AI-assisted and data-transformation workflows.
Author: Vimal Nakrani
License: Apache-2.0
Project-URL: Homepage, https://github.com/vimalnakrani08/auditweave
Project-URL: Issues, https://github.com/vimalnakrani08/auditweave/issues
Keywords: audit,rag,lineage,compliance,provenance,ai-governance,lakehouse,tamper-evident
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Office/Business :: Financial :: Accounting
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Dynamic: license-file

# AuditWeave

**Tamper-evident, auditor-navigable evidence trails for AI-assisted and data-transformation workflows.**

When an AI system or a data pipeline produces a conclusion that someone later has to *defend* — to an auditor, a regulator, a court, or an internal reviewer — the hard question is always the same: **where did this come from, and can you prove it wasn't changed after the fact?**

AuditWeave answers that. It records every step of a workflow — source documents, retrievals, data transformations, model inferences, decisions, and human sign-offs — into a single, append-only, cryptographically chained ledger. Any conclusion can be traced back to the exact evidence beneath it, and any tampering with the record is detectable.

```python
from auditweave import Recorder
from auditweave.adapters import RagTrace
from auditweave.navigator import evidence_report

rec = Recorder()
rag = RagTrace(rec, model="gpt-4o", model_version="2025-01")

ids = rag.record_turn(
    query="Why did Q3 revenue exceed forecast?",
    retrieved=[{"name": "contract #4471", "text": "Perpetual license, $0.7M"}],
    prompt="Explain using only the documents.",
    answer="A one-time $0.7M license sale drove the Q3 spike.",
)

print(evidence_report(rec.trail, ids["decision_id"]))
```

## Why this exists

The fastest-growing use of AI in regulated industries is *assistive*: a model reads documents and proposes a conclusion a human then relies on. Auditors use it to navigate enormous datasets; banks use it for credit and fraud workflows; healthcare uses it for prior-authorization. In every one of these, regulators are converging on the same requirement — **source attribution and traceability**: you must be able to show what evidence informed an AI-influenced decision, and prove the record is intact.

Existing tools mostly solve a *different* problem. ML observability platforms (drift, latency, cost) and model-governance suites are built for the **ML engineer**. AuditWeave is built for the **reviewer** — the person who has to reconstruct one conclusion from the bottom up and trust that the trail hasn't been edited.

It is deliberately small, dependency-free, and framework-agnostic, because the only audit layer that gets adopted is the one that's trivial to add to a pipeline you already have.

## What it does

- **One vocabulary for two worlds.** A RAG pipeline and a Spark/lakehouse pipeline write into the *same* trail, so a conclusion that depends on both AI inference *and* upstream data transformations is traceable end-to-end.
- **Tamper-evident by construction.** Every event is SHA-256 hash-chained to the previous one. Altering, inserting, deleting, or reordering any event breaks the chain, and `trail.verify()` reports exactly where.
- **Confidential-data friendly.** Bind the trail to the *hash* of a source document instead of its contents, so you can prove which bytes informed a decision without copying regulated data into the ledger.
- **Navigable.** `evidence_report(trail, decision_id)` renders the full evidence behind any single conclusion, in reading order, with an integrity stamp.
- **Append-only persistence.** The default JSON Lines store grows only by addition — diff it over time and you see additions, never rewrites.
- **Zero runtime dependencies.** Standard library only. PySpark is optional and never imported unless you use it.

## Install

```bash
pip install auditweave          # once published
```

From source (editable install, with the dev/test extras):

```bash
git clone https://github.com/vimalnakrani08/auditweave
cd auditweave
pip install -e ".[dev]"
```

The package is pure Python with **zero runtime dependencies** and builds with
the standard `setuptools` backend, so it installs cleanly anywhere Python 3.9+
runs — no special toolchain required.

## Running the tests

The full suite covers chain integrity, tamper detection (modification and
reordering), provenance tracing, append-only persistence round-trips, both
adapters, and the cross-pipeline case where one conclusion depends on both a
data pipeline and an AI inference:

```bash
pip install -e ".[dev]"
pytest -q
```

All tests should pass. Integrity and provenance are the whole point of this
project, so every behavior that matters is covered by a test — including
explicit tests that a tampered record (in memory *and* after being saved to
disk and reloaded) is always detected.

Try it end-to-end without writing any code:

```bash
python examples/audit_demo.py
```

This runs a full audit scenario — a lakehouse aggregation plus a RAG
explanation, signed off by a human — and prints the navigable evidence report
with an integrity stamp.

## Core concepts

| Concept | What it is |
|---|---|
| `Event` | One immutable record: a source, retrieval, transformation, inference, decision, or attestation. |
| `Trail` | The append-only, hash-chained sequence of events. Supports `verify()` and `trace()`. |
| `Recorder` | The one-call-per-step API your pipeline (or an adapter) uses. |
| Adapters | `RagTrace` for retrieval+LLM turns; `TableTrace` for tabular/Spark lineage. |
| `evidence_report` | Renders the evidence behind a conclusion for a human reviewer. |

## Verifying integrity

```python
result = rec.trail.verify()
if not result:
    for issue in result.issues:
        print(f"[seq {issue.sequence}] {issue.problem}")
```

If anyone edits a persisted trail — even a single digit in a recorded figure — reloading and verifying it fails, pinpointing the altered event.

## Roadmap

- Adapters for LangChain and LlamaIndex callbacks (auto-instrumentation)
- Optional signed attestations (per-reviewer keypairs)
- HTML / PDF evidence-report export
- Merkle-root anchoring for external timestamping

Contributions welcome — see [CONTRIBUTING.md](CONTRIBUTING.md).

## License

Apache-2.0. See [LICENSE](LICENSE).

---

*AuditWeave is infrastructure for accountability, not a compliance guarantee. Whether a given trail satisfies a specific regulation is a question for your auditors and counsel.*
