Metadata-Version: 2.4
Name: cloakllm-verifier
Version: 0.12.0
Summary: Standalone, dependency-light verifier for CloakLLM audit artifacts -- verify hash chains, Ed25519 signatures, key provenance, RFC 3161 timestamps, and compliance reports WITHOUT the full SDK or trusting CloakLLM's code.
Author-email: The CloakLLM Authors <cloakllm@gmail.com>
License: MIT
Project-URL: Homepage, https://cloakllm.dev
Project-URL: Repository, https://github.com/cloakllm/cloakllm-verifier
Keywords: audit,verification,eu-ai-act,compliance,attestation,rfc3161,hash-chain
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Security :: Cryptography
Classifier: Intended Audience :: Legal Industry
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cloakllm[attestation,timestamping]<0.13.0,>=0.12.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

# cloakllm-verifier

**Independently verify CloakLLM audit artifacts — without the PII-detection stack, and without trusting CloakLLM's code.**

CloakLLM's whole pitch is *compliance you can verify, not compliance you're asked to believe.* This is the tool that makes that literal: an auditor, regulator, or CI pipeline installs `cloakllm-verifier` and checks the artifacts themselves.

It **reuses CloakLLM's own verification code** (single source of truth — no reimplementation, no drift) but pulls **only the cryptography extras** — no spaCy, no NLP models. A lean install for people who need to *check*, not *produce*.

```bash
pip install cloakllm-verifier      # Python; crypto deps only, no spaCy
npm install cloakllm-verifier      # JavaScript; zero deps beyond cloakllm (see js/README.md)
```

This repo ships **two packages** from one source of truth: the Python package at the root and the JavaScript package under [`js/`](js/). Both expose the same CLI (`cloakllm-verify`) and the same checks, with byte-comparable `--json` output.

## CLI

```bash
cloakllm-verify audit      ./cloakllm_audit                 # hash-chain integrity
cloakllm-verify timestamp  ./cloakllm_audit                 # offline RFC 3161 checkpoint tokens
cloakllm-verify keys       cert.json --manifest m.json      # KeyManifest provenance + revocation
cloakllm-verify report     report.json ./cloakllm_audit     # re-validate a compliance report
cloakllm-verify all        ./cloakllm_audit                 # everything, one exit code
cloakllm-verify audit ./cloakllm_audit --json               # machine-readable (CI)
```

Exit code `0` = verified, `1` = failed/invalid. Output is ASCII-only.
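
Because the exit code is the documented contract, a CI job can gate on it without parsing any output. Below is a minimal Python sketch. It assumes `cloakllm-verify` is on `PATH`, and the helper name `verification_passed` is hypothetical. Any non-zero status, or a missing binary, counts as a failure, so the gate fails safe:

```python
import subprocess
import sys
from collections.abc import Sequence

# Assumption: `cloakllm-verify` is installed and on PATH. Only the documented
# exit-code contract is relied on (0 = verified, 1 = failed/invalid).
DEFAULT_CMD = ("cloakllm-verify", "all", "./cloakllm_audit")


def verification_passed(cmd: Sequence[str] = DEFAULT_CMD) -> bool:
    """Hypothetical CI helper: True only if the verifier exits with status 0."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
    except FileNotFoundError:
        # A missing verifier must never count as a pass.
        sys.stderr.write(f"verifier not found: {cmd[0]}\n")
        return False
    if result.returncode != 0:
        # Surface the verifier's own diagnostics in the CI log.
        sys.stderr.write(result.stdout)
        sys.stderr.write(result.stderr)
    return result.returncode == 0
```

In a pipeline step this would be used as `sys.exit(0 if verification_passed() else 1)`.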

## Python API

```python
from cloakllm_verifier import verify_audit, verify_timestamps, verify_all

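r = verify_all("./cloakllm_audit")   # {ok, audit: {...}, timestamps: {...}}
if not r["ok"]:
    # Gate explicitly: a bare `assert` is stripped under `python -O`.
    raise SystemExit(f"verification failed: {r}")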
```

## What it checks
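- **Hash-chain integrity** — recomputes every SHA-256 link from the canonical JSON; any tampered, reordered, or relinked entry fails, as does any entry deleted from the middle of the chain. Entries removed from the *end* are not detectable by the chain alone (see below).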
- **RFC 3161 trusted timestamps** — offline-verifies every `chain_checkpoint` token (proving the chain existed no later than the TSA's time); reports the earliest provable time.
- **KeyManifest provenance + revocation** — verifies a signed certificate against its published KeyManifest (signature, key-id binding, validity window, manifest-hash integrity, offline-root signature when claimed) and checks it against a root-signed RevocationList.
- **Compliance-report re-validation** — independently re-verifies the audit chain a report describes and rejects any report that claims a *verified* chain or a *COMPLIANT* verdict over a log that does not actually verify. It does not trust the report's own claims.
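
To make the hash-chain check concrete, here is a toy chain that shows the same failure modes. It is a sketch, not CloakLLM's log format. The field names (`prev_hash`, `entry_hash`), the all-zero genesis value, and the canonical-JSON settings are assumptions chosen for illustration:

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the first entry


def canonical(obj: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def entry_digest(entry: dict) -> str:
    # Hash every field except the stored hash itself (prev_hash included).
    body = {k: v for k, v in entry.items() if k != "entry_hash"}
    return hashlib.sha256(canonical(body)).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    prev = chain[-1]["entry_hash"] if chain else GENESIS
    entry = {**payload, "prev_hash": prev}
    entry["entry_hash"] = entry_digest(entry)
    chain.append(entry)


def verify_chain(chain: list[dict]) -> tuple[bool, str]:
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry.get("prev_hash") != prev:
            return False, f"entry {i}: broken link (reordered, deleted, or relinked)"
        if entry_digest(entry) != entry.get("entry_hash"):
            return False, f"entry {i}: content does not match entry_hash (tampered)"
        prev = entry["entry_hash"]
    return True, "ok"


chain: list[dict] = []
for seq in range(3):
    append(chain, {"seq": seq, "event": "sanitize"})
print(verify_chain(chain))  # (True, 'ok')
```

Note that `verify_chain(chain[:2])` also returns `(True, 'ok')`. A prefix of a valid chain is itself valid, which is the completeness gap discussed below.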

## What a passing result does — and does NOT — prove

The guarantees have sharp edges, and an auditor needs to know exactly where they are:

- **`audit` proves integrity + internal consistency of the entries present** — every SHA-256 link recomputes and chains cleanly. It does **not** prove:
  - **Completeness.** A hash chain is anchored at its genesis, not its head, so removing entries from the **end** (tail truncation) leaves a still-valid prefix. Detecting truncation needs an external head anchor — an **RFC 3161 checkpoint over the final `entry_hash`** (`timestamp`), which binds "the chain was at least this long at time T".
  - **Authenticity.** The chain is a keyless SHA-256 construction: anyone who can write the log can recompute a self-consistent one. Authenticity comes from the **Ed25519 attestation** (`keys`) — a signed certificate whose key provenance you verify against a published KeyManifest.
- **`keys` without `--manifest`** only checks the certificate's signature against the key embedded in the certificate — it does **not** establish who owns that key. Pass `--manifest` for real provenance; the CLI marks signature-only results `UNVERIFIED`, not "verified".
- **`report`** re-verifies the chain and checks the report's claims for internal consistency against it; it is not a cryptographic binding of that exact report to that exact log beyond an entry-count sanity check.
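
The completeness and authenticity gaps above can be reproduced with a stripped-down chain. This is an illustrative sketch. The hashing scheme (SHA-256 over the previous hash concatenated with canonical JSON), the genesis value, and the `matches_head_anchor` helper are hypothetical. The anchored value only stands in for the `entry_hash` an RFC 3161 token would cover, and no timestamp token is verified here:

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical genesis value


def link(prev: str, payload: dict) -> str:
    # Hypothetical link: SHA-256 over the previous hash plus canonical JSON.
    data = prev + json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build(payloads: list[dict]) -> list[str]:
    """Compute the entry hashes for a sequence of payloads."""
    hashes, prev = [], GENESIS
    for payload in payloads:
        prev = link(prev, payload)
        hashes.append(prev)
    return hashes


def verify(payloads: list[dict], hashes: list[str]) -> bool:
    return len(payloads) == len(hashes) and build(payloads) == hashes


def matches_head_anchor(hashes: list[str], anchored_head: str) -> bool:
    # Stand-in for a checkpoint over the final entry_hash: the anchored hash must
    # still be in the chain, otherwise entries were removed from the end.
    return anchored_head in hashes


log = [{"seq": i} for i in range(4)]
hashes = build(log)
anchor = hashes[-1]  # the value an external timestamp would have covered

# Tail truncation: the shortened prefix still verifies on its own...
print(verify(log[:2], hashes[:2]))              # True
# ...but the externally anchored head is gone.
print(matches_head_anchor(hashes[:2], anchor))  # False

# No key is involved: anyone who can write the log can rewrite and rehash it.
forged = [{"seq": 0, "note": "rewritten"}] + log[1:]
print(verify(forged, build(forged)))            # True
```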

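The consistency rule that `report` applies can be sketched as a pure function over an already re-verified chain. The report field names (`chain_verified`, `verdict`, `entry_count`) and the `ChainResult` type are hypothetical. What the sketch shows is the direction of trust: the report's claims are checked against the independent result, never the other way round.

```python
from dataclasses import dataclass


@dataclass
class ChainResult:
    ok: bool           # outcome of independently re-verifying the log
    entry_count: int   # number of entries actually present in the log


def report_problems(report: dict, chain: ChainResult) -> list[str]:
    """Return reasons to reject a report; an empty list means no contradiction found.

    The report keys used here (chain_verified, verdict, entry_count) are hypothetical.
    """
    problems: list[str] = []
    if not chain.ok:
        if report.get("chain_verified"):
            problems.append("claims a verified chain, but the log does not verify")
        if report.get("verdict") == "COMPLIANT":
            problems.append("claims COMPLIANT over a log that does not verify")
    if report.get("entry_count") != chain.entry_count:
        problems.append(
            f"entry count mismatch: report says {report.get('entry_count')}, "
            f"log has {chain.entry_count}"
        )
    return problems


claimed = {"chain_verified": True, "verdict": "COMPLIANT", "entry_count": 3}
print(report_problems(claimed, ChainResult(ok=True, entry_count=3)))        # []
print(len(report_problems(claimed, ChainResult(ok=False, entry_count=3))))  # 2
```

An empty result means only that the report does not contradict the log. It does not cryptographically bind this report to this log.
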
### Known limitation: cross-SDK whole-number floats

A hashed floating-point field that happens to be a whole number (e.g. a timing metric of exactly `0.0` or `5.0`) serializes as `5.0` in Python but `5` in JavaScript, so a chain written by one SDK can be reported as *tampered* by the other SDK's verifier. This is a **false-negative that fails safe** (a genuine chain is flagged for investigation; a forged chain never passes), it is intermittent, and it predates this package. A proper fix is an RFC 8785-style number-canonicalization migration (a hash-semantics change, tracked for a future release). **Workaround today: verify a chain with the same-language verifier that produced it.**
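
The mismatch is easy to reproduce with Python's standard `json` module. The same snippet also sketches the kind of number normalization a fix would need. `normalize_numbers` is hypothetical and covers only whole-number floats. Full RFC 8785 canonicalization also fixes exponent formatting for large or small magnitudes and other edge cases. Applying any such change would also alter the hashes of existing chains, so it is no substitute for the same-language workaround:

```python
import hashlib
import json


def py_json(obj) -> str:
    # Python's json module keeps the fractional part: 5.0 -> "5.0"
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def normalize_numbers(obj):
    """Hypothetical pre-hash step: emit whole-number floats as integers, which is
    what JavaScript's JSON.stringify does (5.0 -> 5). Not full RFC 8785."""
    if isinstance(obj, float) and obj.is_integer():
        return int(obj)
    if isinstance(obj, dict):
        return {k: normalize_numbers(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [normalize_numbers(v) for v in obj]
    return obj


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


entry = {"latency_ms": 5.0}
js_text = '{"latency_ms":5}'  # what JSON.stringify({latency_ms: 5.0}) produces

print(py_json(entry))                                                        # {"latency_ms":5.0}
print(sha256_hex(py_json(entry)) == sha256_hex(js_text))                     # False
print(sha256_hex(py_json(normalize_numbers(entry))) == sha256_hex(js_text))  # True
```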

## Why a separate package
So an auditor's install is minimal and its purpose is unambiguous — it's a *verifier*, not the PII middleware. The verification logic lives in `cloakllm` (reused here), so the two can never drift.

MIT · part of [CloakLLM](https://cloakllm.dev)
