Metadata-Version: 2.4
Name: veridata-recon
Version: 0.1.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Security :: Cryptography
Classifier: Topic :: Software Development :: Libraries
Summary: Verifiable Reconciliation Proofs — cryptographic data pipeline integrity verification
Keywords: data-reconciliation,cryptographic-proofs,data-integrity,merkle-tree,pipeline-verification,data-quality,vrp
Author: Vaquar Khan
License: Apache-2.0
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.com/vaquarkhan/veridata/tree/main/python
Project-URL: Homepage, https://github.com/vaquarkhan/veridata
Project-URL: Issues, https://github.com/vaquarkhan/veridata/issues
Project-URL: Repository, https://github.com/vaquarkhan/veridata

# veridata-recon

**Verifiable Reconciliation Proofs for Python** — powered by Rust.

> ⚠️ This is *not* the `veridata` pandas-cleaning package. This library provides
> **cryptographic data pipeline reconciliation** using Merkle trees, Ed25519
> signatures, and the VRP (Verifiable Reconciliation Proof) format.

## What It Does

`veridata-recon` lets you produce cryptographic evidence that data made it from
a source system to a sink system without:

- **Drops** — records lost in transit
- **Mutations** — records altered during transfer
- **Duplicates** — records replicated unexpectedly

It generates cryptographic proofs (VRP documents) that can be verified offline by
any party with the public key — no access to the original data required.
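
To make the three failure modes concrete, here is a minimal plain-Python sketch that classifies them by comparing records on an identity field. It is an illustration only: `classify` is a hypothetical helper, it does no hashing or signing, and it is not how `veridata-recon` is implemented.

```python
from collections import Counter

def classify(source, sink, key):
    """Hypothetical helper: compare two lists of dict records by `key`."""
    src = {r[key]: r for r in source}
    snk = {r[key]: r for r in sink}
    sink_counts = Counter(r[key] for r in sink)
    # Drops: identities present in the source but absent from the sink
    drops = sorted(k for k in src if k not in snk)
    # Mutations: same identity, different content
    mutations = sorted(k for k in src if k in snk and src[k] != snk[k])
    # Duplicates: identities that appear more than once in the sink
    duplicates = sorted(k for k, n in sink_counts.items() if n > 1)
    return {"drops": drops, "mutations": mutations, "duplicates": duplicates}

source = [
    {"order_id": "1001", "qty": "5"},
    {"order_id": "1002", "qty": "3"},
    {"order_id": "1003", "qty": "1"},
]
sink = [
    {"order_id": "1001", "qty": "5"},
    {"order_id": "1002", "qty": "4"},  # mutated
    {"order_id": "1002", "qty": "4"},  # duplicated
]                                      # 1003 was dropped

report = classify(source, sink, "order_id")
print(report)
```

The library's contribution on top of a comparison like this is the proof: a signed document that a third party can check without seeing the records.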

## Installation

```bash
pip install veridata-recon
```

## Quick Start

```python
import veridata_recon as vr

# Generate a random salt for this reconciliation run
salt = vr.generate_salt()

# Your source and sink records (e.g., from Kafka topic and Iceberg table)
source = [
    {"order_id": "1001", "item": "widget", "qty": "5", "status": "shipped"},
    {"order_id": "1002", "item": "gadget", "qty": "3", "status": "pending"},
    {"order_id": "1003", "item": "gizmo",  "qty": "1", "status": "shipped"},
]

sink = [
    {"order_id": "1001", "item": "widget", "qty": "5", "status": "shipped"},
    {"order_id": "1002", "item": "gadget", "qty": "3", "status": "pending"},
    {"order_id": "1003", "item": "gizmo",  "qty": "1", "status": "shipped"},
]

# Reconcile with cryptographic proof
result = vr.reconcile(
    source=source,
    sink=sink,
    identity_rule="composite:[order_id]",
    content_fields=["order_id", "item", "qty", "status"],
    salt=salt,
)

print(result["verdict"])        # "PASS"
print(result["matched_count"])  # 3
```

## Detecting Issues

```python
# Reuses `source` and `salt` from the Quick Start.
# Source has 3 records, sink is missing one
sink_missing = source[:2]

result = vr.reconcile(
    source=source,
    sink=sink_missing,
    identity_rule="composite:[order_id]",
    content_fields=["order_id", "item", "qty", "status"],
    salt=salt,
)

print(result["verdict"])      # "FAIL"
print(len(result["missing"])) # 1 — order_id 1003 dropped
```

## Key Features

### Hashing

```python
# SHA-256 (default) or BLAKE3
digest = vr.hash_bytes(b"hello world")
digest_b3 = vr.hash_bytes(b"hello world", algorithm="blake3")
```

### Fingerprinting

```python
# `salt` is the value returned by vr.generate_salt() in the Quick Start
fp = vr.fingerprint(
    record={"order_id": "1001", "amount": "99.99"},
    identity_rule="composite:[order_id]",
    content_fields=["order_id", "amount"],
    salt=salt,
)
# Returns: {"id_hash": "ab12...", "content_hash": "cd34...", "fingerprint": "ef56..."}
```
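
For intuition about what a salted, domain-separated fingerprint might look like, here is a rough sketch using only `hashlib`. Everything in it is an assumption made for illustration: the function names, the domain tags (`b"id"`, `b"content"`, `b"fingerprint"`), and the length-prefixed encoding are hypothetical, and the hashes it produces will not match the output of `vr.fingerprint`.

```python
import hashlib
import os

def _tagged_hash(domain: bytes, salt: bytes, payload: bytes) -> str:
    # Length-prefix each part so that boundaries between them are unambiguous.
    h = hashlib.sha256()
    for part in (domain, salt, payload):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()

def sketch_fingerprint(record, id_fields, content_fields, salt):
    """Hypothetical stand-in for a record fingerprint; not the library's scheme."""
    def encode(fields):
        out = b""
        for name in fields:
            item = f"{name}={record[name]}".encode("utf-8")
            out += len(item).to_bytes(4, "big") + item
        return out

    id_hash = _tagged_hash(b"id", salt, encode(id_fields))
    content_hash = _tagged_hash(b"content", salt, encode(content_fields))
    combined = bytes.fromhex(id_hash) + bytes.fromhex(content_hash)
    return {
        "id_hash": id_hash,
        "content_hash": content_hash,
        "fingerprint": _tagged_hash(b"fingerprint", salt, combined),
    }

demo_salt = os.urandom(16)
original = sketch_fingerprint(
    {"order_id": "1001", "amount": "99.99"}, ["order_id"], ["order_id", "amount"], demo_salt
)
altered = sketch_fingerprint(
    {"order_id": "1001", "amount": "100.00"}, ["order_id"], ["order_id", "amount"], demo_salt
)
print(original["id_hash"] == altered["id_hash"])            # same identity
print(original["content_hash"] == altered["content_hash"])  # different content
```

Separating the identity hash from the content hash is what lets a reconciler tell a mutated record (same identity, different content) apart from a dropped one.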

### Key Management

```python
# Generate a new Ed25519 key pair
keys = vr.generate_keypair()
print(keys["public_key"])   # base64-encoded
print(keys["private_key"])  # base64-encoded — keep secret!

# Reload from private key
keys2 = vr.keypair_from_private(keys["private_key"])
assert keys2["public_key"] == keys["public_key"]
```

### Proof Verification

```python
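# Verify a .vrp.json proof file offline.
# public_key_b64 is the signer's base64-encoded Ed25519 public key; here we
# assume `keys` (from Key Management above) is the pair that signed the proof.
public_key_b64 = keys["public_key"]
outcome = vr.verify_proof("path/to/proof.vrp.json", public_key_b64)
# Returns: "PASS", "FAIL", or "UNVERIFIED"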
```
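
One way to act on the three outcomes in a CI job is to map them to process exit codes. The sketch below is a suggestion, not part of the library: `exit_code_for` and the specific codes are hypothetical, and it treats any unrecognized string as a failure.

```python
def exit_code_for(outcome: str) -> int:
    """Hypothetical mapping from a verification outcome to an exit code."""
    codes = {"PASS": 0, "FAIL": 1, "UNVERIFIED": 2}
    # Anything unexpected is treated as a failure rather than silently passing.
    return codes.get(outcome, 1)

# In a CI script (not executed here):
#   import sys
#   outcome = vr.verify_proof("path/to/proof.vrp.json", public_key_b64)
#   sys.exit(exit_code_for(outcome))
print(exit_code_for("PASS"), exit_code_for("FAIL"), exit_code_for("UNVERIFIED"))
```

Giving `UNVERIFIED` its own code lets a pipeline distinguish "the data did not reconcile" from "the proof could not be checked".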

## Use Cases

- **Data Pipeline Integrity**: Prove Kafka→Iceberg pipelines don't lose data
- **Regulatory Compliance**: Cryptographic evidence of data completeness
- **CI/CD Gates**: Fail builds if reconciliation doesn't pass
- **Cross-Team Trust**: Share proofs without sharing raw data
- **Audit Trails**: Chain proofs for continuous monitoring

## How It Works

1. **Fingerprint** each record using salted, domain-separated hashing
2. **Reconcile** source vs sink fingerprint sets
3. **Produce a verdict**: PASS, FAIL, or UNVERIFIED
4. **Generate Merkle proofs** for missing records (offline verifiable)
5. **Sign** the entire proof with Ed25519
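
Step 4 can be illustrated with a generic Merkle inclusion proof: given only a root hash and a short list of sibling hashes, a verifier can check that a particular fingerprint was part of the committed set. The sketch below assumes a simple binary SHA-256 tree in which an unpaired node is promoted unchanged. The helper names and the `0x00`/`0x01` leaf and node prefixes are choices made for this example; the tree layout actually used by the VRP format is not described here and may differ.

```python
import hashlib

def _leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()

def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_levels(leaves):
    """Return every tree level, leaves first; an unpaired node is promoted as-is."""
    levels = [[_leaf(x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = []
        for i in range(0, len(cur), 2):
            if i + 1 < len(cur):
                nxt.append(_node(cur[i], cur[i + 1]))
            else:
                nxt.append(cur[i])
        levels.append(nxt)
    return levels

def inclusion_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify_inclusion(leaf_data, proof, root):
    acc = _leaf(leaf_data)
    for sibling, is_left in proof:
        acc = _node(sibling, acc) if is_left else _node(acc, sibling)
    return acc == root

# Placeholder byte strings standing in for source fingerprints
source_fps = [b"fp-1001", b"fp-1002", b"fp-1003"]
levels = merkle_levels(source_fps)
root = levels[-1][0]
proof = inclusion_proof(levels, 2)  # proof for the record missing from the sink
print(verify_inclusion(b"fp-1003", proof, root))
print(verify_inclusion(b"fp-9999", proof, root))
```

The verifier needs only the root, the leaf, and the sibling hashes along one path, which is why such a proof can be checked without access to the rest of the data.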

## Performance

Built on Rust, using zero-copy data handling where possible. The underlying
`veridata-core` engine is designed to handle millions of records.

## License

Apache-2.0

