Metadata-Version: 2.4
Name: drc-scanner
Version: 0.2.0
Summary: Data Revenue Connecter — read-only database profiling agent that builds a signed Data Passport
Author: Data Revenue Group
License: Proprietary
Project-URL: Homepage, https://app.datarevenue.io
Keywords: data valuation,database profiling,data passport,read-only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Classifier: Topic :: Database
Classifier: Intended Audience :: Information Technology
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: typer>=0.12.0
Requires-Dist: rich>=13.7.0
Requires-Dist: pydantic>=2.7.0
Requires-Dist: psycopg[binary]>=3.1.0
Requires-Dist: cryptography>=42.0.0
Provides-Extra: dev
Requires-Dist: pytest>=8.2.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Provides-Extra: mysql
Requires-Dist: mysql-connector-python>=8.4.0; extra == "mysql"
Provides-Extra: sqlserver
Requires-Dist: pyodbc>=5.1.0; extra == "sqlserver"
Provides-Extra: snowflake
Requires-Dist: snowflake-connector-python>=3.10.0; extra == "snowflake"
Provides-Extra: bigquery
Requires-Dist: google-cloud-bigquery>=3.20.0; extra == "bigquery"
Provides-Extra: all-connectors
Requires-Dist: mysql-connector-python>=8.4.0; extra == "all-connectors"
Requires-Dist: pyodbc>=5.1.0; extra == "all-connectors"
Requires-Dist: snowflake-connector-python>=3.10.0; extra == "all-connectors"
Requires-Dist: google-cloud-bigquery>=3.20.0; extra == "all-connectors"

# DRC Scanner — Data Revenue Connecter

A **read-only** database profiling agent. It runs inside your own network, profiles your
database, and produces a small, human-readable, signed **Data Passport** — a JSON file of
aggregate statistics only. **Your raw data never leaves your machine.** You then upload the
Passport to app.datarevenue.io to auto-fill your data valuation questionnaire.

## What leaves your network

Nothing during the scan. The scanner makes **zero network calls** while profiling. The only
artifact produced is the Data Passport (`drc_passport_<date>.json`) — column names, row
counts, and aggregate metrics. Never row values, never sample data. You can inspect the
entire file before uploading it.
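
One way to review a Passport before uploading is to list every key path it contains and skim for anything unexpected. The sketch below assumes only that the file is valid JSON. `list_key_paths` is a hypothetical helper, not part of `drc-scan`, and the sample keys are made up for illustration; the real Passport schema may differ.

```python
import json
from typing import Any, Iterator


def list_key_paths(node: Any, prefix: str = "") -> Iterator[str]:
    """Yield a dotted path for every leaf value in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from list_key_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from list_key_paths(value, f"{prefix}[{index}]")
    else:
        # Leaf value (string, number, boolean, or null): report where it lives.
        yield prefix


# Illustrative stand-in for a Passport; these field names are assumptions.
sample = json.loads(
    '{"tables": [{"name": "orders", "row_count": 1200}], "content_hash": "abc"}'
)
paths = list(list_key_paths(sample))
for path in paths:
    print(path)
```

To run it against a real file, replace `sample` with the result of `json.load()` on your `drc_passport_<date>.json`.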

## Install & run

```bash
# Option 1 — pip
pip install drc-scanner
drc-scan run --connect postgresql://readonly@host:5432/proddb

# Option 2 — Docker (no Python required)
docker run -it datarevenue/scanner \
  --connect postgresql://readonly@host.docker.internal:5432/proddb
```

Output:

```
drc_passport_2026-06-11.json   (the Data Passport)
+ on-screen inventory summary and upload instructions
```

## Create a read-only database user first

```bash
drc-scan setup --db postgres
```

This prints the exact SQL to create a least-privilege, read-only role, so the scanner
cannot modify your data. Copy the output and hand it to your DBA to run.
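
For orientation, a least-privilege role on PostgreSQL typically looks something like the statements rendered below. This is a sketch under assumptions, not the actual output of `drc-scan setup`: the helper `read_only_role_sql`, the role name `drc_readonly`, and the `public` schema are all hypothetical, and the role's password or other authentication would be configured separately.

```python
import re

# Accept only plain lowercase identifiers so no SQL quoting is needed.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def read_only_role_sql(role: str, database: str, schema: str = "public") -> list[str]:
    """Render PostgreSQL statements for a role that can connect and SELECT only."""
    for name in (role, database, schema):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return [
        f"CREATE ROLE {role} LOGIN;",
        f"GRANT CONNECT ON DATABASE {database} TO {role};",
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};",
        # Sessions for this role start with read-only transactions by default.
        f"ALTER ROLE {role} SET default_transaction_read_only = on;",
    ]


statements = read_only_role_sql("drc_readonly", "proddb")
print("\n".join(statements))
```

Always prefer the SQL printed by the tool itself; the sketch only shows the general shape of such a role.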

## Safety guarantees

- **Read-only enforced in code.** Every query passes through a verb allow-list; `INSERT`,
  `UPDATE`, `DELETE`, and all DDL raise a hard error *before* reaching the database.
- **No raw export.** The Passport builder rejects any field containing row-level values.
- **Offline scan.** No network egress during profiling — verifiable with a firewall rule.
- **Hashed, with signing to follow.** Each Passport carries a SHA-256 content hash so the
  platform can detect a file that was altered after the scan. Ed25519 signing is added in
  the next build.
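
The verb allow-list in the first guarantee can be pictured roughly as follows. This is a minimal sketch, not the scanner's actual code: `ALLOWED_VERBS`, `ReadOnlyViolation`, and `assert_read_only` are hypothetical names, and the real allow-list may contain different verbs.

```python
import re

# Assumed allow-list; the scanner's real list is not documented here.
ALLOWED_VERBS = {"SELECT", "SHOW"}

# Matches "-- line" comments and "/* block */" comments.
_COMMENT = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


class ReadOnlyViolation(Exception):
    """Raised when a statement is not on the read-only allow-list."""


def assert_read_only(sql: str) -> str:
    """Return the cleaned statement, or raise before it can reach the database."""
    stripped = _COMMENT.sub(" ", sql).strip().rstrip(";").strip()
    # Conservative: any remaining semicolon is treated as a second statement,
    # even if it sits inside a string literal.
    if ";" in stripped:
        raise ReadOnlyViolation("multiple statements are not allowed")
    match = re.match(r"[A-Za-z]+", stripped)
    verb = match.group(0).upper() if match else ""
    if verb not in ALLOWED_VERBS:
        raise ReadOnlyViolation(f"verb {verb or '<none>'} is not allowed")
    return stripped


print(assert_read_only("SELECT count(*) FROM orders;"))
try:
    assert_read_only("DELETE FROM orders")
except ReadOnlyViolation as exc:
    print("blocked:", exc)
```

A leading-verb check like this cannot catch everything on its own (for example, a `SELECT` that calls a function with side effects), which is why it is paired with a read-only database role.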

## Status

v0.1 skeleton: Postgres connector, table/column inventory, row counts, and a Passport with
a content hash. The full metric engine, PII detection, additional connectors, and Ed25519
signing follow in subsequent builds.
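
To make the content hash concrete, here is one way such a hash could be computed and checked. It is a sketch under assumptions: the field name `content_hash`, the canonical JSON form (sorted keys, compact separators, UTF-8), and the helper names are all hypothetical, so it will not necessarily reproduce the hash the scanner writes.

```python
import hashlib
import json


def content_hash(passport: dict) -> str:
    """SHA-256 over a canonical JSON form of everything except the hash field."""
    body = {key: value for key, value in passport.items() if key != "content_hash"}
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(passport: dict) -> bool:
    """True when the stored hash matches a freshly computed one."""
    return passport.get("content_hash") == content_hash(passport)


# Illustrative Passport; the field names are assumptions.
passport = {"tables": [{"name": "orders", "row_count": 1200}]}
passport["content_hash"] = content_hash(passport)
print(verify(passport))

passport["tables"][0]["row_count"] = 9999  # simulate an edit after the scan
print(verify(passport))
```

A bare hash only catches changes made by someone who does not also recompute the hash, which is the gap a signature closes.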
