Metadata-Version: 2.4
Name: certisigma-census
Version: 1.15.1
Summary: Cryptographic file inventory, SBOM attestation, and exfiltration detection — powered by CertiSigma
Project-URL: Homepage, https://certisigma.ch
Project-URL: Documentation, https://developers.certisigma.ch/census
Project-URL: Repository, https://github.com/massimocavallin/certisigma-census
Project-URL: Issues, https://github.com/massimocavallin/certisigma-census/issues
Author: Ten Sigma Sagl
License-Expression: MIT
License-File: LICENSE
Keywords: attestation,breach-detection,cryptography,cyclonedx,file-integrity,forensics,sbom,spdx,supply-chain
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security :: Cryptography
Classifier: Topic :: System :: Filesystems
Requires-Python: >=3.10
Requires-Dist: certisigma>=1.9.0
Requires-Dist: click>=8.1
Requires-Dist: tomli>=2.0; python_version < '3.11'
Provides-Extra: dev
Requires-Dist: click-man>=0.5; extra == 'dev'
Requires-Dist: fpdf2>=2.8.0; extra == 'dev'
Requires-Dist: hypothesis>=6.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: pyyaml>=6.0; extra == 'dev'
Requires-Dist: watchdog>=4.0.0; extra == 'dev'
Provides-Extra: report
Requires-Dist: fpdf2>=2.8.0; extra == 'report'
Provides-Extra: watch
Requires-Dist: watchdog>=4.0.0; extra == 'watch'
Description-Content-Type: text/markdown

# CertiSigma Census

[![Test](https://github.com/massimocavallin/certisigma-census/actions/workflows/test.yml/badge.svg)](https://github.com/massimocavallin/certisigma-census/actions/workflows/test.yml)
[![PyPI](https://img.shields.io/pypi/v/certisigma-census)](https://pypi.org/project/certisigma-census/)
[![Python](https://img.shields.io/pypi/pyversions/certisigma-census)](https://pypi.org/project/certisigma-census/)
[![Coverage](https://img.shields.io/badge/coverage-83%25-green)](https://github.com/massimocavallin/certisigma-census)

Cryptographic file inventory, exfiltration detection, and supply-chain attestation — powered by [CertiSigma](https://certisigma.ch).

Census scans directories, computes SHA-256 hashes, attests them via the CertiSigma API (three-layer cryptographic proof: ECDSA T0, qualified TSA T1, Bitcoin T2), and maintains a local manifest. When suspect files surface, Census compares their hashes against the registry to prove — with cryptographic certainty — whether they match inventoried assets.

**Supply-chain integrity:** Census anchors every SBOM component hash with a three-layer cryptographic proof chain (ECDSA + TSA + Bitcoin). Parse your SPDX or CycloneDX SBOMs and get timestamped, independently verifiable proof that each dependency existed at a specific point in time, in support of EU CRA, NIS2, and US EO 14028 compliance.

## Installation

```bash
pip install certisigma-census

# With watch mode (filesystem monitoring)
pip install "certisigma-census[watch]"

# With PDF report generation
pip install "certisigma-census[report]"

# Everything
pip install "certisigma-census[watch,report]"
```

Requires Python 3.10+. TOML config support on Python 3.10 uses `tomli` (auto-installed).

## Quick Start

### 1. Inventory scan

```bash
export CERTISIGMA_API_KEY=cs_...

# Scan a directory and attest all file hashes
census scan /path/to/sensitive-files --source inventory-hr

# Dry run — hash only, no attestation
census scan /path/to/files --dry-run

# Scan only PDFs and Word docs, skip files over 100 MB
census scan /data --include "*.pdf" --include "*.docx" --max-size 100M

# Resume an interrupted scan
census scan /data --source quarterly --manifest inventory.db --resume

# Parallel hashing for large directories (4 CPU cores)
census scan /data --workers 4

# Attest the manifest itself (proves manifest existed at scan time)
census scan /data --attest-manifest
```

This produces a `.census-manifest.db` (SQLite) mapping each hash to its file path, size, and attestation metadata.

### 2. Breach comparison

```bash
# Compare suspect files against the CertiSigma registry
census compare /path/to/suspect-files --manifest /path/to/.census-manifest.db

# Save report as JSON or CSV
census compare /suspect --output report.json
census compare /suspect --output report.csv
```

Exit code: `0` if no matches, `1` if matches found.

### 3. Manifest status and export

```bash
# Show summary
census status /path/to/.census-manifest.db

# Export manifest as CSV for compliance reporting
census export manifest.db --format csv --output inventory.csv

# Export as JSON
census export manifest.db --format json --output inventory.json

# Export as sha256sum (GNU coreutils compatible — works with sha256sum -c)
census export manifest.db --format sha256sum --output checksums.sha256
```
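
The `sha256sum` export can also be checked without GNU coreutils. The sketch below is a minimal illustration, assuming the export follows the standard `sha256sum` line layout (64 hex characters, a space, a mode character, then the file name) and that paths are relative to a directory you supply. `parse_checksum_line` and `verify_checksums` are hypothetical helper names, not part of Census, and escaped file names are not handled.

```python
import hashlib
from pathlib import Path

HEX = set("0123456789abcdef")


def parse_checksum_line(line: str) -> tuple[str, str]:
    """Split one sha256sum line into (hex_digest, file_name)."""
    line = line.rstrip("\n")
    digest, sep, name = line[:64].lower(), line[64:66], line[66:]
    # Separator is a space followed by ' ' (text mode) or '*' (binary mode).
    valid_sep = len(sep) == 2 and sep[0] == " " and sep[1] in " *"
    if len(digest) != 64 or not set(digest) <= HEX or not valid_sep or not name:
        raise ValueError(f"not a sha256sum line: {line!r}")
    return digest, name


def verify_checksums(checksum_file: Path, root: Path) -> dict[str, bool]:
    """Return {file_name: matches} for every entry in the checksum file."""
    results: dict[str, bool] = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = parse_checksum_line(line)
        # read_bytes() keeps the sketch short; stream large files in practice.
        actual = hashlib.sha256((root / name).read_bytes()).hexdigest()
        results[name] = actual == expected
    return results
```

For example, `verify_checksums(Path("checksums.sha256"), Path("/data"))` would report which inventoried files still match their recorded hash.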

### 4. Evidence verification

```bash
# Verify a hash against the CertiSigma registry
census verify a1b2c3d4e5f67890...

# Verify a file (hash it first, then check)
census verify /path/to/document.pdf --file

# Full-chain manifest verification (all hashes against the registry)
census verify-manifest inventory.db --strict
census verify-manifest inventory.db --detailed --json

# Hash from stdin (for pipes and CI/CD)
echo "data" | census hash --stdin

# Save OpenTimestamps proof
census verify a1b2c3... --save-ots proof.ots
```

No API key is required: all verification endpoints are public. The exception is `verify-manifest --detailed`, which needs an API key to fetch enriched data.

### 5. Integrity check

```bash
# Check files against manifest baseline
census integrity manifest.db

# Strict mode: exit 1 on any discrepancy
census integrity manifest.db --strict

# Differential mode: only report NEW findings since last run
census integrity manifest.db --since auto --write-state auto
```

100% local operation — no API calls, no network needed.

### 5b. Update baseline (AIDE-style)

```bash
# Accept verified changes into manifest (interactive confirmation)
census update manifest.db

# Non-interactive (CI/cron)
census update manifest.db --yes

# Then attest new hashes
census scan /data --resume --manifest manifest.db
```

Completes the FIM workflow: detect → review → accept. New entries are unattested until the next scan.

### 6. Forensic reports

```bash
# HTML report (always available, zero dependencies)
census report manifest.db -o report.html

# PDF report (requires: pip install "certisigma-census[report]")
census report manifest.db -o report.pdf --evidence --integrity

# Evidence bundle: ZIP with report + OTS proofs + checksums
census report manifest.db -o bundle.zip --bundle --evidence

# Attest the report itself (three-layer cryptographic proof)
census report manifest.db -o report.pdf --attest --api-key cs_...
# → writes report.pdf + report.pdf.attestation.json

# Verify a previously attested report
census verify-report report.pdf
```

### 7. Manifest diff

```bash
# Compare two manifests
census diff baseline.db current.db

# HTML diff report
census diff baseline.db current.db -o diff.html

# Machine-readable (exit codes are a bitmask: 0=none, 1=added, 2=removed, 4=modified)
census diff baseline.db current.db --json
```

### 8. Standalone hashing

```bash
# Hash a file
census hash document.pdf

# Hash a directory
census hash /path/to/files

# Verify against known hash
census hash document.pdf --verify a1b2c3d4e5...
```

### 9. Attestation tracking

```bash
# Check attestation status
census track att_12345

# Wait for Bitcoin anchoring (default)
census track att_12345 --poll --timeout 7200

# Wait for TSA certification only (faster than T2)
census track att_12345 --poll --level T1
```

### 10. Webhooks (T1/T2 lifecycle push notifications)

```bash
# Register a webhook for T1 (TSA) and T2 (Bitcoin) events
census webhook register --url https://hooks.example.com/certisigma \
    --events t1_complete,t2_complete --label prod-monitor \
    --save-secret .census-webhook-secret

# List registered webhooks
census webhook list --json

# View delivery history
census webhook deliveries wh_abc123

# Start a webhook receiver with T1/T2 hooks
census webhook serve --secret-file .census-webhook-secret \
    --on-t1 'echo "T1 certified" | tee -a /var/log/census.log' \
    --on-t2 'curl -X POST https://slack.webhook/...'

# Verify a saved webhook payload (forensic evidence chain)
census webhook verify-payload delivery.json \
    --signature "sha256=abc..." --secret-file .census-webhook-secret

# Delete a webhook
census webhook delete wh_abc123
```
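
If you receive deliveries in your own service rather than through `census webhook serve`, the signature check might look like the sketch below. It assumes the `sha256=<hex>` value shown above is a hex-encoded HMAC-SHA256 of the raw request body, keyed with the signing secret. That construction is an assumption here, not something this README specifies, and `verify_signature` is a hypothetical name.

```python
import hashlib
import hmac


def verify_signature(body: bytes, signature: str, secret: bytes) -> bool:
    """Check a 'sha256=<hex>' signature against the raw payload bytes."""
    scheme, _, hex_digest = signature.partition("=")
    if scheme != "sha256" or not hex_digest:
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), hex_digest.lower().encode())
```

Verify against the exact bytes received; re-serializing the JSON first can change whitespace and break the comparison.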

### 10b. Watch mode with full T1/T2 lifecycle

```bash
# Watch with T0 + T1/T2 end-to-end hooks
census watch /data \
    --on-change "jq . >> /var/log/census-changes.jsonl" \
    --on-attest "echo 'T0 attested'" \
    --on-t1 "echo 'T1 TSA certified'" \
    --on-t2 "curl -X POST https://slack.webhook/..." \
    --webhook-secret-file .census-webhook-secret \
    --webhook-port 9514
```

### 10c. Self-diagnostic

```bash
# Run all health checks
census doctor

# Check including a specific manifest
census doctor --manifest inventory.db

# Machine-readable output for CI
census doctor --json
```

### 11. Manifest merging

```bash
# Merge manifests from different servers
census merge server1.db server2.db -o combined.db

# Merge with glob
census merge scans/*.db -o full-inventory.db --json
```

### 12. Audit log

```bash
# View all operations
census audit-log show

# Verify hash chain integrity
census audit-log verify

# Machine-readable
census audit-log show --last 10 --json
```
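
To illustrate what a hash-chain check involves, here is a minimal sketch of the general technique. The entry layout (`prev`, `payload`, `hash`), the all-zero genesis value, and the canonical JSON encoding are assumptions made for the example; the actual Census audit-log format may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode()).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    """Add an entry that commits to everything before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any edited or removed entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, changing an early entry invalidates every entry after it.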

### 13. Named snapshots

```bash
# Create a compliance baseline
census snapshot create q1-baseline --manifest inventory.db

# List snapshots
census snapshot list

# Compare two snapshots
census snapshot diff q1-baseline q2-baseline
```

### 14. Forensic annotation

```bash
# Annotate an attestation with case metadata
census annotate att_123 --note "Evidence for case FR-2026-42" --tag "case-2026-001"

# Zero-knowledge mode: encrypt before sending
census annotate att_123 --note "Confidential" --encrypt --encryption-key <key>

# GDPR right-to-erasure
census annotate att_123 --delete
```

### 15. Configuration

```bash
# Create config template
census config init --project

# View effective config
census config show

# Enable shell completions
eval "$(census completion bash)"
```

### 16. Forensic share tokens

```bash
# Create a share token (chain of custody)
census share create <att_id> --expires 24h --recipient "Legal Dept" --max-uses 5

# List / inspect / revoke
census share list --json
census share info <token_id>
census share revoke <token_id>
```

### 17. Structured tagging

```bash
# Tag attestations for classification
census tag set <att_id> -t department=legal -t case=2026-001

# Encrypted tags (zero-knowledge)
census tag set <att_id> -t classification=confidential --encrypt

# Query by tags (AND logic, cursor pagination)
census tag query -f department=legal --limit 50 --json
```

### 18. Key rotation

```bash
# Rotate encryption key (NIST SP 800-57)
census key-rotate <att_id> --old-key <hex64> --new-key <hex64>
```

### 19. Derived lists (third-party breach detection)

```bash
# Create an opaque HMAC-SHA256 derived list from your manifest
census derived-list create --manifest ./manifest.db --label "Q1 2026"

# Third party matches their suspects (server never sees plaintext)
census derived-list match <list_id> --list-key <hex64> --hashes-file suspects.txt

# Audit trail
census derived-list access-log <list_id>
```
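
The idea behind a keyed derived list can be sketched as follows. This assumes each list entry is the HMAC-SHA256 of a lowercase hex file hash under the list key; the real construction used by Census is not documented in this README, and `derive` and `match` are hypothetical names.

```python
import hashlib
import hmac
from typing import Iterable


def _tag(key: bytes, file_hash: str) -> str:
    """Keyed, one-way transform of a single SHA-256 file hash."""
    return hmac.new(key, file_hash.lower().encode(), hashlib.sha256).hexdigest()


def derive(hashes: Iterable[str], list_key_hex: str) -> set[str]:
    """Build the opaque list: without the key, entries reveal no file hashes."""
    key = bytes.fromhex(list_key_hex)
    return {_tag(key, h) for h in hashes}


def match(suspects: Iterable[str], derived: set[str], list_key_hex: str) -> list[str]:
    """Return the suspect hashes whose keyed tag appears in the derived list."""
    key = bytes.fromhex(list_key_hex)
    return [h for h in suspects if _tag(key, h) in derived]
```

A party holding the list key can test its own suspect hashes, while the holder of the list alone cannot recover the inventory.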

### 20. Metadata read

```bash
census metadata get <att_id> --json
census metadata get <att_id> --decrypt --encryption-key <hex64>
```

### 21. Watch mode (continuous monitoring)

```bash
# Watch a directory for changes and attest new/modified files
census watch /path/to/files --source "production"

# Dry run — hash only, no attestation
census watch /data --dry-run

# Network mount — use polling
census watch /mnt/share --polling --poll-interval 10

# Event hooks — run commands on change/attestation (JSON on stdin)
census watch /data --on-change "jq . >> /var/log/census-changes.jsonl" \
                   --on-attest "curl -X POST https://slack.webhook/..."
```

Requires: `pip install "certisigma-census[watch]"`

Production deployment via systemd: see `contrib/census-watch@.service`.

### 22. Manifest seal (tamper evidence)

```bash
# Create an HMAC-SHA256 seal for a manifest
# (store the key: the same key is needed to verify the seal later)
KEY=$(census key-gen)
census seal ./manifest.db --key "$KEY"

# Verify the seal before trusting a manifest
census verify-seal ./manifest.db --key <hex64>

# JSON output
census verify-seal ./manifest.db --key <hex64> --json
```

The seal proves the manifest has not been modified since it was sealed. Follows the Tripwire/AIDE signed-database pattern.

### 23. Quiet mode (scripting)

```bash
# Suppress informational output — only errors and exit codes
census -q scan /data --dry-run
census -q compare /suspects

# Quiet + JSON — clean machine-readable output
census -q scan /data --json --attest-manifest
```

### 24. Bulk leak detection

```bash
# Scan a suspect drive against your org inventory (up to 50K hashes/call)
census bulk-scan /mnt/suspect-drive --json

# Cross-reference with a local manifest
census bulk-scan ./data --manifest inventory.db --workers 4

# Dry run — hash and count, no API call (save rate limit)
census bulk-scan /data --dry-run

# Label the scan for incident tracking
census bulk-scan /exports --source incident-2026-003 --json

# Save results to file
census bulk-scan /exports --output results.json

# Report-only mode — always exit 0 (for CI pipelines)
census bulk-scan /data --exit-zero --json > results.json

# Summary mode — counts only, no match details
census bulk-scan /data --summary --exit-zero
```

Exit code: `0` if no matches (or `--exit-zero`), `1` if matches found (potential exfiltration).

### 25. Organization statistics

```bash
# View org-level inventory stats
census stats

# Machine-readable
census stats --json
```

### 26. SARIF output (CI/CD integration)

```bash
# Compare with SARIF output for GitHub Security tab
census compare /suspects --format sarif > results.sarif

# Write SARIF directly to file (recommended for CI/CD)
census compare /suspects --format sarif --output results.sarif

# Report-only mode — always exit 0 (upload SARIF without pipeline failure)
census compare /suspects --format sarif --output results.sarif --exit-zero

# Summary mode — counts only, concise CI logs
census compare /suspects --summary --exit-zero

# JSON output is also available
census compare /suspects --format json
```

SARIF v2.1.0 output can be uploaded to GitHub Security tab, VS Code SARIF Viewer, and other compatible tools.

### 27. JSONL streaming output

```bash
# Stream results to a log file (one JSON object per line)
census compare /suspects --format jsonl >> /var/log/census/matches.jsonl

# Pipe to jq for real-time filtering
census compare /suspects --format jsonl | jq 'select(.level=="T2")'

# JSONL is available on compare, bulk-scan, integrity, verify-manifest, and diff
census integrity manifest.db --format jsonl
census diff base.db current.db --format jsonl
```
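
The same filtering as the `jq` example can be done in Python when post-processing a saved log. This is a minimal sketch: only the `level` field is taken from the example above, and `iter_jsonl` and `select_level` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def select_level(lines: Iterable[str], level: str = "T2") -> list[dict]:
    """Keep only the records whose 'level' field equals the given level."""
    return [record for record in iter_jsonl(lines) if record.get("level") == level]
```

An open file can be passed directly, for example `select_level(open("matches.jsonl"))`, since file objects iterate line by line.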

### 28. On-match notification hooks

```bash
# Execute a command when matches are found (JSON on stdin)
census compare /suspects --on-match './scripts/alert.sh'

# POST to a webhook
census compare /suspects --on-match 'curl -s -X POST -d @- https://hooks.slack.com/...'

# Also available on bulk-scan
census bulk-scan /data --on-match 'python3 scripts/notify.py'
```

The `--on-match` command is only executed when matches > 0. Match data (JSON) is piped to stdin.
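
A hook can be any executable that reads stdin. The sketch below shows one possible core for such a script in Python. Because the payload schema is not described here, it makes no assumption about field names: it only checks that the input is valid JSON and fingerprints it for the incident log. `summarize_payload` is a hypothetical name.

```python
import hashlib
import json


def summarize_payload(raw: str) -> str:
    """Validate the hook payload and return a one-line, loggable summary."""
    json.loads(raw)  # raises ValueError if the payload is not valid JSON
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return f"census match payload: {len(raw)} chars, sha256={digest}"
```

In a script passed to `--on-match`, this would be called as `print(summarize_payload(sys.stdin.read()))` after `import sys`.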

### 29. GitHub Actions

```yaml
# Breach detection with SARIF upload (3 lines)
- uses: certisigma/census-action@v1
  with:
    command: compare
    target: ./artifacts
    manifest: ./inventory.db
  env:
    CERTISIGMA_API_KEY: ${{ secrets.CERTISIGMA_API_KEY }}

# Integrity check (no API key needed)
- uses: certisigma/census-action@v1
  with:
    command: integrity
    manifest: ./inventory.db

# Inventory scan on release
- uses: certisigma/census-action@v1
  with:
    command: scan
    target: ./src
    source: release-${{ github.ref_name }}
  env:
    CERTISIGMA_API_KEY: ${{ secrets.CERTISIGMA_API_KEY }}
```

Composite action — zero Docker overhead, SARIF auto-upload to GitHub Security tab, step summary, masked secrets. Full docs: [`docs/features/github-action.md`](docs/features/github-action.md)

### 30. Compliance reports

```bash
# NIS2 compliance report (default)
census compliance-report manifest.db -o report.html

# DORA compliance report
census compliance-report manifest.db --template dora -o report.html

# ISO 27001
census compliance-report manifest.db --template iso27001 -o report.html

# With integrity check included
census compliance-report manifest.db --integrity -o report.html

# Machine-readable JSON
census compliance-report manifest.db --json
```

Maps Census data to regulatory requirements (NIS2, DORA, ISO 27001). 100% local — no API calls. Uses manifest data and optional integrity check.

### 31. Forensic archive

```bash
# Create a forensic evidence package from a manifest
census archive manifest.db -o evidence-2026-03-18.zip

# With chain of custody metadata
census archive manifest.db -o case-42.zip \
  --examiner "J. Doe" --case-id CASE-42 --organization "Acme Corp"

# Verify archive integrity
census verify-archive evidence-2026-03-18.zip
```

Creates a self-contained ZIP with: manifest database, full inventory (JSON), system metadata, chain of custody, SHA256SUMS for offline verification. Follows EnCase/FTK conventions for evidence packaging.

### AI Governance

```bash
# Generate a policy template
census ai-policy init

# Edit .census-ai-policy.toml to define allow/exclude rules

# Classify assets (dry run — no API calls)
census ai-policy apply manifest.db --dry-run

# Apply classifications and tag attestations
census ai-policy apply manifest.db --api-key cs_...

# Generate HTML compliance report
census ai-policy report manifest.db -o ai-report.html

# JSON output
census ai-policy report manifest.db --json
```

Classify inventoried assets for ML/AI training compliance using TOML-based policies. Rules match files by glob patterns and size filters. Safety-first: unmatched files default to `exclude`. Supports EU AI Act, ISO/IEC 42001, and C2PA frameworks. Classification is 100% local; only `apply` (without `--dry-run`) makes API calls to tag attestations.

### SBOM Attestation

```bash
# Attest an SPDX SBOM
census sbom attest sbom.spdx.json --source "ci-pipeline"

# Attest a CycloneDX SBOM (dry run)
census sbom attest bom.cdx.json --dry-run --json

# Verify SBOM components against the registry
census sbom verify sbom.spdx.json --json

# CI-friendly verify (never fails on missing)
census sbom verify bom.cdx.json --exit-zero

# Inspect SBOM structure and hash coverage
census sbom summary sbom.spdx.json --json
```

Extracts SHA-256 hashes from SPDX 2.x and CycloneDX JSON SBOMs and batch-attests them via the CertiSigma API. Each component hash receives the same three-layer cryptographic proof (ECDSA T0, TSA T1, Bitcoin T2) as file attestations, providing independently verifiable, timestamped evidence of your supply chain composition. Supports EU CRA, NIS2, and US EO 14028 compliance.
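
As a rough illustration of the extraction step, the sketch below pulls SHA-256 values from both formats using the standard field names: CycloneDX `components[].hashes[]` with `alg`/`content`, and SPDX 2.x `packages[].checksums[]` with `algorithm`/`checksumValue`. It is not the Census parser; nested CycloneDX components and SPDX `files` entries are left out for brevity, and the function names are hypothetical.

```python
def cyclonedx_hashes(bom: dict) -> dict[str, str]:
    """Map component name -> SHA-256 for a parsed CycloneDX JSON document."""
    found: dict[str, str] = {}
    for component in bom.get("components", []):
        for entry in component.get("hashes", []):
            if entry.get("alg") == "SHA-256":
                found[component.get("name", "")] = entry["content"].lower()
    return found


def spdx_hashes(doc: dict) -> dict[str, str]:
    """Map package name -> SHA-256 for a parsed SPDX 2.x JSON document."""
    found: dict[str, str] = {}
    for package in doc.get("packages", []):
        for entry in package.get("checksums", []):
            if entry.get("algorithm") == "SHA256":
                found[package.get("name", "")] = entry["checksumValue"].lower()
    return found


def sbom_hashes(doc: dict) -> dict[str, str]:
    """Dispatch on the format marker each specification defines."""
    if doc.get("bomFormat") == "CycloneDX":
        return cyclonedx_hashes(doc)
    if "spdxVersion" in doc:
        return spdx_hashes(doc)
    raise ValueError("unrecognized SBOM format")
```

Components without a SHA-256 entry are simply skipped, which is why hash coverage is worth inspecting before attesting.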

## How It Works

1. **Scan** — Census walks the directory, computes SHA-256 for each file (streamed, constant memory), and builds a local manifest.
2. **Attest** — Hashes are sent in batches (up to 100 per call) to the CertiSigma API. Each hash receives a three-layer cryptographic proof (T0 ECDSA signature, T1 qualified TSA timestamp, T2 Bitcoin anchor).
3. **Compare** — Suspect files are hashed and verified against the registry via `POST /verify/batch`. Matches prove the file was previously inventoried, regardless of filename or directory structure changes.

The original file content **never leaves** the client. Only SHA-256 hashes are transmitted.
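
The scan and attest steps above rely on two simple techniques: hashing in fixed-size chunks so memory use does not grow with file size, and grouping hashes into batches of at most 100. A minimal sketch of both follows; it is not the Census implementation, and the 1 MiB chunk size is an illustrative choice.

```python
import hashlib
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator

CHUNK_SIZE = 1024 * 1024  # illustrative read size, not a Census setting
BATCH_SIZE = 100          # batch limit stated in the steps above


def sha256_file(path: Path) -> str:
    """Hash a file chunk by chunk; only one chunk is in memory at a time."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


def batched(hashes: Iterable[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` hashes."""
    iterator = iter(hashes)
    while batch := list(islice(iterator, size)):
        yield batch
```

Only the hex digests produced by `sha256_file` would ever be placed in a batch, which is what keeps file content on the client.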

## Features

| Feature | Description | Docs |
|---------|-------------|------|
| **File filters** | `--include`, `--exclude` globs; `--min-size`, `--max-size` | [scanning.md](docs/features/scanning.md) |
| **Resume scans** | `--resume` skips unchanged files, preserves attestation state | [scanning.md](docs/features/scanning.md) |
| **CSV/JSON export** | Compare reports and manifest export in both formats | [comparison.md](docs/features/comparison.md) |
| **Retry with backoff** | Automatic retry on 429/5xx with exponential backoff | [retry-and-resilience.md](docs/features/retry-and-resilience.md) |
| **Structured logging** | `--log-format json` for SIEM/ELK integration | [logging.md](docs/features/logging.md) |
| **Progress bars** | Visual feedback for scan, attest, and compare operations | [scanning.md](docs/features/scanning.md) |
| **SQLite manifest** | WAL mode, indexed lookups, auto-migration from JSON | [manifest.md](docs/features/manifest.md) |
| **Watch mode** | Continuous filesystem monitoring with batch attestation | [watching.md](docs/features/watching.md) |
| **Evidence verification** | Full T0/T1/T2 chain, OTS proof export | [evidence.md](docs/features/evidence.md) |
| **Integrity check** | Tamper detection against manifest baseline, differential mode | [integrity.md](docs/features/integrity.md) |
| **Forensic reports** | HTML, PDF, evidence bundles (ZIP) | [reporting.md](docs/features/reporting.md) |
| **Manifest diff** | Compare snapshots, AIDE-style exit codes, HTML reports | [diff.md](docs/features/diff.md) |
| **Standalone hashing** | SHA-256 without manifests or API calls | [hash.md](docs/features/hash.md) |
| **Attestation tracking** | Monitor T0/T1/T2 progression with `--poll` or `--level T1\|T2` | [tracking.md](docs/features/tracking.md) |
| **Webhooks** | Push-based T1/T2 lifecycle notifications with HMAC verification | — |
| **Config files** | TOML config with user/project precedence | [config.md](docs/features/config.md) |
| **Shell completions** | bash, zsh, fish via `census completion` | — |
| **Self-diagnostic** | API health, config, inotify, manifest integrity | [doctor.md](docs/features/doctor.md) |
| **Manifest merging** | Combine manifests from distributed scans | [merge.md](docs/features/merge.md) |
| **JSON output** | `--json` on scan, compare, status, doctor, merge | — |
| **Audit log** | Tamper-evident JSONL with SHA-256 hash chain | [audit-log.md](docs/features/audit-log.md) |
| **Named snapshots** | Compliance baselines with diff comparison | [snapshots.md](docs/features/snapshots.md) |
| **Forensic annotation** | Metadata, tags, case IDs on attestations | [annotate.md](docs/features/annotate.md) |
| **Zero-knowledge encryption** | AES-256-GCM client-side metadata encryption | [annotate.md](docs/features/annotate.md) |
| **Forensic sharing** | Time-limited, use-limited share tokens (chain of custody) | [sharing.md](docs/features/sharing.md) |
| **Structured tagging** | Key-value classification with encrypted tags and query | [tagging.md](docs/features/tagging.md) |
| **Key rotation** | NIST SP 800-57 AES-256 key rotation for metadata + tags | [key-rotation.md](docs/features/key-rotation.md) |
| **Derived lists** | HMAC-SHA256 opaque third-party breach detection | [derived-lists.md](docs/features/derived-lists.md) |
| **Metadata read** | Read attestation metadata with optional decryption | — |
| **Manifest seal** | HMAC-SHA256 tamper-evidence seal (Tripwire/AIDE pattern) | [seal.md](docs/features/seal.md) |
| **Quiet mode** | `--quiet` / `-q` suppresses info output for scripting | — |
| **Manifest self-attestation** | `--attest-manifest` anchors manifest hash at scan time | — |
| **Bulk leak detection** | `bulk-scan` — 50K hashes/call, `--dry-run`, `--source`, `--output` | — |
| **Organization stats** | `stats` — total claims, unique hashes, monthly breakdown | — |
| **SARIF output** | `compare --format sarif` — v2.1.0 with help, tags, invocations, file write | — |
| **Baseline update** | `update` — AIDE-style accept verified changes into manifest (detect → review → accept) | — |
| **JSONL streaming** | `--format jsonl` on compare, bulk-scan, integrity, verify-manifest, diff | — |
| **On-match hooks** | `--on-match CMD` — execute command with results on stdin (compare, bulk-scan) | — |
| **CI/CD integration** | `--exit-zero` (report-only mode), `--summary` (counts only) on compare and bulk-scan | — |
| **`--no-color`** | Disable colored output; also respects `NO_COLOR` env var (no-color.org) | — |
| **Forensic JSON metadata** | `census_version` and `elapsed_seconds` in all JSON output | — |
| **GitHub Action** | `certisigma/census-action@v1` — composite action for CI/CD with SARIF upload | [github-action.md](docs/features/github-action.md) |
| **Compliance reports** | `compliance-report` — NIS2, DORA, ISO 27001 mapping from manifest data (100% local) | — |
| **Developers page** | Standalone HTML documentation at [developers.certisigma.ch/census](https://developers.certisigma.ch/census) | [census.html](docs/census.html) |
| **AI governance** | `ai-policy init/apply/report` — TOML policy engine for ML/AI training asset classification (EU AI Act, ISO 42001) | — |
| **Manifest encryption** | AES-256-GCM encryption at rest for manifest files (`--encryption-key` / `CENSUS_ENCRYPTION_KEY`) | — |
| **Man pages** | Pre-generated man pages for all commands via `click-man` in `docs/man/` | — |
| **PEP 561** | `py.typed` marker for mypy/pyright inline type annotation support | — |
| **File attribution** | Captures file owner, group, POSIX permissions during scan (manifest schema v3) | — |
| **Attested reports** | `report --attest` + `verify-report` — three-layer proof on the report itself | — |
| **SBOM attestation** | `sbom attest/verify/summary` — SPDX 2.x + CycloneDX JSON supply-chain attestation (EU CRA, NIS2, EO 14028) | [sbom.md](docs/features/sbom.md) |
| **Docker image** | `ghcr.io/certisigma/census` for CI/CD scanning | — |

Full documentation: [`docs/features/`](docs/features/)

## CLI Reference

### Global options

| Option | Description |
|--------|-------------|
| `-v` / `--verbose` | Enable debug logging |
| `-q` / `--quiet` | Suppress informational output (errors and `--json` always shown) |
| `--log-format text\|json` | Log output format (default: text). Also: `CENSUS_LOG_FORMAT` env var |
| `--encryption-key HEX` | AES-256 key (64 hex) for manifest encryption at rest. Also: `CENSUS_ENCRYPTION_KEY` env var |
| `--no-color` | Disable colored output (also respects `NO_COLOR` env, see [no-color.org](https://no-color.org)) |
| `--version` | Show version |

### `census scan`

| Option | Description |
|--------|-------------|
| `--source LABEL` | Source label for attestations |
| `--manifest PATH` | Manifest output path (default: `<dir>/.census-manifest.db`) |
| `--api-key KEY` | API key (or set `CERTISIGMA_API_KEY`) |
| `--base-url URL` | Override API base URL |
| `--dry-run` | Hash only, no attestation |
| `--resume` | Resume interrupted scan |
| `--include GLOB` | Include files matching pattern (repeatable) |
| `--exclude GLOB` | Exclude files matching pattern (repeatable) |
| `--min-size SIZE` | Skip files smaller than SIZE (e.g. `1K`, `10M`) |
| `--max-size SIZE` | Skip files larger than SIZE (default: `5G`) |
| `--workers N` | Parallel hashing workers (default: 1, max: 8) |
| `--attest-manifest` | Attest the manifest's own SHA-256 after scan |
| `--json` | Machine-readable JSON summary |
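
To make the filter options concrete, the sketch below models how glob and size filters could combine. Several details are assumptions rather than documented behavior: size suffixes are treated as binary multiples (1K = 1024 bytes), globs are matched against the file name only, a file must match at least one `--include` pattern when any are given, and size bounds are inclusive. `parse_size` and `selected` are hypothetical names.

```python
from fnmatch import fnmatch
from typing import Sequence

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}  # assumed binary multiples


def parse_size(text: str) -> int:
    """Convert a value such as '1K', '10M', or '5G' to a byte count."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def selected(
    name: str,
    size: int,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    min_size: int = 0,
    max_size: int = parse_size("5G"),  # default from the table above
) -> bool:
    """Decide whether a file passes the glob and size filters."""
    if include and not any(fnmatch(name, pattern) for pattern in include):
        return False
    if any(fnmatch(name, pattern) for pattern in exclude):
        return False
    return min_size <= size <= max_size
```

Under these assumptions, `selected("q1.pdf", 2048, include=["*.pdf", "*.docx"], max_size=parse_size("100M"))` mirrors the PDF and Word example from the Quick Start.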

### `census compare`

| Option | Description |
|--------|-------------|
| `--manifest PATH` | Local manifest for cross-referencing |
| `--output PATH` | Save report (`.json` or `.csv` by extension; with `--format sarif`, writes the SARIF output to this path) |
| `--format text\|json\|sarif\|jsonl` | Output format (default: text). `sarif` emits SARIF v2.1.0; `jsonl` streams one JSON object per match |
| `--include/--exclude/--min-size/--max-size` | Same filters as scan |
| `--detailed` | Enriched results: source label, T0/T1/T2 level (requires API key) |
| `--workers N` | Parallel hashing workers (default: 1, max: 8) |
| `--json` | Machine-readable JSON output (equivalent to `--format json`) |
| `--exit-zero` | Always exit 0 (report-only mode for CI pipelines) |
| `--summary` | Show only counts, no match details |
| `--on-match CMD` | Execute CMD with match results as JSON on stdin (only if matches > 0) |

### `census export`

| Option | Description |
|--------|-------------|
| `--format csv\|json\|sha256sum` | Output format (default: csv) |
| `--output PATH` | Output file (default: stdout) |

### `census verify`

| Option | Description |
|--------|-------------|
| `--file` | Treat argument as a file path (hash it first) |
| `--save-ots PATH` | Save OTS proof to this path |
| `--json` | Machine-readable JSON output |
| `--api-key KEY` | API key (optional for verify) |
| `--base-url URL` | Override API base URL |

### `census verify-manifest`

| Option | Description |
|--------|-------------|
| `--detailed` | Fetch enriched data (source, level) per hash |
| `--strict` | Exit with code 1 if any hash is not attested |
| `--json` | Machine-readable JSON output |
| `-o`/`--output PATH` | Save report (`.csv` or `.json`) |
| `--api-key KEY` | API key (optional, needed for `--detailed`) |
| `--base-url URL` | Override API base URL |

### `census integrity`

| Option | Description |
|--------|-------------|
| `--json` | Machine-readable JSON output |
| `--format text\|json\|jsonl` | Output format (default: text) |
| `--output PATH` | Save results (`.csv` or `.json` by extension) |
| `--strict` | Exit with code 1 on any discrepancy |
| `--since PATH` | Differential: load previous state, suppress known findings (`auto` = sidecar) |
| `--write-state PATH` | Save current state for next differential run (`auto` = sidecar) |
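
Differential mode amounts to a set difference between the current findings and those recorded on the previous run. The sketch below shows that idea only; the state-file layout (a JSON list of `[path, kind]` pairs) and the function names are assumptions, not the Census sidecar format.

```python
import json
from pathlib import Path

Finding = tuple[str, str]  # (file path, kind of discrepancy); hypothetical shape


def load_state(path: Path) -> set[Finding]:
    """Read findings saved by a previous run; empty if no state exists yet."""
    if not path.exists():
        return set()
    return {(item[0], item[1]) for item in json.loads(path.read_text())}


def save_state(path: Path, findings: set[Finding]) -> None:
    """Persist the current findings for the next differential run."""
    path.write_text(json.dumps(sorted(findings)))


def new_findings(current: set[Finding], state_path: Path) -> list[Finding]:
    """Return only findings absent from the previous state, then update it."""
    previous = load_state(state_path)
    save_state(state_path, current)
    return sorted(current - previous)
```

On the first run every finding is new; later runs report only what changed since the saved state.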

### `census update`

| Option | Description |
|--------|-------------|
| `--yes` / `-y` | Skip confirmation prompt (non-interactive) |
| `--json` | Machine-readable JSON output |

Runs integrity check, then applies changes (remove missing, re-hash modified, add new). New entries are `attested=False`.

### `census report`

| Option | Description |
|--------|-------------|
| `-o`/`--output PATH` | Output file (`.html`, `.pdf`, or `.zip`) (required) |
| `--evidence` | Fetch T0/T1/T2 evidence chain for attested files |
| `--integrity` | Run integrity check and include results |
| `--bundle` | Generate evidence bundle (ZIP) |
| `--attest` | Attest the report's own hash via CertiSigma (three-layer proof) |
| `--api-key KEY` | API key (needed with `--evidence` or `--attest`) |

### `census verify-report`

| Option | Description |
|--------|-------------|
| `--sidecar PATH` | Custom sidecar path (default: `<report>.attestation.json`) |
| `--json` | Machine-readable JSON output |

### `census status`

| Option | Description |
|--------|-------------|
| `--json` | Machine-readable JSON output |

### `census doctor`

| Option | Description |
|--------|-------------|
| `--manifest PATH` | Check health of a specific manifest file |
| `--json` | Machine-readable JSON output |
| `--api-key KEY` | API key |
| `--base-url URL` | Override API base URL |

### `census merge`

| Option | Description |
|--------|-------------|
| `-o`/`--output PATH` | Output manifest path (required) |
| `--json` | Machine-readable JSON summary |

### `census diff`

| Option | Description |
|--------|-------------|
| `--json` | Machine-readable JSON output |
| `-o`/`--output PATH` | Save report (`.html`, `.csv`, or `.json` by extension) |
| `--summary` | Show only counts, no individual file details |

Exit codes: 0=none, 1=added, 2=removed, 4=modified (bitmask, OR'd together).
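
Because the exit code is a bitmask, a calling script should test individual bits rather than compare against single values. A minimal sketch, using the bit values listed above (`describe_diff` is a hypothetical helper name):

```python
DIFF_FLAGS = {1: "added", 2: "removed", 4: "modified"}


def describe_diff(exit_code: int) -> list[str]:
    """Decode a `census diff` exit code into the change types it signals."""
    return [name for bit, name in DIFF_FLAGS.items() if exit_code & bit]
```

For instance, an exit code of 5 decodes to `["added", "modified"]`, and 0 decodes to an empty list.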

### `census hash`

| Option | Description |
|--------|-------------|
| `--stdin` | Read data from stdin instead of a file |
| `--verify HASH` | Compare computed hash against expected SHA-256 |
| `--json` | Output as JSON array |
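
For reference, the check behind `--verify` can be reproduced with the standard library. The sketch below is an independent illustration, not Census's own implementation; the function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare the computed digest with an expected hex string, ignoring case."""
    return hmac.compare_digest(sha256_file(path), expected_hex.strip().lower())
```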

### `census track`

| Option | Description |
|--------|-------------|
| `--poll` | Continuously check until target level reached |
| `--level T1\|T2` | Target proof level (default: T2). Use T1 for TSA-only |
| `--poll-interval SECS` | Seconds between checks (default: 60) |
| `--timeout SECS` | Max time to poll (default: 3600) |
| `--json` | Machine-readable JSON output |
| `--api-key KEY` | API key |
| `--base-url URL` | Override API base URL |
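
The interaction between `--poll-interval` and `--timeout` can be pictured as a simple loop. This is a sketch under assumptions: `check` stands in for whatever call returns the current proof level, and the ordering T0 < T1 < T2 is inferred from the option descriptions.

```python
import time
from typing import Callable

# Assumed ordering of proof levels.
LEVELS = {"T0": 0, "T1": 1, "T2": 2}


def poll_until(
    check: Callable[[], str],
    target: str = "T2",
    interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `check` until it reports `target` or higher, or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if LEVELS[check()] >= LEVELS[target]:
            return True
        # Give up if the next check would land past the deadline.
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.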

### `census webhook`

| Subcommand | Description |
|------------|-------------|
| `register` | Register a webhook for T1/T2 events |
| `list` | List registered webhooks |
| `delete WEBHOOK_ID` | Delete a webhook and its delivery history |
| `deliveries WEBHOOK_ID` | Show delivery history |
| `verify-payload FILE` | Verify HMAC signature of a saved payload |
| `serve` | Start webhook receiver HTTP server |

**`census webhook register` options:**

| Option | Description |
|--------|-------------|
| `--url URL` | HTTPS callback URL (required) |
| `--events LIST` | Comma-separated: `t1_complete,t2_complete` (required) |
| `--label LABEL` | Human-readable label (max 200 chars) |
| `--save-secret FILE` | Save signing secret to file (0o600 permissions) |
| `--json` | Machine-readable JSON output |

**`census webhook serve` options:**

| Option | Description |
|--------|-------------|
| `--secret-file FILE` | Signing secret file (required) |
| `--port PORT` | Listen port (default: 9514) |
| `--bind ADDR` | Bind address (default: 127.0.0.1) |
| `--on-t1 CMD` | Shell command on T1 event (JSON on stdin) |
| `--on-t2 CMD` | Shell command on T2 event (JSON on stdin) |
| `--tls-cert FILE` | PEM certificate for built-in TLS |
| `--tls-key FILE` | PEM private key for built-in TLS |
| `--replay-window SECS` | Anti-replay window (default: 300) |
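
`verify-payload` and `--replay-window` together suggest an HMAC check combined with a freshness check. The exact signing scheme is not documented in this section, so the sketch below makes its own assumptions: HMAC-SHA256 over `"<timestamp>." + body`, hex-encoded. Treat it as an illustration of the technique, not as the CertiSigma wire format.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_payload(
    secret: bytes,
    body: bytes,
    timestamp: int,
    signature_hex: str,
    replay_window: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Reject stale or tampered payloads (assumed scheme, see lead-in)."""
    current = time.time() if now is None else now
    # Anti-replay: the sender's timestamp must be within the window.
    if abs(current - timestamp) > replay_window:
        return False
    signed = f"{timestamp}.".encode() + body  # assumption: timestamp-prefixed body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature_hex)
```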

### `census config`

| Action / Option | Description |
|--------|-------------|
| `show` | Display effective merged config |
| `init` | Create a template config file |
| `paths` | Show config file locations |
| `--project` | Act on project `.census.toml` |

### `census audit-log`

| Action / Option | Description |
|--------|-------------|
| `show` | Display audit log entries |
| `verify` | Check hash chain integrity |
| `clear` | Delete the audit log file |
| `--log-path PATH` | Override audit log file path |
| `--last N` | Show only last N entries (with `show`) |
| `--json` | Machine-readable JSON output |

### `census snapshot`

| Action / Option | Description |
|--------|-------------|
| `create <name>` | Save a named snapshot of a manifest |
| `list` | List all snapshots |
| `diff <name1> <name2>` | Compare two snapshots |
| `delete <name>` | Remove a snapshot |
| `--manifest PATH` | Manifest to snapshot (required for `create`) |
| `--snapshot-dir PATH` | Override snapshot directory |
| `--json` | Machine-readable JSON output |

### `census annotate`

| Option | Description |
|--------|-------------|
| `--note TEXT` | Free-text note |
| `--tag TEXT` | Tag label (e.g. case number) |
| `--case-id TEXT` | Forensic case identifier |
| `--source TEXT` | Update source label |
| `--delete` | Soft-delete metadata (GDPR) |
| `--encrypt` | Encrypt client-side (AES-256-GCM) |
| `--encryption-key HEX` | 64-char hex AES-256 key |
| `--decrypt` | Decrypt and display stored metadata |
| `--json` | Machine-readable JSON output |
| `--api-key KEY` | API key |

### `census share`

| Action / Option | Description |
|--------|-------------|
| `create <att_id>...` | Create share token for attestation(s) |
| `list` | List all share tokens |
| `info <token_id>` | Inspect a specific token |
| `revoke <token_id>` | Revoke a token |
| `--expires DURATION` | Token lifetime: `30m`, `24h`, `7d` (default: `24h`) |
| `--recipient TEXT` | Recipient label |
| `--max-uses N` | Max usage count |
| `--json` | Machine-readable JSON output |
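
The `--expires` values shown above (`30m`, `24h`, `7d`) follow a number-plus-unit pattern. A parser for that pattern might look like the sketch below; whether Census accepts other units or combined values is not stated here, so this handles only `m`, `h`, and `d`.

```python
import re
from datetime import timedelta

# Unit suffixes taken from the documented examples.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '30m', '24h', or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: value})
```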

### `census tag`

| Action / Option | Description |
|--------|-------------|
| `set <att_id>` | Set tags (requires `-t key=value`) |
| `get <att_id>` | List tags on an attestation |
| `delete <att_id> <key>` | Delete a specific tag |
| `query` | Query attestations by tag filter |
| `-t`, `--tag key=value` | Tag pair (repeatable) |
| `-f`, `--filter key=value` | Query filter (repeatable, AND logic) |
| `--encrypt` | Encrypt tag values (AES-256-GCM) |
| `--decrypt` | Decrypt on get |
| `--limit N` | Max query results (default: 100) |
| `--cursor TOKEN` | Pagination cursor |
| `--json` | Machine-readable JSON output |

### `census key-rotate`

| Option | Description |
|--------|-------------|
| `<attestation_id>` | Target attestation |
| `--old-key HEX` | Current 64-char hex AES-256 key |
| `--new-key HEX` | New 64-char hex AES-256 key |
| `--json` | Machine-readable JSON output |

### `census derived-list`

| Action / Option | Description |
|--------|-------------|
| `create` | Create HMAC-SHA256 derived list |
| `list` | List all derived lists |
| `info <list_id>` | Get list details |
| `match <list_id>` | Match suspect hashes against list |
| `access-log <list_id>` | View access audit trail |
| `signature <list_id>` | ECDSA signature verification (no auth required) |
| `revoke <list_id>` | Revoke a list |
| `--manifest PATH` | Manifest to read hashes from |
| `--tag-filter JSON` | JSON tag filter for server-side selection |
| `--label TEXT` | Human-readable label |
| `--expires HOURS` | Expiry in hours (max 2160) |
| `--list-key HEX` | HMAC key (64 hex chars) for match |
| `--hashes-file PATH` | File with one hash per line for match |
| `--json` | Machine-readable JSON output |

### `census metadata`

| Action / Option | Description |
|--------|-------------|
| `get <att_id>` | Read attestation metadata |
| `--decrypt` | Decrypt encrypted extra_data |
| `--encryption-key HEX` | 64-char hex AES-256 key |
| `--json` | Machine-readable JSON output |

### `census key-gen`

Generate a random AES-256 encryption key (64 hex characters, 256 bits). The key is shown only once — store it securely.

```bash
census key-gen              # outputs the key to stdout
census key-gen --json       # JSON output: {"key": "...", "algorithm": "AES-256-GCM", "bits": 256}
```
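
If a key has to be generated or validated inside a Python script instead, the standard library is enough. This sketch only mirrors the documented key shape (64 hex characters, 256 bits); the function names are hypothetical.

```python
import secrets
import string


def generate_key_hex() -> str:
    """Return 32 random bytes from the OS CSPRNG as 64 hex characters."""
    return secrets.token_hex(32)


def is_valid_key_hex(key: str) -> bool:
    """Check that a string has the shape of a 256-bit hex key."""
    return len(key) == 64 and all(c in string.hexdigits for c in key)
```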

### `census completion`

Prints a completion script for the given shell. Takes a shell name: `bash`, `zsh`, or `fish`.

```bash
eval "$(census completion bash)"   # bash
eval "$(census completion zsh)"    # zsh
census completion fish | source    # fish
```

### `census watch`

| Option | Description |
|--------|-------------|
| `--debounce SECS` | Quiet period before processing (default: 2.0s) |
| `--batch-interval SECS` | Max time between attestation batches (default: 30s) |
| `--scan-on-start / --no-scan-on-start` | Baseline scan before watching (default: on) |
| `--on-delete ignore\|mark\|remove` | Action on file deletion (default: ignore) |
| `--polling` | Use PollingObserver for NFS/CIFS mounts |
| `--poll-interval SECS` | Polling interval (default: 5s) |
| `--source/--manifest/--api-key/--dry-run` | Same as `census scan` |
| `--include/--exclude/--min-size/--max-size` | Same filters as scan |
| `--on-change CMD` | Shell command on file change (JSON on stdin) |
| `--on-attest CMD` | Shell command after attestation (JSON on stdin) |
| `--on-t1 CMD` | Shell command on T1 (TSA) webhook event (JSON on stdin) |
| `--on-t2 CMD` | Shell command on T2 (Bitcoin) webhook event (JSON on stdin) |
| `--webhook-secret-file FILE` | Signing secret for webhook receiver |
| `--webhook-port PORT` | Webhook receiver port (default: 9514) |
| `--webhook-bind ADDR` | Webhook receiver bind address (default: 127.0.0.1) |

Requires: `pip install certisigma-census[watch]`

### `census archive`

| Option | Description |
|--------|-------------|
| `MANIFEST` | Path to the manifest database |
| `-o`/`--output PATH` | Output ZIP path (default: `evidence-YYYY-MM-DD.census.zip`) |
| `--examiner NAME` | Examiner name (chain of custody) |
| `--case-id ID` | Case identifier (chain of custody) |
| `--notes TEXT` | Free-text notes (chain of custody) |
| `--organization NAME` | Organization name (chain of custody) |
| `--no-compress` | Store files without compression |
| `--no-seal` | Exclude manifest seal even if present |
| `--json` | Machine-readable JSON output with forensic metadata |

### `census verify-archive`

| Option | Description |
|--------|-------------|
| `ARCHIVE_PATH` | Path to the Census evidence archive |
| `--json` | Machine-readable JSON output with forensic metadata |

Verifies SHA256SUMS against actual archive contents. Exit code 0 = valid, 1 = tampered. Archives larger than 500 MB are rejected (decompression bomb guard).
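
The same kind of check can be done independently of Census. The sketch below assumes the archive holds a `SHA256SUMS` member at its root in the usual `sha256sum` layout (`<hex digest>  <path>`); the real archive layout may differ.

```python
import hashlib
import zipfile


def verify_sha256sums(archive_path: str, sums_name: str = "SHA256SUMS") -> list[str]:
    """Return listed member names that are missing or whose digest differs."""
    bad: list[str] = []
    with zipfile.ZipFile(archive_path) as zf:
        members = set(zf.namelist())
        for line in zf.read(sums_name).decode("utf-8").splitlines():
            if not line.strip():
                continue
            expected, name = line.split(maxsplit=1)
            name = name.lstrip("*")  # binary-mode marker used by sha256sum
            if name not in members:
                bad.append(name)
                continue
            digest = hashlib.sha256()
            with zf.open(name) as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                bad.append(name)
    return bad
```

An empty list means every listed file matched. The sketch does not flag extra members that are absent from `SHA256SUMS`, and it applies no size limit.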

### `census seal`

| Option | Description |
|--------|-------------|
| `MANIFEST_PATH` | Path to the manifest file |
| `--key KEY` | HMAC key (64 hex chars = 256 bits) |
| `--json` | Machine-readable JSON output |

### `census verify-seal`

| Option | Description |
|--------|-------------|
| `MANIFEST_PATH` | Path to the manifest file |
| `--key KEY` | HMAC key used to create the seal |
| `--json` | Machine-readable JSON output |

Exit code 0 = valid, 1 = invalid or error.
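
Conceptually, sealing and verifying are a keyed-hash round trip. The sketch below assumes HMAC-SHA256 over the raw manifest bytes; what Census actually feeds into the HMAC, and where it stores the seal, is not specified in this section.

```python
import hashlib
import hmac


def compute_seal(manifest_bytes: bytes, key_hex: str) -> str:
    """Compute an HMAC-SHA256 tag with a 256-bit key given as 64 hex chars."""
    key = bytes.fromhex(key_hex)
    if len(key) != 32:
        raise ValueError("key must be 64 hex characters (256 bits)")
    return hmac.new(key, manifest_bytes, hashlib.sha256).hexdigest()


def verify_seal(manifest_bytes: bytes, key_hex: str, seal_hex: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(compute_seal(manifest_bytes, key_hex), seal_hex)
```

Any change to the manifest bytes or the use of a different key makes `verify_seal` return `False`.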

### `census bulk-scan`

| Option | Description |
|--------|-------------|
| `SUSPECT_DIR` | Directory to scan |
| `--manifest PATH` | Local manifest for cross-referencing original paths |
| `--include/--exclude/--min-size/--max-size` | Same filters as scan |
| `--workers N` | Parallel hashing workers (default: 1, max: 8) |
| `--source LABEL` | Source label for audit logging (e.g. incident ID) |
| `--dry-run` | Hash only, no API call — preview file/hash/chunk counts |
| `--output PATH` | Save results to JSON file |
| `--json` | Machine-readable JSON output |
| `--exit-zero` | Always exit 0 (report-only mode for CI pipelines) |
| `--summary` | Show only counts, no match details |
| `--api-key KEY` | API key (requires `scan` scope) |
| `--base-url URL` | Override API base URL |

Uses `POST /scan` — up to 50K hashes per call with automatic chunking. Exit code: 0=no matches, 1=matches found (or always 0 with `--exit-zero`).
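
Automatic chunking amounts to slicing the hash list into batches no larger than the per-call limit. A minimal sketch, with a hypothetical helper name:

```python
from typing import Iterator, Sequence

# Per-call limit stated above for `POST /scan`.
MAX_HASHES_PER_CALL = 50_000


def chunked(
    hashes: Sequence[str], size: int = MAX_HASHES_PER_CALL
) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` hashes."""
    for start in range(0, len(hashes), size):
        yield hashes[start:start + size]
```

With 120,001 hashes this yields three batches: two full ones and a final batch of 20,001.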

### `census stats`

| Option | Description |
|--------|-------------|
| `--json` | Machine-readable JSON output |
| `--api-key KEY` | API key (requires `batch` scope) |
| `--base-url URL` | Override API base URL |

### `census sbom`

| Subcommand | Description |
|------------|-------------|
| `attest SBOM_FILE` | Parse SBOM and batch-attest all SHA-256 component hashes |
| `verify SBOM_FILE` | Verify SBOM hashes against the CertiSigma registry |
| `summary SBOM_FILE` | Inspect SBOM structure, component count, hash coverage |

**`census sbom attest` options:**

| Option | Description |
|--------|-------------|
| `--format auto\|spdx\|cyclonedx` | Force format (auto-detected by default) |
| `--source LABEL` | Source label for attestations |
| `--manifest PATH` | Save attested hashes to this manifest |
| `--dry-run` | Parse only, do not call the API |
| `--json` | Machine-readable JSON output |

**`census sbom verify` options:**

| Option | Description |
|--------|-------------|
| `--format auto\|spdx\|cyclonedx` | Force format |
| `--detailed` | Include attestation level, source, timestamps |
| `--exit-zero` | Always exit 0 (report-only, for CI) |
| `--json` | Machine-readable JSON output |

**`census sbom summary` options:**

| Option | Description |
|--------|-------------|
| `--format auto\|spdx\|cyclonedx` | Force format |
| `--json` | Machine-readable JSON output |

Supports SPDX 2.2/2.3 and CycloneDX 1.4/1.5/1.6 JSON. File size limit: 100 MB.
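
To see which hashes an SBOM would contribute, the SHA-256 values can be pulled out with plain JSON handling. The field names below follow the public CycloneDX (`components[].hashes[]`) and SPDX 2.x (`packages[]`/`files[]` with `checksums[]`) JSON schemas; this is an illustrative sketch, not Census's parser, and it ignores nested CycloneDX components.

```python
def sbom_sha256_hashes(doc: dict) -> list[str]:
    """Collect lowercase SHA-256 hex digests from a parsed SBOM document."""
    found: list[str] = []
    if doc.get("bomFormat") == "CycloneDX":
        for comp in doc.get("components", []):
            for entry in comp.get("hashes", []):
                if entry.get("alg") == "SHA-256":
                    found.append(entry["content"].lower())
    elif "spdxVersion" in doc:
        for section in ("packages", "files"):
            for item in doc.get(section, []):
                for entry in item.get("checksums", []):
                    if entry.get("algorithm") == "SHA256":
                        found.append(entry["checksumValue"].lower())
    return found
```

Load the file with `json.load` first and pass the resulting dictionary in.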

## Exit Codes

| Code | Context | Meaning |
|------|---------|---------|
| `0` | All commands | Success (or `--exit-zero` report-only mode) |
| `1` | All commands | General error (API, I/O, config, or matches found) |
| `2` | All commands | Usage error (invalid arguments — Click handles this) |
| `1` | `integrity --strict` | Violations detected (missing, modified, or new files) |
| bitmask | `diff` | `1`=added, `2`=removed, `4`=modified (OR'd together) |
| `0` | `compare --exit-zero` | Always 0, even if matches found (for CI) |

## Manifest Encryption at Rest

Census can encrypt manifests on disk using AES-256-GCM:

```bash
# Generate a key
census key-gen

# Scan with encryption — manifest is saved as .db.enc
census --encryption-key <hex64> scan /data --dry-run

# Load an encrypted manifest
census --encryption-key <hex64> status manifest.db

# Or use the environment variable (recommended for automation)
export CENSUS_ENCRYPTION_KEY=<hex64>
census scan /data --dry-run
census status manifest.db
```

Key resolution precedence: `--encryption-key` > `CENSUS_ENCRYPTION_KEY` env > config file `encryption_key`.
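
The precedence rule reads as a first-non-empty lookup. The sketch below is illustrative only: the function name and the shape of the `config` mapping are assumptions, while the environment variable and config key names come from the line above.

```python
import os
from typing import Mapping, Optional


def resolve_encryption_key(
    cli_value: Optional[str],
    config: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first key found: CLI flag, then environment, then config file."""
    return (
        cli_value
        or env.get("CENSUS_ENCRYPTION_KEY")
        or config.get("encryption_key")
    )
```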

Encrypted manifests are auto-detected by their `CENSUS_ENC\x01` header. `census doctor` reports encryption status.
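
Header-based detection is a short prefix comparison. A minimal sketch using the magic bytes quoted above (the helper name is hypothetical):

```python
# Magic header quoted in this section for encrypted manifests.
ENC_MAGIC = b"CENSUS_ENC\x01"


def is_encrypted_manifest(path: str) -> bool:
    """Return True if the file starts with the encrypted-manifest header."""
    with open(path, "rb") as fh:
        return fh.read(len(ENC_MAGIC)) == ENC_MAGIC
```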

## Man Pages

Man pages are included in the source repository under `docs/man/` but are **not** installed automatically by `pip`. To use them:

```bash
# Option 1: read directly from the source tree
man docs/man/census.1

# Option 2: install system-wide (requires root)
sudo install -m 644 docs/man/*.1 /usr/local/share/man/man1/

# Regenerate after adding new commands
./scripts/generate-man-pages.sh
```

For quick CLI help without man pages, use `census --help` or `census <command> --help`.

## Dependencies

- [`certisigma`](https://pypi.org/project/certisigma/) — Official CertiSigma Python SDK
- [`click`](https://click.palletsprojects.com/) — CLI framework

Optional:
- [`watchdog`](https://pypi.org/project/watchdog/) — Filesystem monitoring (only for `census watch`)
- [`fpdf2`](https://pypi.org/project/fpdf2/) — PDF report generation (only for `census report` with `.pdf` output)

## Testing

```bash
pip install -e ".[dev]"

# Unit tests (950+ tests, ~20s)
pytest --tb=short -q

# With coverage report
pytest --cov --cov-report=html

# Integration tests (requires API key)
CERTISIGMA_API_KEY=cs_demo_xxx pytest -m integration -v

# Performance benchmarks
python scripts/benchmark.py --files 1000 --output results.json
```

## License

MIT — Ten Sigma Sagl
