Metadata-Version: 2.4
Name: certisigma-census
Version: 0.3.0
Summary: Cryptographic file inventory and exfiltration detection — powered by CertiSigma
Project-URL: Homepage, https://certisigma.ch
Project-URL: Documentation, https://developers.certisigma.ch/sdk
Project-URL: Repository, https://github.com/massimocavallin/certisigma-census
Project-URL: Issues, https://github.com/massimocavallin/certisigma-census/issues
Author: Ten Sigma Sagl
License-Expression: MIT
Keywords: attestation,breach-detection,cryptography,file-integrity,forensics
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security :: Cryptography
Classifier: Topic :: System :: Filesystems
Requires-Python: >=3.10
Requires-Dist: certisigma>=1.5.0
Requires-Dist: click>=8.1
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: watchdog>=4.0.0; extra == 'dev'
Provides-Extra: watch
Requires-Dist: watchdog>=4.0.0; extra == 'watch'
Description-Content-Type: text/markdown

# CertiSigma Census

Cryptographic file inventory and exfiltration detection — powered by [CertiSigma](https://certisigma.ch).

Census scans directories, computes SHA-256 hashes, attests them via the CertiSigma API (three-layer cryptographic proof: T0 ECDSA signature, T1 qualified TSA timestamp, T2 Bitcoin anchor), and maintains a local manifest. When suspect files surface, Census compares their hashes against the registry to establish, with cryptographic certainty, whether they are byte-for-byte copies of inventoried assets.

## Installation

```bash
pip install certisigma-census

# With watch mode (filesystem monitoring); the quotes stop shells such as zsh from globbing the brackets
pip install "certisigma-census[watch]"
```

Requires Python 3.10+.

## Quick Start

### 1. Inventory scan

```bash
export CERTISIGMA_API_KEY=cs_...

# Scan a directory and attest all file hashes
census scan /path/to/sensitive-files --source inventory-hr

# Dry run — hash only, no attestation
census scan /path/to/files --dry-run

# Scan only PDFs and Word docs, skip files over 100 MB
census scan /data --include "*.pdf" --include "*.docx" --max-size 100M

# Resume an interrupted scan
census scan /data --source quarterly --manifest inventory.db --resume
```

By default this writes `.census-manifest.db` (SQLite) into the scanned directory, mapping each hash to its file path, size, and attestation metadata; `--manifest` selects a different path.
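
The manifest's exact schema is not documented here. Purely as an illustration, a minimal SQLite manifest keyed by path with an indexed hash column, opened in WAL mode as listed under Features, might look like this sketch. The table name `files`, its columns, and the helpers `open_manifest`, `record_file`, and `paths_for_hash` are hypothetical, not Census's actual layout.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema for illustration only; Census's real manifest layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path        TEXT PRIMARY KEY,
    sha256      TEXT NOT NULL,
    size        INTEGER NOT NULL,
    attested_at TEXT  -- NULL until the hash has been attested
);
CREATE INDEX IF NOT EXISTS idx_files_sha256 ON files (sha256);
"""


def open_manifest(db_path: str) -> sqlite3.Connection:
    """Open (or create) a manifest database in WAL mode."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the single writer
    conn.executescript(SCHEMA)
    return conn


def record_file(conn: sqlite3.Connection, path: str, sha256: str, size: int) -> None:
    """Insert or update one entry; a changed hash clears its attestation timestamp."""
    with conn:
        conn.execute(
            """
            INSERT INTO files (path, sha256, size) VALUES (?, ?, ?)
            ON CONFLICT(path) DO UPDATE SET
                sha256 = excluded.sha256,
                size = excluded.size,
                attested_at = CASE WHEN files.sha256 = excluded.sha256
                                   THEN files.attested_at ELSE NULL END
            """,
            (path, sha256, size),
        )


def paths_for_hash(conn: sqlite3.Connection, sha256: str) -> list[str]:
    """Indexed lookup: every inventoried path whose content has this hash."""
    rows = conn.execute("SELECT path FROM files WHERE sha256 = ? ORDER BY path", (sha256,))
    return [row[0] for row in rows]


# Example: the same content inventoried under two names.
with tempfile.TemporaryDirectory() as tmp:
    conn = open_manifest(os.path.join(tmp, ".census-manifest.db"))
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    digest = "ab" * 32  # placeholder 64-hex-character digest
    record_file(conn, "hr/payroll.xlsx", digest, 1024)
    record_file(conn, "backup/payroll-copy.xlsx", digest, 1024)
    matches = paths_for_hash(conn, digest)
    conn.close()
print(mode, matches)  # wal ['backup/payroll-copy.xlsx', 'hr/payroll.xlsx']
```

Indexing the hash column keeps lookups fast when one digest maps to several inventoried paths, which is the common case for duplicated documents.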

### 2. Breach comparison

```bash
# Compare suspect files against the CertiSigma registry
census compare /path/to/suspect-files --manifest /path/to/.census-manifest.db

# Save report as JSON or CSV
census compare /suspect --output report.json
census compare /suspect --output report.csv
```

Exit code: `0` if no matches, `1` if matches found.
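
Matching is by content hash only, so a renamed or relocated copy still matches while an edited copy does not. The sketch below illustrates just that local lookup and the documented 0/1 exit-code convention; `compare_hashes` is a hypothetical helper, and the real command also verifies hashes against the CertiSigma registry through the API.

```python
import hashlib


def compare_hashes(
    suspect: dict[str, bytes], inventory: dict[str, str]
) -> tuple[list[tuple[str, str]], int]:
    """Match suspect files to inventoried ones by SHA-256, ignoring names.

    suspect:   suspect file name -> file content
    inventory: hex SHA-256 -> original inventoried path
    Returns (matches, exit_code): exit code 1 if anything matched, else 0.
    """
    matches = []
    for name, content in sorted(suspect.items()):
        digest = hashlib.sha256(content).hexdigest()
        if digest in inventory:
            matches.append((name, inventory[digest]))
    return matches, (1 if matches else 0)


inventory = {hashlib.sha256(b"Q3 salary table").hexdigest(): "hr/salaries.csv"}
suspect = {
    "dump/renamed.bin": b"Q3 salary table",  # same bytes, new name: matches
    "dump/edited.csv": b"Q3 salary table!",  # one extra byte: no match
}
matches, exit_code = compare_hashes(suspect, inventory)
print(matches, exit_code)  # [('dump/renamed.bin', 'hr/salaries.csv')] 1
```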

### 3. Manifest status and export

```bash
# Show summary
census status /path/to/.census-manifest.db

# Export manifest as CSV for compliance reporting
census export manifest.db --format csv --output inventory.csv

# Export as JSON
census export manifest.db --format json --output inventory.json
```

### 4. Watch mode (continuous monitoring)

```bash
# Watch a directory for changes and attest new/modified files
census watch /path/to/files --source "production"

# Dry run — hash only, no attestation
census watch /data --dry-run

# Network mount — use polling
census watch /mnt/share --polling --poll-interval 10
```

Requires the `watch` extra: `pip install "certisigma-census[watch]"`

## How It Works

1. **Scan** — Census walks the directory, computes SHA-256 for each file (streamed, constant memory), and builds a local manifest.
2. **Attest** — Hashes are sent in batches (up to 100 per call) to the CertiSigma API. Each hash receives a three-layer cryptographic proof (T0 ECDSA signature, T1 qualified TSA timestamp, T2 Bitcoin anchor).
3. **Compare** — Suspect files are hashed and verified against the registry via `POST /verify/batch`. A match proves the file was previously inventoried, regardless of renames or changes to the directory structure. Only byte-identical copies match: any change to the content produces a different hash.
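
The batching in step 2 can be pictured as splitting the hash list into consecutive groups of at most 100. In this sketch, `attest_batch` is a hypothetical stand-in for the SDK call; the real `certisigma` client API is not shown.

```python
from collections.abc import Callable, Iterable, Iterator
from itertools import islice

BATCH_LIMIT = 100  # documented maximum number of hashes per API call


def batched(items: Iterable[str], size: int = BATCH_LIMIT) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def attest_all(hashes: Iterable[str], attest_batch: Callable[[list[str]], object]) -> int:
    """Send all hashes in batches and return the number of calls made.

    `attest_batch` is a hypothetical stand-in for the real SDK call.
    """
    calls = 0
    for chunk in batched(hashes):
        attest_batch(chunk)
        calls += 1
    return calls


# Example: 250 hashes need three calls of 100, 100 and 50.
sent: list[list[str]] = []
hashes = [f"{i:064x}" for i in range(250)]
calls = attest_all(hashes, sent.append)
print(calls, [len(b) for b in sent])  # 3 [100, 100, 50]
```

A hand-written generator is used because `itertools.batched` only exists from Python 3.12, while Census supports 3.10.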

The original file content **never leaves** the client. Only SHA-256 hashes are transmitted.
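
Because only the digest leaves the machine, hashing happens entirely on the client. The "streamed, constant memory" hashing of step 1 corresponds to the standard chunked-read pattern: read a fixed-size block, feed it to the hash, repeat. A minimal sketch, where the 1 MiB chunk size and the name `sha256_file` are assumptions:

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 1 << 20  # 1 MiB per read (assumed); memory stays flat whatever the file size


def sha256_file(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the hex SHA-256 of a file, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


# Check: the streamed digest equals the one-shot digest of the same bytes.
data = os.urandom(3 * CHUNK_SIZE + 123)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    streamed = sha256_file(tmp.name)
finally:
    os.remove(tmp.name)
expected = hashlib.sha256(data).hexdigest()
print(streamed == expected)  # True
```

On Python 3.11+, `hashlib.file_digest(f, "sha256")` does the same job; the explicit loop also runs on 3.10, the minimum version Census supports.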

## Features

| Feature | Description | Docs |
|---------|-------------|------|
| **File filters** | `--include`, `--exclude` globs; `--min-size`, `--max-size` | [scanning.md](docs/features/scanning.md) |
| **Resume scans** | `--resume` skips unchanged files, preserves attestation state | [scanning.md](docs/features/scanning.md) |
| **CSV/JSON export** | Compare reports and manifest export in both formats | [comparison.md](docs/features/comparison.md) |
| **Retry with backoff** | Automatic retry on 429/5xx with exponential backoff | [retry-and-resilience.md](docs/features/retry-and-resilience.md) |
| **Structured logging** | `--log-format json` for SIEM/ELK integration | [logging.md](docs/features/logging.md) |
| **Progress bars** | Visual feedback for scan, attest, and compare operations | [scanning.md](docs/features/scanning.md) |
| **SQLite manifest** | WAL mode, indexed lookups, auto-migration from JSON | [manifest.md](docs/features/manifest.md) |
| **Watch mode** | Continuous filesystem monitoring with batch attestation | [watching.md](docs/features/watching.md) |

Full documentation: [`docs/features/`](docs/features/)
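
The retry policy's actual parameters (attempt count, base delay, jitter) live in `retry-and-resilience.md`, not here. As a generic sketch of exponential backoff on 429/5xx responses, every number below is an assumption, and `HTTPStatusError` is a hypothetical exception type rather than one raised by the `certisigma` SDK.

```python
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed API call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Rate limiting (429) and server errors (5xx) are transient; other 4xx are not.
    return status == 429 or 500 <= status <= 599


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,  # assumed value
    base_delay: float = 0.5,  # seconds, assumed value
    max_delay: float = 30.0,  # cap on any single wait, assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable failures with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except HTTPStatusError as exc:
            if not is_retryable(exc.status) or attempt == max_attempts - 1:
                raise
            # With the defaults: ~0.5 s, 1 s, 2 s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2**attempt)
            # Jitter so that many clients don't retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")


# Example: two transient failures, then success.
outcomes = iter([HTTPStatusError(503), HTTPStatusError(429), "ok"])


def flaky() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


waits: list[float] = []
result = with_backoff(flaky, sleep=waits.append)
print(result, len(waits))  # ok 2
```

Injecting `sleep` keeps the policy testable without real waiting.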

## CLI Reference

### Global options

| Option | Description |
|--------|-------------|
| `-v` / `--verbose` | Enable debug logging |
| `--log-format text\|json` | Log output format (default: text) |
| `--version` | Show version |
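
`--log-format json` produces machine-readable lines for SIEM/ELK ingestion. The field names Census actually emits are not listed here; the formatter below is a generic standard-library sketch with assumed fields (`ts`, `level`, `logger`, `msg`), one JSON object per line. The logger name `census.demo` is hypothetical.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (assumed field names)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("census.demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

log.info("attested %d hashes", 100)
line = json.loads(stream.getvalue().splitlines()[0])
print(line["level"], line["msg"])  # INFO attested 100 hashes
```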

### `census scan`

| Option | Description |
|--------|-------------|
| `--source LABEL` | Source label for attestations |
| `--manifest PATH` | Manifest output path (default: `<dir>/.census-manifest.db`) |
| `--api-key KEY` | API key (or set `CERTISIGMA_API_KEY`) |
| `--base-url URL` | Override API base URL |
| `--dry-run` | Hash only, no attestation |
| `--resume` | Resume interrupted scan |
| `--include GLOB` | Include files matching pattern (repeatable) |
| `--exclude GLOB` | Exclude files matching pattern (repeatable) |
| `--min-size SIZE` | Skip files smaller than SIZE (e.g. `1K`, `10M`) |
| `--max-size SIZE` | Skip files larger than SIZE (default: `5G`) |
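
Size values take suffixes such as `1K`, `10M`, and `5G`. Whether Census reads these as binary (1K = 1024 bytes) or decimal units is not stated here, so the sketch below assumes binary multiples. It also shows one plausible way to combine repeatable `--include`/`--exclude` globs with `fnmatch`: keep a file if it matches any include pattern (when any are given) and no exclude pattern. `parse_size` and `passes_filters` are hypothetical helpers, and matching on the bare file name is an assumption.

```python
import fnmatch
import re

# Assumed binary multiples (1K = 1024 bytes); Census may define units differently.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*$", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Parse sizes such as '1K', '10M', '5G' or '2048' into a byte count."""
    m = _SIZE_RE.match(text)
    if not m:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = m.groups()
    return int(float(number) * _UNITS[unit.upper()])


def passes_filters(
    name: str,
    size: int,
    include: list[str],
    exclude: list[str],
    min_size: int = 0,
    max_size: int = parse_size("5G"),  # documented --max-size default
) -> bool:
    """Keep a file if it matches an include glob (when given), no exclude glob, and the size bounds."""
    if include and not any(fnmatch.fnmatch(name, pat) for pat in include):
        return False
    if any(fnmatch.fnmatch(name, pat) for pat in exclude):
        return False
    return min_size <= size <= max_size


print(parse_size("100M"))  # 104857600
print(passes_filters("report.pdf", 2_000, ["*.pdf", "*.docx"], []))  # True
print(passes_filters("notes.txt", 2_000, ["*.pdf", "*.docx"], []))  # False
```

Note that `fnmatch.fnmatch` follows the platform's case rules (case-insensitive on Windows); `fnmatch.fnmatchcase` would make matching case-sensitive everywhere.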

### `census compare`

| Option | Description |
|--------|-------------|
| `--manifest PATH` | Local manifest for cross-referencing |
| `--output PATH` | Save the report; the format follows the file extension (`.json` or `.csv`) |
| `--include/--exclude/--min-size/--max-size` | Same filters as `census scan` |

### `census export`

| Option | Description |
|--------|-------------|
| `--format csv\|json` | Output format (default: csv) |
| `--output PATH` | Output file (default: stdout) |

### `census status`

Takes a manifest path as argument. No additional options.

### `census watch`

| Option | Description |
|--------|-------------|
| `--debounce SECS` | Quiet period before processing (default: 2.0s) |
| `--batch-interval SECS` | Max time between attestation batches (default: 30s) |
| `--scan-on-start / --no-scan-on-start` | Baseline scan before watching (default: on) |
| `--on-delete ignore\|mark\|remove` | Action on file deletion (default: ignore) |
| `--polling` | Use watchdog's `PollingObserver` instead of native filesystem events (for NFS/CIFS mounts) |
| `--poll-interval SECS` | Polling interval (default: 5s) |
| `--source/--manifest/--api-key/--dry-run` | Same as `census scan` |
| `--include/--exclude/--min-size/--max-size` | Same filters as `census scan` |

Requires the `watch` extra: `pip install "certisigma-census[watch]"`
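
`--debounce` waits for a path to go quiet before processing it, so a file written in many small chunks is hashed once rather than on every write event. A minimal, thread-free sketch of that rule records the last event time per path and releases only paths that have been quiet for at least the debounce period. The `Debouncer` class is hypothetical, and the `--batch-interval` flush timer is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Debouncer:
    """Collect filesystem events and release paths once they have been quiet long enough."""

    quiet_period: float = 2.0  # seconds; mirrors the documented --debounce default
    _last_event: dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def on_event(self, path: str, now: float) -> None:
        # Each new event for a path restarts its quiet-period timer.
        self._last_event[path] = now

    def ready(self, now: float) -> list[str]:
        """Pop and return paths with no events for at least `quiet_period` seconds."""
        done = sorted(p for p, t in self._last_event.items() if now - t >= self.quiet_period)
        for p in done:
            del self._last_event[p]
        return done


d = Debouncer(quiet_period=2.0)
d.on_event("a.pdf", now=0.0)
d.on_event("b.pdf", now=0.5)
d.on_event("a.pdf", now=1.5)  # a.pdf is still being written
first = d.ready(now=2.6)
second = d.ready(now=3.6)
print(first, second)  # ['b.pdf'] ['a.pdf']
```

Passing `now` explicitly keeps the logic deterministic; a real watcher would feed it `time.monotonic()` from its event handler and a periodic timer.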

## Dependencies

- [`certisigma`](https://pypi.org/project/certisigma/) — Official CertiSigma Python SDK
- [`click`](https://click.palletsprojects.com/) — CLI framework

Optional: [`watchdog`](https://pypi.org/project/watchdog/) — Filesystem monitoring (only for `census watch`)

## License

MIT — Ten Sigma Sagl
