Metadata-Version: 2.4
Name: canary-scan
Version: 0.1.4
Summary: Scan a document dump for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with the data.
Project-URL: Homepage, https://github.com/psaintelligence/canary-scan
Project-URL: Repository, https://github.com/psaintelligence/canary-scan
Project-URL: Documentation, https://psaintelligence.github.io/canary-scan/
Author: canary-scan contributors
License: Apache-2.0
License-File: LICENSE
Keywords: canary,canarytoken,document,forensics,opsec,security,steganography,tracker,watermark
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Security
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Requires-Dist: defusedxml>=0.7
Requires-Dist: extract-msg>=0.48
Requires-Dist: liblnk-python
Requires-Dist: oletools>=0.60
Requires-Dist: peepdf-3
Requires-Dist: pyonenote>=0.0.2
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: typer[all]>=0.12
Provides-Extra: imaging
Requires-Dist: pillow>=10.0; extra == 'imaging'
Requires-Dist: pyzbar>=0.1.9; extra == 'imaging'
Provides-Extra: specialized
Requires-Dist: fonttools>=4.50; extra == 'specialized'
Requires-Dist: mutagen>=1.47; extra == 'specialized'
Requires-Dist: pydicom>=2.4; extra == 'specialized'
Provides-Extra: test
Requires-Dist: openpyxl>=3.1; extra == 'test'
Requires-Dist: pytest>=8.0; extra == 'test'
Requires-Dist: python-docx>=1.1; extra == 'test'
Requires-Dist: python-pptx>=0.6; extra == 'test'
Requires-Dist: reportlab>=4.0; extra == 'test'
Requires-Dist: ruff>=0.3.0; extra == 'test'
Description-Content-Type: text/markdown

# canary-scan

[![pypi](https://img.shields.io/pypi/v/canary-scan.svg)](https://pypi.org/project/canary-scan/)
[![python](https://img.shields.io/pypi/pyversions/canary-scan.svg)](https://github.com/psaintelligence/canary-scan/)
[![build tests](https://github.com/psaintelligence/canary-scan/actions/workflows/project-tests.yml/badge.svg)](https://github.com/psaintelligence/canary-scan/actions/workflows/project-tests.yml)
[![license](https://img.shields.io/badge/license-Apache--2.0-green.svg)](https://github.com/psaintelligence/canary-scan)

**Scan document data-sources for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with supplied datasets.**

When you receive a large document dump from an external party (a leak, a legal disclosure, or an investigation), the files can contain deliberate or indirect canaries: tracking pixels, embedded JavaScript, remote template links, steganographic watermarks, or per-recipient metadata fingerprints. Some of these phone home the moment a file is opened; others silently identify which recipient a copy was issued to.

`canary-scan` inspects files **without opening them in their native viewers**, extracting and analysing raw structure, metadata, embedded objects, and near-duplicate fingerprints to surface anything that may reveal to an external party that the data-source is being examined.
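To illustrate what inspecting raw structure can look like, here is a minimal sketch that lists external relationship targets (such as remote template links) in an OOXML file by reading it as a plain ZIP archive. It is not the `canary-scan` implementation; the function name `external_targets` is hypothetical, and the stdlib XML parser is used only to keep the example self-contained.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OOXML package relationship parts (*.rels).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_targets(ooxml_path):
    """Return (part, relationship type, target) for every external relationship.

    Hypothetical helper: the file is read as a ZIP archive and is never
    handed to an office application, so nothing in it is rendered or fetched.
    """
    hits = []
    with zipfile.ZipFile(ooxml_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".rels"):
                continue
            # For hostile input, a hardened parser such as defusedxml
            # is a safer choice than the stdlib parser used here.
            root = ET.fromstring(archive.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    hits.append((name, rel.get("Type"), rel.get("Target")))
    return hits
```

A relationship marked `TargetMode="External"` points outside the package, which is how a document can reference a remote template or image that a viewer would fetch on open.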

→ **Full documentation:** [psaintelligence.github.io/canary-scan](https://psaintelligence.github.io/canary-scan)

---

## Quick Start (Docker)

The recommended way to run `canary-scan` is via Docker, as the image bundles all required system utilities and dependencies:

```bash
# Run the scan using the GitHub Container Registry image
docker run --rm \
  -v /mnt/datasource:/data:ro \
  -v "$(pwd)/canary-scan-out":/output \
  ghcr.io/psaintelligence/canary-scan:latest scan /data -o /output

# Review findings
jq '.[] | select(.severity=="critical")' canary-scan-out/canary-scan-report.json
```
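If `jq` is not available, the same filter can be sketched in Python. This assumes only what the `jq` expression above implies: the report is a JSON array of objects with a `severity` field. The function name `load_findings` is hypothetical, and no other report fields are assumed.

```python
import json
from pathlib import Path


def load_findings(report_path, severity="critical"):
    """Return the findings in a JSON-array report whose 'severity' matches."""
    findings = json.loads(Path(report_path).read_text(encoding="utf-8"))
    # Skip anything that is not an object, since the schema is assumed.
    return [f for f in findings if isinstance(f, dict) and f.get("severity") == severity]


if __name__ == "__main__":
    # Path taken from the jq example above.
    report = Path("canary-scan-out/canary-scan-report.json")
    if report.exists():
        for finding in load_findings(report):
            print(json.dumps(finding, indent=2))
```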

---

## Quick Start (pipx)

If you prefer to run `canary-scan` directly on your host machine:

```bash
# 1. Install canary-scan
pipx install canary-scan

# 2. Install required system dependencies (Ubuntu 24.04 example)
sudo apt install libimage-exiftool-perl qpdf poppler-utils mupdf-tools \
    ripgrep unzip p7zip-full

# 3. Run the scan
canary-scan scan /mnt/datasource
```

---

## Detection pipeline

Seven sequential stages: **inventory → metadata → remote-refs → embedded → stego → uniqueness → report**

Each stage writes a JSONL artefact to `.canary-scan/`. Run `canary-scan --guide` for a concise cheat sheet.
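For a quick look at what the stages produced, the following sketch counts records per artefact. It assumes the artefacts carry a `.jsonl` extension and hold one JSON document per line; the actual file names and record fields are not specified here, so none are assumed. The function name `count_records` is hypothetical.

```python
import json
from pathlib import Path


def count_records(artefact_dir):
    """Map each *.jsonl file name in artefact_dir to its number of records."""
    counts = {}
    for path in sorted(Path(artefact_dir).glob("*.jsonl")):
        n = 0
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    json.loads(line)  # raises ValueError on a malformed record
                    n += 1
        counts[path.name] = n
    return counts


if __name__ == "__main__":
    # Directory name from the text above; adjust to where the scan was run.
    for name, n in count_records(".canary-scan").items():
        print(f"{name}: {n} records")
```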

---

## License

Apache-2.0. Bundled third-party scripts (`pdfid`, `pdf-parser`, `rtfdump`) are BSD 2-Clause — see `src/canary_scan/bundled/README.md`.
