Metadata-Version: 2.4
Name: Perception
Version: 0.9.0
Summary: Perception provides flexible, well-documented, and comprehensively tested tooling for perceptual hashing research, development, and production use.
Author-email: Thorn <info@wearethorn.org>
License-Expression: Apache-2.0
Requires-Python: <4.0,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<3.0.0,>=1.26.4
Requires-Dist: opencv-contrib-python-headless<5.0.0,>=4.10.0
Requires-Dist: Pillow
Requires-Dist: pywavelets<2.0.0,>=1.5.0
Requires-Dist: validators<1.0.0,>=0.22.0
Requires-Dist: rich<14.0.0,>=13.7.0
Requires-Dist: scipy
Requires-Dist: tqdm<5.0.0,>=4.67.1
Requires-Dist: typing_extensions<5.0,>=4.0
Provides-Extra: approximate-deduplication
Requires-Dist: faiss-cpu<2.0.0,>=1.8.0; extra == "approximate-deduplication"
Requires-Dist: networkit<12.0.0,>=11.1; sys_platform != "darwin" and extra == "approximate-deduplication"
Requires-Dist: networkx<4.0,>=3.0; sys_platform == "darwin" and extra == "approximate-deduplication"
Requires-Dist: pandas; extra == "approximate-deduplication"
Provides-Extra: benchmarking
Requires-Dist: matplotlib; extra == "benchmarking"
Requires-Dist: albumentations<3.0.0,>=2.0.8; extra == "benchmarking"
Requires-Dist: pandas; extra == "benchmarking"
Requires-Dist: tabulate; extra == "benchmarking"
Requires-Dist: scikit-learn; extra == "benchmarking"
Requires-Dist: ffmpeg-python; extra == "benchmarking"
Provides-Extra: matching
Requires-Dist: aiohttp; extra == "matching"
Requires-Dist: python-json-logger; extra == "matching"
Provides-Extra: pdq
Requires-Dist: pdqhash<0.3.0,>=0.2.7; extra == "pdq"
Dynamic: license-file

# perception ![ci](https://github.com/thorn-oss/perception/workflows/ci/badge.svg)

`perception` provides flexible, well-documented, and comprehensively tested tooling for perceptual hashing research, development, and production use. See [the documentation](https://perception.thorn.engineering/en/latest/) for details.

## Background

`perception` was initially developed at [Thorn](https://www.thorn.org) as part of our work to eliminate child sexual abuse material from the internet. For more information on the issue, check out [our CEO's TED talk](https://www.thorn.org/blog/time-is-now-eliminate-csam/).

## Getting Started

### Installation

`pip install perception`

#### Optional extras

`perception` provides optional extras for additional functionality:

- `approximate-deduplication` – FAISS-based approximate-nearest-neighbor
  deduplication and graph community/clique detection (used by
  `perception.approximate_deduplication` and
  `perception.local_descriptor_deduplication`)
- `benchmarking` – tools for benchmarking perceptual hashes
- `matching` – async matching utilities
- `pdq` – Facebook's PDQ hash support
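
To see which extras appear to be installed in the current environment, one option is to probe for a module that each extra pulls in. The sketch below is a rough check, not part of `perception` itself. The mapping from extra to probe module (`faiss`, `albumentations`, `aiohttp`, `pdqhash`) and the helper name `available_extras` are assumptions chosen for illustration.

```python
from importlib.util import find_spec

# Assumed mapping from each extra to one top-level module it installs.
EXTRA_MODULES = {
    "approximate-deduplication": "faiss",
    "benchmarking": "albumentations",
    "matching": "aiohttp",
    "pdq": "pdqhash",
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return the extras whose probe module can be located without importing it."""
    return sorted(
        name
        for name, module in extra_modules.items()
        if find_spec(module) is not None
    )


if __name__ == "__main__":
    print("Extras that look installed:", available_extras())
```

A module being present does not prove it was installed through the extra, so treat the result as a hint rather than a guarantee.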

**Note for `benchmarking` extra users:** The `benchmarking` extra depends on
`albumentations`, which in turn requires `opencv-python-headless`. However,
`perception` already depends on `opencv-contrib-python-headless` (needed for
contrib modules such as `cv2.img_hash` and `cv2.SIFT_create`). Installing both
OpenCV distributions simultaneously causes file-level conflicts.

If you are using [uv](https://docs.astral.sh/uv/), this is handled
automatically:

```bash
uv pip install "perception[benchmarking]"
```

If you are using plain `pip`, install the extra and then force-reinstall the
contrib variant so that its files replace those written by the conflicting
`opencv-python-headless` package:

```bash
pip install "perception[benchmarking]"
pip install --force-reinstall --no-deps opencv-contrib-python-headless
```
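
To check whether an environment has both OpenCV distributions installed, a small sketch using the standard library's `importlib.metadata` may help. The helper names `installed_versions` and `opencv_conflict` are hypothetical. The check only reads package metadata, so it reports which distributions are registered, not which files are currently on disk.

```python
from importlib.metadata import PackageNotFoundError, version

# The two distributions named in the note above.
OPENCV_DISTRIBUTIONS = (
    "opencv-python-headless",
    "opencv-contrib-python-headless",
)


def installed_versions(names):
    """Map each installed distribution in `names` to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment; skip it.
            continue
    return found


def opencv_conflict(names=OPENCV_DISTRIBUTIONS):
    """Return True when more than one of the named distributions is installed."""
    return len(installed_versions(names)) > 1


if __name__ == "__main__":
    print(installed_versions(OPENCV_DISTRIBUTIONS))
    print("Both distributions present:", opencv_conflict())
```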

### Hashing

Switching between hash functions is simple with `perception`: construct the hasher you want, then call `compute` and `compute_distance`.

```python
from perception import hashers

file1, file2 = 'test1.jpg', 'test2.jpg'
hasher = hashers.PHash()
hash1, hash2 = hasher.compute(file1), hasher.compute(file2)
distance = hasher.compute_distance(hash1, hash2)
```
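
Building on the two calls above, a minimal sketch of flagging close pairs across several files might look like the following. The helper `find_close_pairs` and the `threshold` value are hypothetical. A suitable threshold depends on the hasher and the data, so the number shown is only a placeholder.

```python
from itertools import combinations


def find_close_pairs(hasher, paths, threshold):
    """Return (path_a, path_b, distance) for every pair closer than `threshold`."""
    # Hash each file once, using the same compute() call shown above.
    hashes = {path: hasher.compute(path) for path in paths}
    close = []
    for path_a, path_b in combinations(paths, 2):
        distance = hasher.compute_distance(hashes[path_a], hashes[path_b])
        if distance < threshold:
            close.append((path_a, path_b, distance))
    return close


# Example usage (requires perception and real image files):
# from perception import hashers
# pairs = find_close_pairs(hashers.PHash(), ['test1.jpg', 'test2.jpg'], threshold=0.2)
```

Comparing every pair scales quadratically with the number of files, so this approach is only reasonable for small collections.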

### Examples

The following end-to-end examples cover common use cases for perceptual hashes.

- [Detecting child sexual abuse material](https://perception.thorn.engineering/en/latest/examples/detecting_csam.html)
- [Deduplicating media](https://perception.thorn.engineering/en/latest/examples/deduplication.html)
- [Benchmarking perceptual hashes](https://perception.thorn.engineering/en/latest/examples/benchmarking.html)

## Supported Hashing Algorithms

`perception` currently ships with:

- pHash (DCT hash) (`perception.hashers.PHash`)
- Facebook's PDQ Hash (`perception.hashers.PDQ`)
- dHash (difference hash) (`perception.hashers.DHash`)
- aHash (average hash) (`perception.hashers.AverageHash`)
- Marr-Hildreth (`perception.hashers.MarrHildreth`)
- Color Moment (`perception.hashers.ColorMoment`)
- Block Mean (`perception.hashers.BlockMean`)
- wHash (wavelet hash) (`perception.hashers.WaveletHash`)
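
For readers new to perceptual hashing, the general idea behind an average hash can be sketched in a few lines of plain Python. This illustrates the concept only and is not how `perception.hashers.AverageHash` is implemented, which may resize, threshold, and encode hashes differently. The function names and the tiny 2x2 "thumbnails" are made up for the example.

```python
def average_hash_bits(pixels):
    """Return one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(bits_a, bits_b):
    """Count the positions at which two equal-length bit lists differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))


# Hypothetical grayscale thumbnails: a base image, a uniformly
# brightened copy, and a copy with light and dark regions swapped.
original = [[10, 200], [220, 30]]
brightened = [[40, 230], [250, 60]]
swapped = [[245, 55], [35, 225]]

same = hamming_distance(average_hash_bits(original), average_hash_bits(brightened))
different = hamming_distance(average_hash_bits(original), average_hash_bits(swapped))
```

Because each bit depends only on whether a pixel is above the image's own mean, a uniform brightness shift leaves the hash unchanged, while swapping light and dark regions flips every bit.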

## Contributing

To work on the project, start by installing the local dependencies:

```bash
# Install local dependencies for code completion,
# testing, and linting.
make init
```

To run a near-comprehensive check before committing code, use `make precommit`.

To implement new features, please first file an issue proposing your change for discussion.

To report problems, please file an issue with sample code, expected results, actual results, and a complete traceback.

## Alternatives

Other packages may be a better fit for your perceptual hashing needs. Some examples:

- [dedupe](https://github.com/dedupeio/dedupe)
- [imagededup](https://idealo.github.io/imagededup/)
- [ImageHash](https://github.com/JohannesBuchner/imagehash)
- [PhotoHash](https://github.com/bunchesofdonald/photohash)
