Metadata-Version: 2.4
Name: imageprism
Version: 0.1.0
Summary: Multi-dimensional image similarity comparison. Like a prism decomposes light, imageprism decomposes similarity into independent dimensions.
Keywords: image-similarity,perceptual-hash,clip,dinov2,image-comparison,deduplication,onnx
Author: Balaram
Author-email: Balaram <balaramneu@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Classifier: Operating System :: OS Independent
Requires-Dist: numpy
Requires-Dist: pillow
Requires-Dist: onnxruntime
Requires-Dist: huggingface-hub
Requires-Dist: ruff ; extra == 'dev'
Requires-Dist: pytest ; extra == 'dev'
Requires-Python: >=3.10
Project-URL: Homepage, https://github.com/nebulaanish/imageprism
Project-URL: Repository, https://github.com/nebulaanish/imageprism
Project-URL: Issues, https://github.com/nebulaanish/imageprism/issues
Provides-Extra: dev
Description-Content-Type: text/markdown

# imageprism

Compare two images across several kinds of similarity in one call. Runs on CPU, no PyTorch, no GPU, no API keys.

"Similar" is ambiguous. Two images can be the same file re-saved, the same kind of scene, the same specific object, or the same person, and those are different questions with different answers. imageprism scores each one as its own dimension and hands back the numbers together, so you choose the dimensions your problem actually needs. Everything runs on CPU through ONNX Runtime, NumPy, and Pillow.

```python
from imageprism import ImagePrism, Dimension

prism = ImagePrism(dimensions=[Dimension.HASH, Dimension.SEMANTIC])
result = prism.compare("a.jpg", "b.jpg")
result.scores  # {"hash": 0.12, "semantic": 0.82}
```

At its core the API is one class and one `compare` method; `embed` and `dedup`, covered below, work on the same instance.

Doing this by hand usually means installing imagehash, a CLIP wrapper, and a face library, then reconciling three preprocessing pipelines and three output formats. imageprism puts them behind one API.

## Install

```bash
pip install imageprism
```

`ImagePrism()` with no arguments uses hashing only, which needs no downloads and runs immediately. Adding a model-backed dimension downloads its model once, caches it locally, and works offline after that.

## Dimensions

| Dimension | Answers | Technique | Model |
|---|---|---|---|
| `hash` | Pixel-level duplicate? | pHash + dHash + aHash | none (pure algorithm) |
| `semantic` | Same concept or category? | CLIP cosine similarity | CLIP ViT-B/32 quantized, ~89MB |
| `instance` | Same specific object? | DINOv2 cosine similarity | DINOv2-small, ~87MB |
| `style` | Similar visual style? | MobileNetV2 feature similarity | MobileNetV2, ~14MB |
| `face` | Same person? | Face detection + embedding | UltraFace ~1.2MB + ArcFace ~137MB (swappable, see below) |

Dimensions can be passed as enum members or plain strings: `dimensions=["hash", "semantic"]` works.
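The README does not say how strings are accepted alongside enum members. A common pattern is a `str`-valued `Enum`. Calling the class with a value looks the member up, and calling it with a member returns that member unchanged. This is shown as an assumption, not as imageprism's actual code, and the `Dimension` class below is a hypothetical stand-in.

```python
from enum import Enum
from typing import Iterable, List, Union


class Dimension(str, Enum):
    """Hypothetical stand-in for imageprism's Dimension; values match the table above."""

    HASH = "hash"
    SEMANTIC = "semantic"
    INSTANCE = "instance"
    STYLE = "style"
    FACE = "face"


def normalize_dimensions(dims: Iterable[Union[str, Dimension]]) -> List[Dimension]:
    """Coerce enum members and plain strings to Dimension members.

    Dimension("hash") looks the member up by value; Dimension(Dimension.HASH)
    returns the member itself. An unknown string such as "colour" raises ValueError.
    """
    return [Dimension(d) for d in dims]


dims = normalize_dimensions(["hash", Dimension.SEMANTIC])
```

Because the members subclass `str`, `Dimension.HASH == "hash"` is also true.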

## Reading the scores

Each score is a float, but the scale differs per dimension, so 0.5 does not mean "50% similar". Rough calibration, from the benchmarks and spot checks below:

- `hash`: fraction of matching hash bits. Above ~0.9 is a near-duplicate. Unrelated images land around 0.5, not 0.
- `semantic`: CLIP cosine similarity, which lives in a compressed range. Unrelated images score around 0.5; above ~0.75 usually means the same concept.
- `instance`: DINOv2 cosine similarity. The same object re-photographed scores high (0.7+); unrelated images fall near 0.
- `face`: ArcFace cosine similarity. On LFW the optimal same-person threshold is about 0.32. The score is `None` when no face can be detected in one or both images, which is different from 0.0 (faces found, but different people).
- `style`: MobileNetV2 feature cosine. Treat as a rough signal; it is not benchmarked yet.

Thresholds always depend on your data, so validate on a sample before hard-coding one.

## Profiles

A profile picks a set of dimensions and blends them into one weighted score, keeping the per-dimension breakdown alongside.

```python
from imageprism import ImagePrism, Profile

prism = ImagePrism(profile=Profile.COPYRIGHT)
result = prism.compare("original.jpg", "suspect.jpg")
result.weighted_score  # 0.58
result.scores          # {"hash": 0.51, "instance": 0.34, "semantic": 0.82}
```

There are six: `ecommerce`, `copyright`, `dedup`, `visual_search`, `identity`, `forgery`. The last two use the face dimension, so read the licensing note below before relying on them.

## Custom weights and per-dimension config

```python
from imageprism import ImagePrism, Dimension, HashConfig

prism = ImagePrism(
    weights={Dimension.HASH: 0.6, Dimension.SEMANTIC: 0.4},
    config={Dimension.HASH: HashConfig(algorithms=("phash",), hash_size=16)},
)
```

Weights are normalized to sum to 1, so only their relative values matter. A dimension that cannot score a pair (for example, face when no face is detected) contributes 0 to the weighted score.
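The blending rule described above can be sketched as follows. The sketch assumes plain string keys. It treats an unscored dimension (`None`) as 0 without redistributing its weight, which is one reading of "contributes 0". The function name is hypothetical.

```python
from typing import Mapping, Optional


def weighted_score(scores: Mapping[str, Optional[float]], weights: Mapping[str, float]) -> float:
    """Blend per-dimension scores using weights normalized to sum to 1.

    A dimension missing from `scores`, or scored as None, contributes 0;
    its share of the weight is not handed to the other dimensions.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum((w / total) * (scores.get(dim) or 0.0) for dim, w in weights.items())
```

Under this rule, a blend that includes face is pulled down on pairs with no detectable face. A blended threshold tuned on face-bearing images may therefore not transfer to images without faces.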

## Embeddings and caching

You can pull embeddings out to store in your own index. Repeated comparisons reuse them: the cache is keyed on pixel content, so comparing one image against many others embeds it only once.

```python
emb = prism.embed("a.jpg")          # {"hash": np.array([...]), "semantic": np.array([...])}
prism.compare("a.jpg", "b.jpg")     # a.jpg is embedded here
prism.compare("a.jpg", "c.jpg")     # a.jpg comes from the cache
```
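Keying the cache on pixel content rather than on the path has two consequences. The same image saved under two filenames hits the cache, while a file edited in place at the same path does not. A minimal sketch of such a cache follows, assuming a SHA-256 digest of the decoded RGB pixels plus the image size as the key. `EmbeddingCache` is a hypothetical name, not imageprism's class.

```python
import hashlib
from typing import Callable, Dict

import numpy as np
from PIL import Image


class EmbeddingCache:
    """Hypothetical cache keyed on decoded pixel content, not on file path.

    Like the cache described above, this is unsynchronized: use one per thread.
    """

    def __init__(self, embed_fn: Callable[[Image.Image], np.ndarray]):
        self._embed_fn = embed_fn
        self._store: Dict[str, np.ndarray] = {}
        self.misses = 0  # how many times the (expensive) embed_fn actually ran

    @staticmethod
    def key(img: Image.Image) -> str:
        rgb = img.convert("RGB")
        h = hashlib.sha256()
        # Include the size so the same bytes reshaped to other dimensions get a different key.
        h.update(f"{rgb.width}x{rgb.height}".encode())
        h.update(rgb.tobytes())
        return h.hexdigest()

    def get(self, img: Image.Image) -> np.ndarray:
        k = self.key(img)
        if k not in self._store:
            self.misses += 1
            self._store[k] = self._embed_fn(img)
        return self._store[k]
```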

## Batch dedup

`dedup` embeds each image once and groups near-duplicates, keeping one representative per group. A typical use is trimming a video down to its distinct frames before running something expensive on each one.

```python
from imageprism import ImagePrism, Dimension

# frames pulled from a video, in order
frames = ["frame_0001.jpg", "frame_0002.jpg", "frame_0003.jpg"]

prism = ImagePrism(dimensions=[Dimension.HASH])
result = prism.dedup(frames, threshold=0.9)

result.unique                     # indices of the distinct frames
result.labels                     # for each frame, the representative it was grouped under
distinct = [frames[i] for i in result.unique]
```

Each image is embedded once, then compared against the representatives kept so far, so the model work stays linear in the number of images. There is no approximate index yet, so on a large set of mostly-distinct images the comparison step grows quadratically.
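The greedy scheme described above can be sketched like this. The sketch assigns each item to the first kept representative whose similarity reaches the threshold. imageprism may instead pick the best-matching representative, and may compare with `>` rather than `>=`. `greedy_dedup` is a hypothetical helper that returns `unique` and `labels` in the same shape as the `dedup` result above.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

E = TypeVar("E")


def greedy_dedup(
    embeddings: Sequence[E],
    similarity: Callable[[E, E], float],
    threshold: float,
) -> Tuple[List[int], List[int]]:
    """Group items under the first kept representative they match.

    Returns (unique, labels): `unique` holds the indices of kept representatives,
    and labels[i] is the index of the representative item i was grouped under.
    """
    unique: List[int] = []
    labels: List[int] = []
    for i, emb in enumerate(embeddings):
        for rep in unique:  # only representatives are compared, not every earlier item
            if similarity(emb, embeddings[rep]) >= threshold:
                labels.append(rep)
                break
        else:  # no representative matched: item i starts a new group
            unique.append(i)
            labels.append(i)
    return unique, labels
```

The inner loop runs over representatives only. Sets dominated by near-duplicates stay cheap, while mostly-distinct sets approach O(n²) comparisons.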

The right threshold depends on the dimension: around 0.9 on hashing catches re-encodes and small edits, while a lower value on semantic groups by content. Configure a profile or weights instead of a single dimension to dedup on a blended score.

## Face and model licensing

Face works out of the box, with one caveat. It detects the largest face with UltraFace (MIT) and embeds it with ArcFace by default. Those default ArcFace weights have no clear commercial license, because like most high-accuracy face models they trace back to research-only datasets. The first time you run the face dimension, imageprism prints a warning.

For commercial use, bring your own embedding model:

```python
from imageprism import ImagePrism, Dimension, FaceConfig

prism = ImagePrism(
    dimensions=[Dimension.FACE],
    config={Dimension.FACE: FaceConfig(embed_repo="your-org/your-model", embed_file="model.onnx")},
)
```

The model needs to accept a 112x112 RGB face crop. Common choices are FaceX (Apache-2.0), InsightFace `buffalo_l` (MIT code, but the weights need a commercial license), or one you train yourself. imageprism bundles no face weights (the default ArcFace model is downloaded from its original source), so choosing a model you have the rights to use is up to you.

## Benchmarks

The numbers below can be reproduced with the scripts in `benchmarks/`.

Hashing, on 200 LFW images under 15 transforms (JPEG, resize, crop, rotation, blur, noise, flip, brightness, contrast):

| Config | AUC | Accuracy |
|---|---|---|
| default (pHash + dHash + aHash, mean) | 0.919 | 0.885 |
| aHash only | 0.937 | 0.889 |
| dHash only | 0.900 | 0.870 |
| pHash only | 0.875 | 0.863 |

JPEG, resize, blur, noise, brightness, and contrast all sit near 1.0 AUC. The weak points are a 50% center crop (about 0.40) and a horizontal flip (about 0.59).

Semantic, retrieval on 1000 images from the CIFAR-100 test set (100 classes):

| Metric | Score |
|---|---|
| Recall@1 | 0.44 |
| Recall@5 | 0.70 |
| Recall@10 | 0.80 |
| Recall@20 | 0.88 |

CIFAR-100 images are 32x32 pixels, upscaled to 224x224 before they reach CLIP, so treat these numbers as a floor rather than a ceiling.
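Recall@K here is presumably the usual retrieval metric. Each image queries all the others, and the query counts as a hit if at least one of its K nearest neighbours shares its class. Below is a minimal sketch under that assumption, using cosine similarity. The scripts in `benchmarks/` may differ in details such as tie handling, and `recall_at_k` is a hypothetical name.

```python
import numpy as np


def recall_at_k(embeddings: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of queries whose k nearest neighbours (by cosine) include a same-class item.

    Every row is used as a query against all other rows; a query never retrieves itself.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = x @ x.T  # cosine similarity matrix
    np.fill_diagonal(sims, -np.inf)  # exclude self-matches
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k most similar items per query
    hits = (labels[topk] == labels[:, None]).any(axis=1)
    return float(hits.mean())
```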

Face, LFW verification over 6000 pairs: 0.963 AUC, 0.909 accuracy, 0.726 TAR at FAR=1%. Well-aligned ArcFace reaches roughly 0.998 accuracy; the gap comes from the plain crop-and-resize alignment described below.
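The verification numbers above can be computed from per-pair scores and same/different labels. The minimal sketch below makes three assumptions:

- AUC is the probability that a random same-person pair outscores a random different-person pair.
- TAR at FAR=1% uses the threshold that accepts at most 1% of different-person pairs.
- Accuracy is taken at the best single threshold, which is presumably how a figure like the ~0.32 face threshold in "Reading the scores" was found.

The function names are hypothetical. The pairwise AUC builds a positives-by-negatives matrix, which is fine for 6000 pairs but not for millions.

```python
from typing import Tuple

import numpy as np


def roc_auc(scores: np.ndarray, same: np.ndarray) -> float:
    """P(random same-person pair outscores random different-person pair); ties count half."""
    pos, neg = scores[same], scores[~same]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def tar_at_far(scores: np.ndarray, same: np.ndarray, far: float = 0.01) -> float:
    """True accept rate at a threshold that accepts at most `far` of different-person pairs."""
    neg = np.sort(scores[~same])[::-1]  # impostor scores, highest first
    n_accept = int(np.floor(far * len(neg)))
    # Only scores strictly above the (n_accept+1)-th highest impostor score are accepted.
    threshold = neg[n_accept] if n_accept < len(neg) else -np.inf
    return float((scores[same] > threshold).mean())


def best_threshold(scores: np.ndarray, same: np.ndarray) -> Tuple[float, float]:
    """Threshold (predict "same" if score >= t) that maximizes accuracy, and that accuracy."""
    candidates = np.unique(scores)
    accs = [((scores >= t) == same).mean() for t in candidates]
    i = int(np.argmax(accs))
    return float(candidates[i]), float(accs[i])
```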

Instance and style are not benchmarked yet.

## Limitations

- Dedup is greedy and brute-force. It embeds each image once, but the comparison step has no approximate index, so a large set of mostly-distinct images scales quadratically. There is no corpus-scale similarity search yet; a FAISS-backed index is the planned next step.
- Hashing handles JPEG, resize, blur, noise, and brightness almost perfectly, but a 50% center crop drops it to about 0.40 AUC and a horizontal flip to about 0.59.
- The style dimension uses MobileNetV2 features rather than gram matrices on intermediate layers, so it is a rough signal and is not benchmarked yet.
- Profile weights are sensible defaults, not values tuned on data.
- Face alignment is a plain crop and resize with no landmark step, which puts LFW accuracy near 91% against roughly 99.8% for well-aligned ArcFace. It works, but it is not state of the art.
- A single `ImagePrism` instance is not thread-safe; the embedding cache is unsynchronized. Use one instance per thread.

## When to use something else

If you need only one kind of similarity, reach for the specialized tool: imagehash for perceptual hashing, CLIP directly for semantic search, insightface for faces. imageprism is worth it when you need two or more of these behind one interface. It saves the integration work rather than trying to beat any of those libraries at their single job.

## License

MIT, see [LICENSE](LICENSE). Model weights download from their original sources under their own licenses.
