Metadata-Version: 2.4
Name: plato-semantic-sim
Version: 0.1.1
Summary: PLATO semantic similarity — embedding-based tile comparison and deduplication
License: MIT
Project-URL: Homepage, https://cocapn.github.io
Project-URL: Repository, https://github.com/cocapn/plato-semantic-sim
Keywords: plato,semantic,similarity,deduplication,embeddings
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# 🔬 Plato Semantic Sim

> Embedding-based tile comparison and deduplication for PLATO rooms

Computes similarity between knowledge tiles using cosine similarity, Jaccard index, and edit distance. Finds duplicates via content hashing (exact) and vector comparison (semantic).

## Install

```bash
pip install plato-semantic-sim
```

## Quick Start

```python
from plato_semantic_sim import SimilarityEngine, DedupEngine

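# Rank named candidate vectors by similarity to a query vector.
engine = SimilarityEngine()
scores = engine.find_similar(
    [0.1, 0.2, 0.3],
    {"a": [0.1, 0.2, 0.3], "b": [0.9, 0.8, 0.7]},
)
print(scores)  # [("a", 1.0)]

# Register a tile; tiles at or above the similarity threshold count as duplicates.
dedup = DedupEngine(similarity_threshold=0.85)
result = dedup.add("tile-1", "knowledge content", [0.1, 0.2, 0.3])
print(result)  # {"status": "unique"}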
```

## API

| Class | Purpose |
|-------|---------|
| `SimilarityEngine` | Compare vectors, find similar vectors, build a pairwise similarity matrix |
| `CosineSimilarity` | Dot product divided by the product of the vector magnitudes |
| `JaccardSimilarity` | Token overlap ratio (shared tokens over total distinct tokens) |
| `DedupEngine` | Exact + semantic dedup in one pass |

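The three metrics named in the introduction can be sketched in plain Python as follows. These are textbook definitions written as standalone functions, not the package's implementation: the function names, the lowercase whitespace tokenization for Jaccard, and the choice of Levenshtein distance as the edit distance are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector magnitudes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as dissimilar


def jaccard_similarity(a: str, b: str) -> float:
    """Shared tokens over total distinct tokens (assumed whitespace tokens)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are considered identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(jaccard_similarity("a b c", "b c d"))  # 0.5
print(edit_distance("kitten", "sitting"))    # 3
```

Cosine similarity works on embeddings and captures meaning-level closeness, while Jaccard and edit distance work on the raw text and catch near-verbatim copies that differ by a few tokens or characters.
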
## Part of [Cocapn](https://github.com/cocapn) · Agent Infrastructure
