Metadata-Version: 2.4
Name: denselinkage
Version: 1.0.0b2
Summary: Record linkage with dense blocking using text embeddings and LLM matching
Keywords: record-linkage,entity-resolution,embeddings,blocking,llm
Author: Alvaro
Author-email: Alvaro <alvarocarvalho@live.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: langchain-core>=0.3 ; extra == 'all'
Requires-Dist: langchain-openai>=0.2 ; extra == 'all'
Requires-Dist: faiss-cpu>=1.8 ; extra == 'all'
Requires-Dist: sentence-transformers>=3.0 ; extra == 'all'
Requires-Dist: mypy>=1.11 ; extra == 'dev'
Requires-Dist: ruff>=0.6 ; extra == 'dev'
Requires-Dist: pytest>=8 ; extra == 'dev'
Requires-Dist: pytest-cov>=5 ; extra == 'dev'
Requires-Dist: pandas-stubs>=2.0 ; extra == 'dev'
Requires-Dist: sphinx>=8 ; extra == 'docs'
Requires-Dist: furo>=2024.8 ; extra == 'docs'
Requires-Dist: myst-parser>=4 ; extra == 'docs'
Requires-Dist: sphinx-copybutton>=0.5 ; extra == 'docs'
Requires-Dist: sphinx-design>=0.6 ; extra == 'docs'
Requires-Dist: sphinxcontrib-mermaid>=0.9 ; extra == 'docs'
Requires-Dist: faiss-cpu>=1.8 ; extra == 'faiss'
Requires-Dist: langchain-core>=0.3 ; extra == 'langchain'
Requires-Dist: langchain-openai>=0.2 ; extra == 'langchain'
Requires-Dist: sentence-transformers>=3.0 ; extra == 'sentence-transformers'
Requires-Python: >=3.10
Project-URL: Homepage, https://github.com/caalvaro/denselinkage
Project-URL: Repository, https://github.com/caalvaro/denselinkage
Project-URL: Issues, https://github.com/caalvaro/denselinkage/issues
Provides-Extra: all
Provides-Extra: dev
Provides-Extra: docs
Provides-Extra: faiss
Provides-Extra: langchain
Provides-Extra: sentence-transformers
Provides-Extra: train
Description-Content-Type: text/markdown

# denselinkage

[![CI](https://github.com/caalvaro/denselinkage/actions/workflows/ci.yml/badge.svg)](https://github.com/caalvaro/denselinkage/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

Record linkage with dense blocking using text embeddings and LLM matching.

> **Status: beta.** The dependency-free core is implemented and **runs**:
> `link` / `dedupe` / `match_pairs`, connected-components clustering, and the
> linkage / blocking / clustering metrics — all on numpy + pandas. The heavy
> extras (FAISS, sentence-transformers, LangChain) are **experimental this
> release**: their adapters are declared but raise `NotImplementedError`.
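
For readers new to the term, dense blocking generally means shortlisting, for each record, the few records whose embeddings are most similar, so that only those candidate pairs reach the matcher. The sketch below illustrates that idea in pure Python. It is not this library's implementation: `cosine`, `top_k_candidates`, and the 2-d vectors are hypothetical stand-ins, and a real pipeline would obtain vectors from a text embedder and likely use numpy or a vector index.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def top_k_candidates(
    left: dict[str, list[float]], right: dict[str, list[float]], k: int
) -> dict[str, list[str]]:
    """For each left id, return the ids of its k most similar right records."""
    blocks: dict[str, list[str]] = {}
    for left_id, u in left.items():
        # Rank every right record by similarity to this left record.
        ranked = sorted(right, key=lambda rid: cosine(u, right[rid]), reverse=True)
        blocks[left_id] = ranked[:k]
    return blocks


# Hypothetical embeddings keyed by record id (illustrative values only).
left = {"A1": [1.0, 0.0], "A2": [0.0, 1.0]}
right = {"B1": [0.9, 0.1], "B2": [0.1, 0.9], "B3": [-1.0, 0.0]}

blocks = top_k_candidates(left, right, k=2)
print(blocks)
```

With `k=2`, only four of the six possible left/right pairs survive as candidates; the saving grows quickly as the datasets get larger.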

## Usage

```python
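import pandas as pd

from denselinkage import DenseLinker, Source, TemplateSerializer
from denselinkage.core.results import LabeledPairs
from denselinkage.metrics import linkage_metrics

# Illustrative toy data; substitute your own DataFrames.
df_a = pd.DataFrame({"id_a": ["A1"], "name": ["Acme Corp"], "city": ["Lisbon"]})
df_b = pd.DataFrame(
    {"id_b": ["B1"], "company_name": ["ACME Corporation"], "headquarters": ["Lisbon"]}
)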

linker = DenseLinker.with_defaults()  # picks a sensible embedder/index/matcher
left  = Source(df_a, id_column="id_a", serializer=TemplateSerializer("Name: {name}, City: {city}"))
right = Source(df_b, id_column="id_b", serializer=TemplateSerializer(
    "Name: {name}, City: {city}", column_mapping={"company_name": "name", "headquarters": "city"}))

result  = linker.link(left, right)               # one call, no fit/predict, no mutation
metrics = linkage_metrics(result, gold=LabeledPairs.from_pairs([("A1", "B1")]))
result.to_frame()  # left_id, right_id, match, confidence, reason, similarity
```
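
As background for the `linkage_metrics` call above, linkage quality is commonly scored pairwise: predicted matches are compared against gold pairs to get precision, recall, and F1. The following is a minimal sketch of that standard calculation, not the library's own code; `pairwise_prf` and the example pairs are hypothetical, and `linkage_metrics` may report different or additional quantities.

```python
def pairwise_prf(
    predicted: set[tuple[str, str]], gold: set[tuple[str, str]]
) -> dict[str, float]:
    """Pairwise precision, recall, and F1 of predicted matches against gold pairs."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    # F1 is the harmonic mean; define it as 0 when nothing was matched correctly.
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical predictions and gold labels as (left_id, right_id) pairs.
predicted = {("A1", "B1"), ("A2", "B3")}
gold = {("A1", "B1"), ("A2", "B2")}

scores = pairwise_prf(predicted, gold)
print(scores)
```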

Deduplicate one dataset with `linker.dedupe(src)`; reuse an index with
`idx = linker.index(left); idx.query(right)`. See [`examples/`](examples/):
`00_quickstart.py` is the shortest path, `01_end_to_end_linkage.py` shows full
component control.
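
The core lists connected-components clustering, which is the usual way to turn pairwise matches into duplicate groups: if `r1` matches `r2` and `r2` matches `r3`, all three land in one cluster. The sketch below shows the general technique with a small union-find; `cluster_pairs` and the record ids are hypothetical and the library's own clustering API may look different.

```python
def cluster_pairs(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group record ids into connected components of the match graph."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Union the two components of every matched pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their final root.
    clusters: dict[str, set[str]] = {}
    for record in list(parent):
        clusters.setdefault(find(record), set()).add(record)
    return list(clusters.values())


# Hypothetical matched pairs from a deduplication run.
clusters = cluster_pairs([("r1", "r2"), ("r2", "r3"), ("r4", "r5")])
print(clusters)
```

One consequence of this design is that a single false-positive match can merge two otherwise separate clusters, so the match threshold matters more for deduplication than for two-source linkage.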

## Install

```bash
pip install denselinkage                       # core (numpy, pandas)
pip install "denselinkage[faiss]"              # + FAISS vector index
pip install "denselinkage[sentence-transformers]"  # + sentence-transformers embeddings
pip install "denselinkage[langchain]"          # + LLM matcher
pip install "denselinkage[all]"                # all three extras above
```

> The `[faiss]`, `[sentence-transformers]`, and `[langchain]` extras are reserved
> but **experimental** this release — their adapters raise `NotImplementedError`;
> the dependency-free core runs without them.

## Development

Requires [uv](https://docs.astral.sh/uv/).

```bash
uv sync --dev
uv run ruff check . && uv run ruff format --check . && uv run mypy && uv run pytest
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for details. CI runs lint, format,
strict mypy, and tests on Python 3.10–3.13.

## Changelog

See [CHANGELOG.md](CHANGELOG.md).

## License

[MIT](LICENSE) © 2026 Alvaro
