Metadata-Version: 2.4
Name: record-linker
Version: 0.1.0
Summary: Lightweight, pandas-native probabilistic record deduplication and entity resolution
License: MIT
License-File: LICENSE
Keywords: deduplication,record-linkage,entity-resolution,fuzzy-matching,pandas
Author: Record Linker Contributors
Author-email: hello@record-linker.dev
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Provides-Extra: parquet
Requires-Dist: click (>=8.0)
Requires-Dist: jellyfish (>=1.0)
Requires-Dist: numpy (>=1.23)
Requires-Dist: pandas (>=1.5)
Requires-Dist: pyarrow (>=10.0) ; extra == "parquet"
Requires-Dist: pyyaml (>=6.0)
Requires-Dist: rapidfuzz (>=3.0)
Project-URL: Documentation, https://github.com/record-linker/record-linker#readme
Project-URL: Homepage, https://github.com/record-linker/record-linker
Project-URL: Repository, https://github.com/record-linker/record-linker
Description-Content-Type: text/markdown

# record-linker

[![PyPI version](https://img.shields.io/pypi/v/record-linker.svg)](https://pypi.org/project/record-linker/)
[![Python Versions](https://img.shields.io/pypi/pyversions/record-linker.svg)](https://pypi.org/project/record-linker/)
[![CI](https://github.com/record-linker/record-linker/actions/workflows/ci.yml/badge.svg)](https://github.com/record-linker/record-linker/actions/workflows/ci.yml)
[![Coverage](https://img.shields.io/badge/coverage-%E2%89%A580%25-brightgreen)](https://github.com/record-linker/record-linker)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

**`record-linker`** is a lightweight, pandas-native Python library for probabilistic record deduplication and entity resolution. It requires no training data and little configuration, and it works out of the box. Point it at a messy CSV of missing persons, voter rolls, or beneficiary records and get back a clean DataFrame annotated with cluster IDs and match confidence scores.

---

## What It Does

`record-linker` finds duplicate records in real-world datasets where names are misspelled, dates are formatted differently, and addresses are abbreviated. It uses configurable **blocking rules** to avoid comparing every pair of records, **fuzzy comparators** to measure field-level similarity, and **connected-components clustering** to group likely duplicates. The output is confidence-scored and auditable.

**Motivating example:** A humanitarian NGO receives beneficiary lists from three field offices. Each office uses different name spellings and date formats. `record-linker` deduplicates the merged list in seconds, flagging 847 duplicate registrations out of 12,000 records.
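
The three stages named in this section (blocking, fuzzy comparison, connected-components clustering) can be illustrated with a minimal, standard-library-only sketch. This is not `record-linker`'s API: the `dedupe` function, its parameters, the sample records, and the 0.85 threshold are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever comparators the library actually uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in fuzzy comparator)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def dedupe(records, block_key, compare_field, threshold=0.85):
    """Hypothetical helper: return (cluster id per record, scores of matched pairs)."""
    # 1. Blocking: only records sharing the same block key are ever compared.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[rec[block_key]].append(idx)

    parent = list(range(len(records)))
    scores = {}
    for members in blocks.values():
        for i, j in combinations(members, 2):
            # 2. Fuzzy comparison of one field for each candidate pair.
            score = similarity(records[i][compare_field], records[j][compare_field])
            if score >= threshold:
                scores[(i, j)] = score
                # 3. Clustering: merge the connected components of i and j.
                parent[find(parent, i)] = find(parent, j)

    cluster_ids = [find(parent, i) for i in range(len(records))]
    return cluster_ids, scores


# Illustrative data only.
records = [
    {"name": "Jon Smith", "dob": "1980-01-02"},
    {"name": "John Smith", "dob": "1980-01-02"},
    {"name": "Maria Garcia", "dob": "1975-06-30"},
    {"name": "Maria Garzia", "dob": "1975-06-30"},
    {"name": "Jane Doe", "dob": "1980-01-02"},
]

cluster_ids, scores = dedupe(records, block_key="dob", compare_field="name")
for rec, cid in zip(records, cluster_ids):
    print(cid, rec["name"], rec["dob"])
```

Blocking on an exact date of birth keeps the number of comparisons small, but it also means records whose dates were entered differently never meet. That trade-off is why blocking rules are worth making configurable.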

---

## Installation


