Metadata-Version: 2.4
Name: liken
Version: 0.2.2
Summary: A Python library for near deduplication and record linkage.
License: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: duplicates,deduplication,record linkage,canonicalization
Author: VictorAut
Requires-Python: >=3.11,<3.14
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: datasketch (>=1.8.0,<2.0.0)
Requires-Dist: networkx (>=3.6.1,<4.0.0)
Requires-Dist: pandas (>=2.2.3,<3.0.0)
Requires-Dist: polars (>=1.24.0,<2.0.0)
Requires-Dist: pyspark (>=3.5.5,<4.0.0)
Requires-Dist: rapidfuzz (>=3.12.2,<4.0.0)
Requires-Dist: scikit-learn (>=1.6.1,<2.0.0)
Requires-Dist: sparse-dot-topn (>=1.1.5,<2.0.0)
Requires-Dist: typing-extensions (>=4.13.0,<5.0.0)
Project-URL: Documentation, https://victorautonell-oiry.me/liken/liken.html
Project-URL: Repository, https://github.com/VictorAut/liken/
Description-Content-Type: text/markdown

<p align="center">
<a href="https://pypi.python.org/pypi/liken"><img height="20" alt="PyPI Version" src="https://img.shields.io/pypi/v/liken"></a>
<img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/liken">
</p>

# Introduction

**Liken** is a library for near deduplication and record linkage on Pandas, Polars and PySpark DataFrames.

The key features are:

- Near deduplication
- Ready-to-use deduplication strategies
- Record linkage and canonicalization
- Rules-based deduplication
- Pandas, Polars and PySpark support
- Customizable in pure Python
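Near deduplication treats two records as duplicates when their values are similar rather than identical. Canonicalization then maps every member of a near-duplicate group onto a single representative value. The following is a minimal plain-Python sketch of both ideas, not Liken's implementation. It assumes a `difflib.SequenceMatcher` similarity ratio with a threshold of 0.8, and the helpers `similar` and `canonicalize` are hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical helper: True when two strings are near duplicates."""
    # Normalize case and surrounding whitespace before comparing.
    a, b = a.strip().lower(), b.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


def canonicalize(values: list[str], threshold: float = 0.8) -> list[str]:
    """Hypothetical helper: map each value to the first value of its group."""
    # Union-find over indices, so similarity is resolved transitively.
    parent = list(range(len(values)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair: O(n^2), fine for a sketch but slow on large data.
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similar(values[i], values[j], threshold):
                ri, rj = find(i), find(j)
                if ri != rj:
                    # The smaller index stays root, so the earliest value is canonical.
                    parent[max(ri, rj)] = min(ri, rj)

    return [values[find(i)] for i in range(len(values))]


addresses = ["12 Main Street", "12 main street ", "12 Main St", "7 Oak Avenue"]
print(len(set(addresses)))  # 4: exact matching finds no duplicates
print(canonicalize(addresses))
# ['12 Main Street', '12 Main Street', '12 Main Street', '7 Oak Avenue']
```

The union-find step makes grouping transitive. If A matches B and B matches C, all three share one canonical value even when A and C fall below the threshold on their own.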


## A flexible API

Check out the [API Documentation](https://victorautonell-oiry.me/liken/).

## Installation

```shell
pip install liken
```

## Example

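```python
import pandas as pd
from liken import Dedupe, fuzzy

# Hypothetical sample data; the first two addresses are near duplicates
df = pd.DataFrame({"address": ["12 Main Street", "12 Main St", "7 Oak Avenue"]})

lk = Dedupe(df)  # wrap the DataFrame
lk.apply(fuzzy())  # apply the fuzzy (near-duplicate) strategy

# Drop rows whose "address" values are near duplicates
df = lk.drop_duplicates("address")
```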
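The `drop_duplicates("address")` call appears to mirror pandas' `DataFrame.drop_duplicates`, but with fuzzy rather than exact matching. The hypothetical `drop_near_duplicates` helper below is a rough plain-Python analogue that filters a list of row dicts on one column. It assumes that the first row of each near-duplicate group is kept, which is pandas' default `keep="first"`. It compares each row greedily against the rows already kept, using an assumed `difflib` ratio threshold of 0.8. This is a simplification, not Liken's algorithm.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(
    rows: list[dict[str, str]], column: str, threshold: float = 0.8
) -> list[dict[str, str]]:
    """Hypothetical helper: keep the first row of each near-duplicate value in `column`."""
    kept: list[dict[str, str]] = []
    for row in rows:
        value = row[column].strip().lower()
        # Greedy check: drop the row if it is close to any row already kept.
        is_duplicate = any(
            SequenceMatcher(None, value, k[column].strip().lower()).ratio() >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(row)
    return kept


rows = [
    {"name": "Ada", "address": "12 Main Street"},
    {"name": "Ada L.", "address": "12 Main St"},
    {"name": "Bob", "address": "7 Oak Avenue"},
]
print([r["name"] for r in drop_near_duplicates(rows, "address")])  # ['Ada', 'Bob']
```

Because the check is greedy and order-dependent, a row is dropped only if it resembles a row that was already kept.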

## License
This project is licensed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0.html). See the [LICENSE](LICENSE) file for more details.
