Metadata-Version: 2.4
Name: liken
Version: 0.7.0
Summary: A Python library for near deduplication and record linkage.
Keywords: duplicates,deduplication,record linkage,canonicalization
Author: VictorAut
License-Expression: Apache-2.0
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3
Requires-Dist: pandas>=1,<4
Requires-Dist: polars>=1.24.0,<2
Requires-Dist: rapidfuzz>=3.12.2,<4
Requires-Dist: scikit-learn>=1.6.1,<2
Requires-Dist: sparse-dot-topn>=1.1.5,<2
Requires-Dist: typing-extensions>=4.13.0,<5
Requires-Dist: pyspark>=4
Requires-Dist: datasketch>=1.8.0,<2
Requires-Dist: networkx>=3.6.1,<4
Requires-Dist: pyarrow>=23.0.1,<24
Requires-Dist: faker>=37.12.0
Requires-Dist: catalogue>=2.0.10
Requires-Dist: nameparser>=1.1.3
Requires-Dist: cleanco>=2.3
Requires-Dist: nltk>=3.9.3
Requires-Python: >=3.11
Project-URL: Repository, https://github.com/VictorAut/liken/
Project-URL: Documentation, https://victoraut.github.io/liken/
Description-Content-Type: text/markdown

<p align="center">
<a href="https://pypi.python.org/pypi/liken"><img height="20" alt="PyPI Version" src="https://img.shields.io/pypi/v/liken"></a>
<img height="20" alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/liken">
<img height="20" alt="PyPI Downloads" src="https://static.pepy.tech/badge/liken">
<img height="20" alt="Tests" src="https://img.shields.io/github/actions/workflow/status/VictorAut/liken/python-validation.yml?label=CI">
<img height="20" alt="Coverage" src="https://img.shields.io/codecov/c/github/VictorAut/liken">
<img height="20" alt="License" src="https://img.shields.io/github/license/VictorAut/liken">
</p>

# Introduction

**Liken** is a Python library for near deduplication and record linkage on DataFrames.

The key features are:

- Near deduplication
- Ready-to-use deduplication methods
- Record linkage and canonicalization
- Rules-based deduplication
- Pandas, Polars and PySpark support
- Customizable in pure Python
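Near deduplication differs from exact deduplication in that rows which are merely *similar* are treated as duplicates, for example rows that differ only by a typo, an abbreviation or letter casing. The sketch below illustrates the idea in plain Python with the standard library's `difflib`. The `near_dedupe` helper and the 0.85 similarity threshold are hypothetical choices made for illustration. This is not Liken's implementation or API.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def near_dedupe(values: list[str], threshold: float = 0.85) -> list[str]:
    """Keep the first occurrence of each group of near-identical strings.

    Hypothetical helper: a greedy O(n^2) pass that compares each value with the
    representatives kept so far and drops it if any is at least `threshold` similar.
    """
    kept: list[str] = []
    for value in values:
        if not any(similarity(value, rep) >= threshold for rep in kept):
            kept.append(value)
    return kept


addresses = [
    "12 High Street, London",
    "12 High St, London",      # abbreviation: similarity 0.9, treated as a duplicate
    "12 high street, london",  # casing only: similarity 1.0, treated as a duplicate
    "99 Baker Street, London",
]
print(near_dedupe(addresses))  # ['12 High Street, London', '99 Baker Street, London']
```

A greedy pairwise comparison is the simplest way to show the concept, but it scales quadratically with the number of rows. Dedicated tooling typically relies on faster string-similarity libraries and blocking or indexing strategies instead.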


## A flexible API

Check out the [API documentation](https://victoraut.github.io/liken/) for the full reference.

## Installation

```shell
pip install liken
```

## Example

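```python
import liken as lk
import pandas as pd

df = pd.DataFrame({"address": ["12 High Street", "12 High St", "99 Baker Street"]})

# Fuzzy-match on "address" and drop near-duplicate rows
df = lk.dedupe(df).apply(lk.fuzzy()).drop_duplicates("address").collect()
```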

## License
This project is licensed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0.html). See the [LICENSE](LICENSE) file for more details.