Metadata-Version: 2.4
Name: icdr
Version: 0.0.12
Summary: Contrastive Data Retrieval with Inverted Indexes
Home-page: https://github.com/lakritidis/icdr
Author: Leonidas Akritidis
Author-email: Leonidas Akritidis <lakritidis@ihu.gr>
Maintainer: Leonidas Akritidis
Maintainer-email: Leonidas Akritidis <lakritidis@ihu.gr>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/lakritidis/icdr
Project-URL: Issues, https://github.com/lakritidis/icdr/issues
Keywords: index,inverted index,contrastive data,pairs,information retrieval,similarity search,search,string search,approximate retrieval
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: maintainer

# ICDR
Contrastive Data Retrieval with Inverted Indexes

Efficient approximate and precise retrieval of similar documents for fine-tuning language models. The library can be used to quickly build contrastive pairs and triplets from large document collections.
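To make the target output concrete, here is a minimal sketch of a contrastive triplet (anchor, positive, negative) built from labeled records with pandas. The helper `make_triplets` and the column names `text` and `entity_id` are hypothetical and are not part of ICDR's API; the sketch assumes that records sharing a label refer to the same entity.

```python
import pandas as pd


def make_triplets(df: pd.DataFrame, text_col: str = "text", label_col: str = "entity_id"):
    """Build (anchor, positive, negative) triplets from labeled records.

    Hypothetical helper, not part of ICDR: records that share a label are
    treated as matches; the negative is the first record with another label.
    """
    triplets = []
    records = list(zip(df[text_col], df[label_col]))
    for i, (anchor, label) in enumerate(records):
        # Positive: the first *other* record carrying the same label.
        positive = next((t for j, (t, l) in enumerate(records) if j != i and l == label), None)
        # Negative: the first record carrying a different label.
        negative = next((t for t, l in records if l != label), None)
        # Anchors without a match (or without any non-match) yield no triplet.
        if positive is not None and negative is not None:
            triplets.append((anchor, positive, negative))
    return triplets


df = pd.DataFrame(
    {
        "text": ["apple iphone 13 128gb", "iphone 13 apple 128 gb", "samsung galaxy s21"],
        "entity_id": [1, 1, 2],
    }
)
triplets = make_triplets(df)
```

Note that this brute-force scan compares every record with every other record, which is quadratic in the corpus size; avoiding that cost on large collections is the motivation for index-based retrieval.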

ICDR builds an inverted index and several fast lookup tables to retrieve similar texts from a corpus. The library is well suited to entity matching, entity resolution, record linkage, and deduplication tasks in NLP. ICDR enables fast retrieval of similar, positive (i.e., matching), and negative (i.e., non-matching) text samples, which can be used either directly or to fine-tune LLMs and other models.
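The general idea behind inverted-index retrieval can be illustrated with a toy, token-level index. This is a minimal sketch of the generic technique, not ICDR's implementation: the class `InvertedIndex`, its `search` method, whitespace tokenization, and Jaccard scoring are all illustrative assumptions. The key point is that only documents appearing in the posting lists of the query's tokens are scored, instead of the whole corpus.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens; real systems may use n-grams etc.
    return set(text.lower().split())


class InvertedIndex:
    """Toy token-level inverted index (hypothetical; not ICDR's API)."""

    def __init__(self, documents):
        self.documents = list(documents)
        self.tokens = [tokenize(d) for d in self.documents]
        # Posting lists: token -> set of ids of documents containing it.
        self.postings = defaultdict(set)
        for doc_id, toks in enumerate(self.tokens):
            for tok in toks:
                self.postings[tok].add(doc_id)

    def search(self, query, k=3, exclude=None):
        """Return up to k (doc_id, jaccard) pairs, best first."""
        q = tokenize(query)
        # Candidate generation: union of the posting lists of the query tokens.
        candidates = set()
        for tok in q:
            candidates |= self.postings.get(tok, set())
        candidates.discard(exclude)  # e.g. drop the query document itself
        # Score only the candidates by Jaccard similarity of token sets.
        scored = [
            (doc_id, len(q & self.tokens[doc_id]) / len(q | self.tokens[doc_id]))
            for doc_id in candidates
        ]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]


docs = [
    "apple iphone 13 128gb black",
    "iphone 13 128gb black apple",
    "apple iphone 12 64gb white",
    "samsung galaxy s21 128gb",
]
index = InvertedIndex(docs)
hits = index.search(docs[0], k=2, exclude=0)  # [(1, 1.0), (2, 0.25)]
```

In a contrastive setting, highly ranked hits that are known to match the query can serve as positives, while highly ranked hits that do not match are natural candidates for hard negatives.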
