Metadata-Version: 2.3
Name: matchescu-reference-extraction
Version: 0.9.3
Summary: Extract references from a multitude of data sources
License: MIT
Author: Andrei Olar
Author-email: andrei.olar@samlex.ro
Requires-Python: >=3.13, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*, !=3.8.*, !=3.9.*, !=3.10.*, !=3.11.*, !=3.12.*
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: matchescu-base (>=0.27.0,<0.28.0)
Requires-Dist: polars (>=1.38.1,<2.0.0)
Description-Content-Type: text/markdown

# matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity
resolution. Its main concepts are:

* a generic attribute-based data record implementation (can access data by `str`
or `int` key),
* various `data_sources` which support reading records from different data
stores, and
* generic `extraction_engines` that convert data records to entity references.

# Development

Run the following commands to set up a development environment and run the tests.

```shell
$ pyenv install 3.13
$ poetry install
$ poetry run pytest
```

When you contribute code, open a new `feature/*` or `hotfix/*` branch.

# Usage

```python
from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

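# Declare the type of each column in the CSV file.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# `traits` is already a list, so it is passed as-is.
data_source = CsvDataSource("./path/to/csv/file", traits)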
```
