Metadata-Version: 2.4
Name: span_data
Version: 0.1.3
Summary: Identity resolution library built on Splink and DuckDB.
Author: The Data Loft
License-Expression: BSD-3-Clause
Project-URL: Community (Slack), https://join.slack.com/t/span-data/shared_invite/zt-3xkirzw8i-w_OcmdVoyAMLSH2V83gGAw
Project-URL: Documentation, https://span.gitbook.io/span-docs
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Requires-Python: <3.12,>=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: duckdb<2.0.0,>=1.5.1
Requires-Dist: email-validator<3.0.0,>=1.3.1
Requires-Dist: pandas<3.0.0,>=2.3.3
Requires-Dist: phonenumbers<10.0.0,>=9.0.6
Requires-Dist: snowflake-connector-python[pandas]==4.4.0
Requires-Dist: splink<5.0.0,>=4.0.16
Dynamic: license-file

# ID Grapher

`span_data` is an identity resolution library built on Splink and DuckDB. It
links records that refer to the same entity and produces a unified ID graph.
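To make "unified ID graph" concrete, here is a minimal conceptual sketch in plain Python. It is not the `span_data` API and not how Splink scores matches. It assumes pairwise matches have already been decided and only shows how they collapse into entity clusters with a union-find structure. The function name `build_id_graph` and the record IDs are hypothetical.

```python
from collections import defaultdict


def build_id_graph(record_ids, matched_pairs):
    """Group records into entities given pairwise matches (union-find).

    Hypothetical helper for illustration only; not part of span_data.
    """
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Walk up to the root, shortening the path along the way.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for left, right in matched_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the smaller root so cluster IDs are deterministic.
            parent[max(root_left, root_right)] = min(root_left, root_right)

    clusters = defaultdict(list)
    for rid in record_ids:
        clusters[find(rid)].append(rid)
    return dict(clusters)


# Made-up record IDs from two sources; the pairs stand in for match results.
records = ["crm-1", "crm-2", "web-7", "web-9"]
pairs = [("crm-1", "web-7"), ("web-7", "crm-2")]
graph = build_id_graph(records, pairs)
```

Records connected through any chain of matches end up in one cluster, which is why `crm-2` joins `crm-1` here even though the two were never compared directly.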

## Core Features

- Ingest source data from pandas DataFrames, CSV files, and Snowflake tables or
  views.
- Clean and normalize source fields using `FieldDefinitionMap`.
- Resolve entities into profile graphs with Splink-based matching.
- Export graph results to DataFrames, CSV, Parquet, DuckDB relations, and
  Snowflake.
- Apply manual override workflows for merge, split, and assign operations.

## Python Support

This package currently supports Python 3.11 only (`>=3.11,<3.12`).
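If you want to check an interpreter against this range before installing, a small standard-library sketch like the following may help. The names `SUPPORTED_MIN`, `SUPPORTED_MAX`, and `is_supported` are illustrative and are not provided by `span_data`.

```python
import sys

SUPPORTED_MIN = (3, 11)  # inclusive lower bound, from ">=3.11"
SUPPORTED_MAX = (3, 12)  # exclusive upper bound, from "<3.12"


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version falls in the supported range."""
    return SUPPORTED_MIN <= tuple(version_info[:2]) < SUPPORTED_MAX


# Report whether the running interpreter falls inside the range.
print(is_supported())
```

In practice, `pip` enforces the `Requires-Python` constraint at install time, so a check like this is mainly useful in setup scripts or CI.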

## Usage

See the [SPAN Product Documentation](https://span.gitbook.io/span-docs) for usage instructions.

## License

This project is licensed under the BSD 3-Clause License. See `LICENSE`.

Third-party dependency license information is documented in
`THIRD_PARTY_NOTICES.md`.

Release history is documented in `CHANGELOG.md`.
