Metadata-Version: 2.4
Name: span_data
Version: 0.1.5
Summary: Identity resolution library built on Splink and DuckDB.
Author: The Data Loft
License-Expression: BSD-3-Clause
Project-URL: Community (Slack), https://join.slack.com/t/span-data/shared_invite/zt-3xkirzw8i-w_OcmdVoyAMLSH2V83gGAw
Project-URL: Documentation, https://span.gitbook.io/span-docs
Project-URL: Substack, https://cleanschema.substack.com/
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Requires-Python: <3.12,>=3.11
Description-Content-Type: text/markdown
License-File: span_data/packaging_info/LICENSE
Requires-Dist: duckdb<2.0.0,>=1.5.1
Requires-Dist: email-validator<3.0.0,>=1.3.1
Requires-Dist: pandas<3.0.0,>=2.3.3
Requires-Dist: phonenumbers<10.0.0,>=9.0.6
Requires-Dist: snowflake-connector-python[pandas]==4.4.0
Requires-Dist: splink<5.0.0,>=4.0.16
Dynamic: license-file

# SPAN

`span_data` is an identity resolution library built on Splink and DuckDB. It
links records that refer to the same entity and produces a unified ID graph.

**Early release.** This package is under active development, and the public API and import paths **may change** between releases. We expect to follow [semantic versioning](https://semver.org/) more strictly after a **1.0** stable baseline; until then, treat upgrades as potentially breaking. Pin a specific version in production and review `span_data/packaging_info/CHANGELOG.md` before upgrading.
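One way to act on the pinning advice is to declare an exact version (for example `span_data==0.1.5` in a requirements file) and confirm at startup that the installed release matches it. The sketch below uses only the standard library's `importlib.metadata`. The helper names and the `PINNED_VERSION` constant are illustrative assumptions, not part of `span_data`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Hypothetical pin for illustration; use whichever release you have validated.
PINNED_VERSION = "0.1.5"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_pin(dist_name: str, pinned: str) -> bool:
    """True only when the distribution is installed at exactly the pinned version."""
    return installed_version(dist_name) == pinned


# Report the state rather than failing, so the check is safe to run anywhere.
found = installed_version("span_data")
print(f"span_data installed: {found!r}, pinned: {PINNED_VERSION!r}")
```

A deployment could turn a failed `matches_pin` check into a hard error, so that an unreviewed upgrade is caught before any data is processed.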

## Core Features

- Ingest source data from pandas DataFrames, CSV files, and Snowflake tables or views.
- Clean and normalize source fields.
- Resolve entities into profile graphs with Splink-based matching.
- Export graph results to DataFrames, CSV, Parquet, DuckDB relations, and Snowflake.
- Apply manual override workflows for merge, split, and assign operations.
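To make the idea of resolving records into profile graphs concrete, the following is a conceptual sketch only. It does not use the `span_data` or Splink APIs and is not a description of how either is implemented. It assumes pairwise match probabilities are already available and groups records into profiles by taking connected components over the pairs that clear a threshold. The function name, record IDs, scores, and threshold are all hypothetical.

```python
from collections import defaultdict


def resolve_profiles(record_ids, matches, threshold=0.9):
    """Group records into profiles: connected components over accepted matches."""
    # Each record starts as the root of its own single-record profile.
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Walk up to the root, shortening the path along the way.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for left, right, probability in matches:
        if probability >= threshold:
            root_l, root_r = find(left), find(right)
            if root_l != root_r:
                # Keep the smaller ID as root so profile IDs are deterministic.
                parent[max(root_l, root_r)] = min(root_l, root_r)

    profiles = defaultdict(list)
    for rid in record_ids:
        profiles[find(rid)].append(rid)
    return dict(profiles)


# Hypothetical records and match scores, for illustration only.
records = ["crm:1", "crm:2", "web:7", "web:9"]
scored_pairs = [
    ("crm:1", "web:7", 0.97),
    ("web:7", "crm:2", 0.93),
    ("crm:2", "web:9", 0.40),  # below the threshold, so not linked
]
print(resolve_profiles(records, scored_pairs))
```

Here `crm:1`, `crm:2`, and `web:7` end up in one profile because they are linked transitively, while `web:9` stays on its own. In this simplified picture, a manual merge would add an accepted link and a manual split would remove one.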

## Python Support

This package currently supports Python 3.11 only (`>=3.11,<3.12`).
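Because the supported range is narrow, a script may want to check the interpreter before importing the package. The following is a minimal sketch using only the standard library. `is_supported_python` is a hypothetical helper, not something `span_data` provides, and its bounds simply mirror the `Requires-Python` constraint.

```python
import sys

# Bounds mirror the package's declared constraint: >=3.11,<3.12.
MIN_INCLUSIVE = (3, 11)
MAX_EXCLUSIVE = (3, 12)


def is_supported_python(version_info=None):
    """Return True when the (major, minor) version falls in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_INCLUSIVE <= major_minor < MAX_EXCLUSIVE


# Report the result for the running interpreter.
current = ".".join(str(part) for part in sys.version_info[:3])
print(f"Python {current} supported: {is_supported_python()}")
```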

## Usage

See the [SPAN Product Documentation](https://span.gitbook.io/span-docs) for usage instructions.

## License

This project is licensed under the BSD 3-Clause License. See `span_data/packaging_info/LICENSE`.

Third-party dependency license information is documented in `span_data/packaging_info/THIRD_PARTY_NOTICES.md`.

Release history is documented in `span_data/packaging_info/CHANGELOG.md`.
