Metadata-Version: 2.4
Name: goldenpipe
Version: 1.2.0
Summary: Pluggable pipeline framework for data quality workflows
Project-URL: Homepage, https://github.com/benzsevern/goldenmatch
Project-URL: Repository, https://github.com/benzsevern/goldenmatch
Project-URL: Documentation, https://github.com/benzsevern/goldenmatch/tree/main/packages/python/goldenpipe#readme
Project-URL: Issues, https://github.com/benzsevern/goldenmatch/issues
Project-URL: Changelog, https://github.com/benzsevern/goldenmatch/blob/main/packages/python/goldenpipe/CHANGELOG.md
Project-URL: Author, https://bensevern.dev
Author-email: Ben Severn <ben@bensevern.dev>
License: MIT
License-File: LICENSE
Keywords: data-quality,data-transformation,data-validation,entity-resolution,orchestration,pipeline,polars
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: goldencheck-types
Requires-Dist: infermap
Requires-Dist: polars>=1.0
Requires-Dist: pydantic>=2.7
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: typer>=0.12
Provides-Extra: agent
Requires-Dist: aiohttp>=3.9; extra == 'agent'
Provides-Extra: all
Requires-Dist: aiohttp>=3.9; extra == 'all'
Requires-Dist: fastapi>=0.110; extra == 'all'
Requires-Dist: goldencheck>=0.5.0; extra == 'all'
Requires-Dist: goldenflow>=0.1.0; extra == 'all'
Requires-Dist: goldenmatch>=1.2.0; extra == 'all'
Requires-Dist: mcp>=1.0; extra == 'all'
Requires-Dist: textual>=1.0; extra == 'all'
Requires-Dist: uvicorn>=0.30; extra == 'all'
Provides-Extra: api
Requires-Dist: fastapi>=0.110; extra == 'api'
Requires-Dist: uvicorn>=0.30; extra == 'api'
Provides-Extra: check
Requires-Dist: goldencheck>=0.5.0; extra == 'check'
Provides-Extra: dev
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: pytest-aiohttp>=1.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Provides-Extra: flow
Requires-Dist: goldenflow>=0.1.0; extra == 'flow'
Provides-Extra: golden-suite
Requires-Dist: goldencheck>=0.5.0; extra == 'golden-suite'
Requires-Dist: goldenflow>=0.1.0; extra == 'golden-suite'
Requires-Dist: goldenmatch>=1.2.0; extra == 'golden-suite'
Provides-Extra: match
Requires-Dist: goldenmatch>=1.2.0; extra == 'match'
Provides-Extra: mcp
Requires-Dist: mcp>=1.0; extra == 'mcp'
Provides-Extra: tui
Requires-Dist: textual>=1.0; extra == 'tui'
Description-Content-Type: text/markdown

<!-- mcp-name: io.github.benzsevern/goldenpipe -->
# GoldenPipe

**Golden Suite orchestrator** -- Check quality, fix issues, deduplicate records. One command.
Built by [Ben Severn](https://bensevern.dev).

[![PyPI](https://img.shields.io/pypi/v/goldenpipe?color=d4a017)](https://pypi.org/project/goldenpipe/)
[![CI](https://github.com/benzsevern/goldenpipe/actions/workflows/test.yml/badge.svg)](https://github.com/benzsevern/goldenpipe/actions/workflows/test.yml)
[![codecov](https://codecov.io/gh/benzsevern/goldenpipe/graph/badge.svg)](https://codecov.io/gh/benzsevern/goldenpipe)
[![Downloads](https://static.pepy.tech/badge/goldenpipe/month)](https://pepy.tech/project/goldenpipe)
[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue)](https://python.org)
[![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![Docs](https://img.shields.io/badge/docs-benzsevern.github.io%2Fgoldenpipe-d4a017)](https://benzsevern.github.io/goldenpipe/)
[![DQBench Pipeline](https://img.shields.io/badge/DQBench%20Pipeline-88.07-gold)](https://github.com/benzsevern/dqbench)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/benzsevern/goldenpipe/blob/main/scripts/goldenpipe_demo.ipynb)

## What It Does

```
Raw Data
  | GoldenCheck   -- profile & discover quality issues
  | GoldenFlow    -- fix issues, standardize, reshape
  | GoldenMatch   -- deduplicate, match, create golden records
  v
Golden Records
```

GoldenPipe orchestrates the full pipeline with adaptive logic:
- **Skips** transformation if no quality issues are found
- **Routes** to privacy-preserving matching if sensitive fields are detected
- **Reports** the reasoning behind every decision
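
The three rules above can be sketched in a few lines of plain Python. This is an illustration of the decision flow only, not GoldenPipe's internal code: the `Plan` class, the `plan_pipeline` function, and the `"standard"` strategy name are hypothetical, while `"pprl"` is borrowed from the CLI's `--strategy pprl` option.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical record of which stages run and why (not GoldenPipe's API)."""
    run_transform: bool = True
    match_strategy: str = "standard"  # assumed name for the default strategy
    reasoning: list[str] = field(default_factory=list)


def plan_pipeline(issue_count: int, sensitive_fields: list[str]) -> Plan:
    """Decide which stages to run, recording a reason for every decision."""
    plan = Plan()

    # Skip the transform stage when the check stage found nothing to fix.
    if issue_count == 0:
        plan.run_transform = False
        plan.reasoning.append("transform skipped: no quality issues found")
    else:
        plan.reasoning.append(f"transform enabled: {issue_count} issue(s) found")

    # Route to privacy-preserving matching when sensitive fields are present.
    if sensitive_fields:
        plan.match_strategy = "pprl"
        plan.reasoning.append(
            "pprl matching: sensitive fields " + ", ".join(sensitive_fields)
        )
    else:
        plan.reasoning.append("standard matching: no sensitive fields detected")

    return plan


plan = plan_pipeline(issue_count=0, sensitive_fields=["ssn"])
print(plan.run_transform, plan.match_strategy)
for reason in plan.reasoning:
    print("-", reason)
```

Keeping the reasons in a list next to the decisions makes it easy to show them later, for example in verbose output.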

## Install

```bash
pip install goldenpipe
```
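
The package metadata also declares optional extras (for example `mcp`, `api`, and `tui`). As a minimal sketch, the standard library's `importlib.metadata` can list the extras an installed distribution declares; the helper name `declared_extras` is hypothetical and only the standard-library calls are real.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(dist_name: str) -> list[str]:
    """Return the extras an installed distribution declares, or [] if it is not installed."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return []
    # get_all returns None when the field is absent, so fall back to an empty list.
    return sorted(meta.get_all("Provides-Extra") or [])


print(declared_extras("goldenpipe"))
```

An extra is installed by naming it in brackets, as in the `pip install goldenpipe[mcp]` command shown in the MCP section.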

## Quick Start

```python
import goldenpipe as gp

result = gp.run("customers.csv")

print(result.status)        # "success"
print(result.check)         # Quality findings
print(result.transform)     # What was fixed
print(result.match)         # Deduplicated clusters
print(result.reasoning)     # Why each decision was made
```
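
For logging or reporting, the same attributes can be folded into a short text summary. The sketch below assumes only the attribute names shown in the Quick Start; the `summarize` helper is hypothetical, treating a missing stage as "skipped" is an assumption, and a stand-in object with made-up values is used so the example runs without real data.

```python
from types import SimpleNamespace

# Stage attribute names taken from the Quick Start above.
STAGE_FIELDS = ("check", "transform", "match")


def summarize(result) -> str:
    """Build a short text report from any object with the Quick Start attributes."""
    lines = [f"status: {result.status}"]
    for name in STAGE_FIELDS:
        value = getattr(result, name, None)
        # A missing or None stage is reported as "skipped" (an assumption of this sketch).
        lines.append(f"{name}: {'skipped' if value is None else value}")
    lines.append(f"reasoning: {result.reasoning}")
    return "\n".join(lines)


# Stand-in object with illustrative values, not real GoldenPipe output.
fake = SimpleNamespace(
    status="success",
    check="no findings",
    transform=None,
    match="5 clusters",
    reasoning="transform skipped: no issues found",
)
print(summarize(fake))
```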

## CLI

```bash
goldenpipe run customers.csv                # Full pipeline
goldenpipe run customers.csv --verbose      # Show reasoning
goldenpipe run customers.csv --skip-flow    # Check + Match only
goldenpipe run customers.csv --strategy pprl  # Force privacy mode
goldenpipe run customers.csv -o golden.csv  # Save golden records
```
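
When the CLI is driven from a script, it can help to assemble the argument list in one place. This is a minimal sketch that uses only the flags shown above; the `build_run_command` helper is hypothetical and not part of GoldenPipe.

```python
import shlex


def build_run_command(path, *, verbose=False, skip_flow=False,
                      strategy=None, output=None):
    """Return the argv list for `goldenpipe run`, using only the flags shown above."""
    argv = ["goldenpipe", "run", str(path)]
    if verbose:
        argv.append("--verbose")
    if skip_flow:
        argv.append("--skip-flow")
    if strategy is not None:
        argv += ["--strategy", strategy]
    if output is not None:
        argv += ["-o", str(output)]
    return argv


argv = build_run_command("customers.csv", strategy="pprl", output="golden.csv")
print(shlex.join(argv))  # shell-safe rendering of the command
# To execute it: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` rather than a single shell string avoids quoting problems with file names that contain spaces.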

## Remote MCP Server

GoldenPipe is available as a hosted MCP server on [Smithery](https://smithery.ai/servers/benzsevern/goldenpipe), so you can connect from any MCP client without installing anything.

**Claude Desktop / Claude Code:**
```json
{
  "mcpServers": {
    "goldenpipe": {
      "url": "https://goldenpipe-mcp-production.up.railway.app/mcp/"
    }
  }
}
```

**Local server:**
```bash
pip install "goldenpipe[mcp]"
goldenpipe mcp-serve
```

Four tools are available: list pipeline stages, validate wiring, run the full check-transform-match pipeline, and explain configs.

## Part of the Golden Suite

| Tool | Purpose | Install |
|------|---------|---------|
| [GoldenCheck](https://github.com/benzsevern/goldencheck) | Validate & profile data quality | `pip install goldencheck` |
| [GoldenFlow](https://github.com/benzsevern/goldenflow) | Transform & standardize data | `pip install goldenflow` |
| [GoldenMatch](https://github.com/benzsevern/goldenmatch) | Deduplicate & match records | `pip install goldenmatch` |
| [GoldenPipe](https://github.com/benzsevern/goldenpipe) | Orchestrate the full pipeline | `pip install goldenpipe` |

## Author

[Ben Severn](https://bensevern.dev)

## License

MIT
