Metadata-Version: 2.4
Name: pirn
Version: 0.8.0
Summary: A pipeline framework where everything is a knot.
Author: snoodleboot, LLC
License: Apache-2.0
License-File: LICENSE
Keywords: async,dag,graph,lineage,pipeline,workflow
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.11
Requires-Dist: cloudpickle>=3.0
Requires-Dist: numpy>=2.4.4
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: sweet-tea>=0.2.46
Provides-Extra: agents
Provides-Extra: airbyte
Requires-Dist: httpx>=0.26; extra == 'airbyte'
Provides-Extra: alation
Requires-Dist: httpx>=0.26; extra == 'alation'
Provides-Extra: all
Requires-Dist: aio-pika>=9.0; extra == 'all'
Requires-Dist: aioboto3>=12.0; extra == 'all'
Requires-Dist: aiokafka>=0.13.0; extra == 'all'
Requires-Dist: aiomysql>=0.2; extra == 'all'
Requires-Dist: aioodbc>=0.5; extra == 'all'
Requires-Dist: aiosqlite>=0.19; extra == 'all'
Requires-Dist: asdf>=5.3.0; extra == 'all'
Requires-Dist: astropy>=7.2.0; extra == 'all'
Requires-Dist: asyncpg>=0.31; extra == 'all'
Requires-Dist: atlassian-python-api>=3.0; extra == 'all'
Requires-Dist: azure-servicebus>=7.0; extra == 'all'
Requires-Dist: azure-storage-blob>=12.0; extra == 'all'
Requires-Dist: celery>=5.3; extra == 'all'
Requires-Dist: cfgrib>=0.9.15.1; extra == 'all'
Requires-Dist: clickhouse-connect>=0.7; extra == 'all'
Requires-Dist: dask[distributed]>=2024.1; extra == 'all'
Requires-Dist: databricks-sql-connector>=3.0; extra == 'all'
Requires-Dist: datadog-api-client>=2.0; extra == 'all'
Requires-Dist: dipy>=1.9; extra == 'all'
Requires-Dist: dlisio>=1.0.4; extra == 'all'
Requires-Dist: duckdb>=0.10; extra == 'all'
Requires-Dist: emd-signal>=1.6; extra == 'all'
Requires-Dist: fhir-resources>=7.1; extra == 'all'
Requires-Dist: fiona>=1.10.1; extra == 'all'
Requires-Dist: gcloud-aio-storage>=9.0; extra == 'all'
Requires-Dist: google-cloud-bigquery>=3.0; extra == 'all'
Requires-Dist: google-cloud-pubsub>=2.0; extra == 'all'
Requires-Dist: hl7>=0.4.5; extra == 'all'
Requires-Dist: httpx>=0.26; extra == 'all'
Requires-Dist: hubspot-api-client>=8.0; extra == 'all'
Requires-Dist: lasio>=0.32; extra == 'all'
Requires-Dist: librosa>=0.10; extra == 'all'
Requires-Dist: mne>=1.6; extra == 'all'
Requires-Dist: nibabel>=5.2; extra == 'all'
Requires-Dist: numpy>=1.26; extra == 'all'
Requires-Dist: opentelemetry-api>=1.20; extra == 'all'
Requires-Dist: opentelemetry-sdk>=1.20; extra == 'all'
Requires-Dist: pandas>=2.0; extra == 'all'
Requires-Dist: pillow-heif>=1.3.0; extra == 'all'
Requires-Dist: prometheus-client>=0.19; extra == 'all'
Requires-Dist: pyarrow>=14.0; extra == 'all'
Requires-Dist: pybids>=0.17; extra == 'all'
Requires-Dist: pydicom>=2.4; extra == 'all'
Requires-Dist: pydub>=0.25.1; extra == 'all'
Requires-Dist: pyfaidx>=0.7; extra == 'all'
Requires-Dist: pygithub>=2.0; extra == 'all'
Requires-Dist: pyreadstat>=1.3.4; extra == 'all'
Requires-Dist: pysam>=0.22; extra == 'all'
Requires-Dist: pyteomics>=4.7; extra == 'all'
Requires-Dist: pywavelets>=1.5; extra == 'all'
Requires-Dist: ray>=2.9; extra == 'all'
Requires-Dist: resfo>=1.3; extra == 'all'
Requires-Dist: scikit-learn>=1.4; extra == 'all'
Requires-Dist: scipy>=1.12; extra == 'all'
Requires-Dist: segyio>=1.9.14; extra == 'all'
Requires-Dist: shopifyapi>=12.0; extra == 'all'
Requires-Dist: simple-salesforce>=1.12; extra == 'all'
Requires-Dist: simpleitk>=2.3; extra == 'all'
Requires-Dist: snowflake-connector-python>=3.0; extra == 'all'
Requires-Dist: starlette>=0.36; extra == 'all'
Requires-Dist: stripe>=7.0; extra == 'all'
Requires-Dist: tifffile>=2026.3.3; extra == 'all'
Requires-Dist: uproot>=5.7.4; extra == 'all'
Requires-Dist: valkey-glide>=2.3; extra == 'all'
Provides-Extra: all-db
Requires-Dist: aiomysql>=0.2; extra == 'all-db'
Requires-Dist: aioodbc>=0.5; extra == 'all-db'
Requires-Dist: aiosqlite>=0.19; extra == 'all-db'
Requires-Dist: asyncpg>=0.29; extra == 'all-db'
Requires-Dist: clickhouse-connect>=0.7; extra == 'all-db'
Requires-Dist: databricks-sql-connector>=3.0; extra == 'all-db'
Requires-Dist: duckdb>=0.10; extra == 'all-db'
Requires-Dist: google-cloud-bigquery>=3.0; extra == 'all-db'
Requires-Dist: snowflake-connector-python>=3.0; extra == 'all-db'
Provides-Extra: all-domains
Requires-Dist: dipy>=1.9; extra == 'all-domains'
Requires-Dist: emd-signal>=1.6; extra == 'all-domains'
Requires-Dist: fhir-resources>=7.1; extra == 'all-domains'
Requires-Dist: lasio>=0.31; extra == 'all-domains'
Requires-Dist: librosa>=0.10; extra == 'all-domains'
Requires-Dist: mne>=1.6; extra == 'all-domains'
Requires-Dist: nibabel>=5.2; extra == 'all-domains'
Requires-Dist: numpy>=1.26; extra == 'all-domains'
Requires-Dist: pandas>=2.0; extra == 'all-domains'
Requires-Dist: pyarrow>=14.0; extra == 'all-domains'
Requires-Dist: pybids>=0.17; extra == 'all-domains'
Requires-Dist: pydicom>=2.4; extra == 'all-domains'
Requires-Dist: pyfaidx>=0.7; extra == 'all-domains'
Requires-Dist: pysam>=0.22; extra == 'all-domains'
Requires-Dist: pyteomics>=4.7; extra == 'all-domains'
Requires-Dist: pywavelets>=1.5; extra == 'all-domains'
Requires-Dist: resfo>=1.3; extra == 'all-domains'
Requires-Dist: scikit-learn>=1.4; extra == 'all-domains'
Requires-Dist: scipy>=1.12; extra == 'all-domains'
Requires-Dist: segyio>=1.9; extra == 'all-domains'
Requires-Dist: simpleitk>=2.3; extra == 'all-domains'
Provides-Extra: all-frames
Requires-Dist: datafusion>=40.0; extra == 'all-frames'
Requires-Dist: duckdb>=0.10; extra == 'all-frames'
Requires-Dist: pandas>=2.0; extra == 'all-frames'
Requires-Dist: polars>=0.20; extra == 'all-frames'
Requires-Dist: pyarrow>=14.0; extra == 'all-frames'
Provides-Extra: all-lazy
Requires-Dist: dask[distributed]>=2024.1; extra == 'all-lazy'
Requires-Dist: ibis-framework>=9.0; extra == 'all-lazy'
Requires-Dist: pyspark>=4.0; extra == 'all-lazy'
Requires-Dist: ray[data]>=2.10; extra == 'all-lazy'
Provides-Extra: all-observability
Requires-Dist: datadog-api-client>=2.0; extra == 'all-observability'
Requires-Dist: opentelemetry-api>=1.20; extra == 'all-observability'
Requires-Dist: opentelemetry-sdk>=1.20; extra == 'all-observability'
Requires-Dist: prometheus-client>=0.19; extra == 'all-observability'
Provides-Extra: all-saas
Requires-Dist: atlassian-python-api>=3.0; extra == 'all-saas'
Requires-Dist: hubspot-api-client>=8.0; extra == 'all-saas'
Requires-Dist: pygithub>=2.0; extra == 'all-saas'
Requires-Dist: shopifyapi>=12.0; extra == 'all-saas'
Requires-Dist: simple-salesforce>=1.12; extra == 'all-saas'
Requires-Dist: stripe>=7.0; extra == 'all-saas'
Provides-Extra: all-storage
Requires-Dist: aioboto3>=12.0; extra == 'all-storage'
Requires-Dist: azure-storage-blob>=12.0; extra == 'all-storage'
Requires-Dist: gcloud-aio-storage>=9.0; extra == 'all-storage'
Provides-Extra: all-stream
Requires-Dist: aio-pika>=9.0; extra == 'all-stream'
Requires-Dist: aiokafka>=0.13.0; extra == 'all-stream'
Requires-Dist: azure-servicebus>=7.0; extra == 'all-stream'
Requires-Dist: google-cloud-pubsub>=2.0; extra == 'all-stream'
Requires-Dist: valkey-glide>=2.0; extra == 'all-stream'
Provides-Extra: amplitude
Requires-Dist: amplitude-analytics>=1.1; extra == 'amplitude'
Provides-Extra: arrow
Requires-Dist: pyarrow>=14.0; extra == 'arrow'
Provides-Extra: asdf
Requires-Dist: asdf>=5.3.0; extra == 'asdf'
Provides-Extra: audio
Requires-Dist: pydub>=0.25.1; extra == 'audio'
Provides-Extra: avro
Requires-Dist: fastavro>=1.9; extra == 'avro'
Provides-Extra: awkward
Requires-Dist: awkward>=2.5; extra == 'awkward'
Provides-Extra: azure
Requires-Dist: azure-storage-blob>=12.0; extra == 'azure'
Provides-Extra: azure-servicebus
Requires-Dist: azure-servicebus>=7.0; extra == 'azure-servicebus'
Provides-Extra: bids
Requires-Dist: pybids>=0.17; extra == 'bids'
Provides-Extra: bigquery
Requires-Dist: google-cloud-bigquery>=3.0; extra == 'bigquery'
Provides-Extra: bytewax
Requires-Dist: bytewax>=0.21; (python_version < '3.13') and extra == 'bytewax'
Provides-Extra: celery
Requires-Dist: celery>=5.3; extra == 'celery'
Requires-Dist: redis>=5.0; extra == 'celery'
Provides-Extra: clickhouse
Requires-Dist: clickhouse-connect>=0.7; extra == 'clickhouse'
Provides-Extra: dask
Requires-Dist: dask[distributed]>=2024.1; extra == 'dask'
Provides-Extra: data
Requires-Dist: pandas>=2.0; extra == 'data'
Requires-Dist: pyarrow>=14.0; extra == 'data'
Provides-Extra: databricks
Requires-Dist: databricks-sql-connector>=3.0; extra == 'databricks'
Provides-Extra: datadog
Requires-Dist: datadog-api-client>=2.0; extra == 'datadog'
Provides-Extra: datafusion
Requires-Dist: datafusion>=40.0; extra == 'datafusion'
Provides-Extra: datahub
Requires-Dist: httpx>=0.26; extra == 'datahub'
Provides-Extra: delta
Requires-Dist: deltalake>=0.17; extra == 'delta'
Provides-Extra: dev
Requires-Dist: cyclonedx-bom>=4.0; extra == 'dev'
Requires-Dist: mutmut<4,>=2.5; extra == 'dev'
Requires-Dist: pyright>=1.1; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Provides-Extra: dlis
Requires-Dist: dlisio>=1.0.4; extra == 'dlis'
Provides-Extra: docs
Requires-Dist: mkdocs-section-index>=0.3; extra == 'docs'
Requires-Dist: mkdocs>=1.6; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.25; extra == 'docs'
Requires-Dist: playwright>=1.44; extra == 'docs'
Requires-Dist: pymdown-extensions>=10.0; extra == 'docs'
Provides-Extra: docx
Requires-Dist: python-docx>=1.1; extra == 'docx'
Provides-Extra: duckdb
Requires-Dist: duckdb>=0.10; extra == 'duckdb'
Provides-Extra: eland
Requires-Dist: eland>=8.13; extra == 'eland'
Provides-Extra: emd
Requires-Dist: emd-signal>=1.6; extra == 'emd'
Requires-Dist: scipy>=1.12; extra == 'emd'
Provides-Extra: epub
Requires-Dist: ebooklib>=0.18; extra == 'epub'
Provides-Extra: feather
Requires-Dist: pyarrow>=14.0; extra == 'feather'
Provides-Extra: fits
Requires-Dist: astropy>=7.2.0; extra == 'fits'
Provides-Extra: fivetran
Requires-Dist: httpx>=0.26; extra == 'fivetran'
Provides-Extra: gcs
Requires-Dist: gcloud-aio-storage>=9.0; extra == 'gcs'
Provides-Extra: genomics
Requires-Dist: pyfaidx>=0.7; extra == 'genomics'
Requires-Dist: pysam>=0.22; extra == 'genomics'
Provides-Extra: geojson
Requires-Dist: geojson>=3.1; extra == 'geojson'
Provides-Extra: geopackage
Requires-Dist: fiona>=1.10.1; extra == 'geopackage'
Provides-Extra: geotiff
Requires-Dist: rasterio>=1.3; extra == 'geotiff'
Provides-Extra: gguf
Requires-Dist: gguf>=0.10; extra == 'gguf'
Provides-Extra: github
Requires-Dist: pygithub>=2.0; extra == 'github'
Provides-Extra: google-analytics
Requires-Dist: google-analytics-data>=0.18; extra == 'google-analytics'
Provides-Extra: grafana
Requires-Dist: httpx>=0.26; extra == 'grafana'
Provides-Extra: great-expectations
Requires-Dist: great-expectations>=0.18; extra == 'great-expectations'
Provides-Extra: grib
Requires-Dist: cfgrib>=0.9.15.1; extra == 'grib'
Provides-Extra: hdf5
Requires-Dist: h5py>=3.10; extra == 'hdf5'
Provides-Extra: health
Requires-Dist: fhir-resources>=7.1; extra == 'health'
Requires-Dist: mne>=1.6; extra == 'health'
Requires-Dist: nibabel>=5.2; extra == 'health'
Requires-Dist: pydicom>=2.4; extra == 'health'
Requires-Dist: pyedflib>=0.1.42; extra == 'health'
Requires-Dist: pyfaidx>=0.7; extra == 'health'
Requires-Dist: pysam>=0.22; extra == 'health'
Provides-Extra: heic
Requires-Dist: pillow-heif>=1.3.0; extra == 'heic'
Provides-Extra: hl7
Requires-Dist: hl7>=0.4.5; extra == 'hl7'
Provides-Extra: html
Requires-Dist: beautifulsoup4>=4.12; extra == 'html'
Requires-Dist: lxml>=5.0; extra == 'html'
Provides-Extra: http
Requires-Dist: httpx>=0.26; extra == 'http'
Requires-Dist: starlette>=0.36; extra == 'http'
Provides-Extra: hubspot
Requires-Dist: hubspot-api-client>=8.0; extra == 'hubspot'
Provides-Extra: hudi
Provides-Extra: ibis
Requires-Dist: ibis-framework>=9.0; extra == 'ibis'
Requires-Dist: pyarrow-hotfix>=0.7; extra == 'ibis'
Provides-Extra: iceberg
Requires-Dist: pyiceberg>=0.6; extra == 'iceberg'
Provides-Extra: image
Requires-Dist: pillow>=10.0; extra == 'image'
Provides-Extra: jira
Requires-Dist: atlassian-python-api>=3.0; extra == 'jira'
Provides-Extra: joblib
Requires-Dist: joblib>=1.4; extra == 'joblib'
Provides-Extra: kafka
Requires-Dist: aiokafka>=0.13.0; extra == 'kafka'
Provides-Extra: kinesis
Requires-Dist: aioboto3>=12.0; extra == 'kinesis'
Provides-Extra: kml
Requires-Dist: lxml>=5.0; extra == 'kml'
Requires-Dist: simplekml>=1.3; extra == 'kml'
Provides-Extra: lance
Requires-Dist: lance>=1.0; extra == 'lance'
Provides-Extra: lz4
Requires-Dist: lz4>=4.3; extra == 'lz4'
Provides-Extra: markdown
Requires-Dist: markdown-it-py>=3.0; extra == 'markdown'
Requires-Dist: markdown>=3.5; extra == 'markdown'
Provides-Extra: matlab
Requires-Dist: scipy>=1.12; extra == 'matlab'
Provides-Extra: mixpanel
Requires-Dist: mixpanel>=4.10; extra == 'mixpanel'
Provides-Extra: ml
Requires-Dist: numpy>=1.26; extra == 'ml'
Requires-Dist: pandas>=2.0; extra == 'ml'
Requires-Dist: scikit-learn>=1.4; extra == 'ml'
Provides-Extra: modin
Requires-Dist: modin>=0.30; (python_version < '3.14') and extra == 'modin'
Provides-Extra: mri
Requires-Dist: dipy>=1.9; extra == 'mri'
Requires-Dist: nibabel>=5.2; extra == 'mri'
Requires-Dist: simpleitk>=2.3; extra == 'mri'
Provides-Extra: mssql
Requires-Dist: aioodbc>=0.5; extra == 'mssql'
Provides-Extra: mysql
Requires-Dist: aiomysql>=0.2; extra == 'mysql'
Provides-Extra: netcdf
Requires-Dist: netcdf4>=1.6; extra == 'netcdf'
Provides-Extra: ods
Requires-Dist: odfpy>=1.4; extra == 'ods'
Provides-Extra: oilgas
Requires-Dist: lasio>=0.32; extra == 'oilgas'
Requires-Dist: resfo>=1.3; extra == 'oilgas'
Requires-Dist: segyio>=1.9.14; extra == 'oilgas'
Provides-Extra: onnx
Requires-Dist: onnx>=1.16; extra == 'onnx'
Provides-Extra: open-metadata
Requires-Dist: httpx>=0.26; extra == 'open-metadata'
Provides-Extra: oracle
Requires-Dist: oracledb>=2.0; extra == 'oracle'
Provides-Extra: orc
Requires-Dist: pyarrow>=14.0; extra == 'orc'
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.20; extra == 'otel'
Requires-Dist: opentelemetry-sdk>=1.20; extra == 'otel'
Provides-Extra: pandera
Requires-Dist: pandera>=0.20; extra == 'pandera'
Provides-Extra: parquet
Requires-Dist: pyarrow>=14.0; extra == 'parquet'
Provides-Extra: pathway
Requires-Dist: pathway>=0.13; (python_version < '3.11') and extra == 'pathway'
Provides-Extra: pdf
Requires-Dist: pypdf>=4.0; extra == 'pdf'
Requires-Dist: reportlab>=4.0; extra == 'pdf'
Provides-Extra: polars
Requires-Dist: polars>=0.20; extra == 'polars'
Provides-Extra: postgres
Requires-Dist: asyncpg>=0.31; extra == 'postgres'
Provides-Extra: pptx
Requires-Dist: python-pptx>=0.6; extra == 'pptx'
Provides-Extra: prometheus
Requires-Dist: prometheus-client>=0.19; extra == 'prometheus'
Provides-Extra: pubsub
Requires-Dist: google-cloud-pubsub>=2.0; extra == 'pubsub'
Provides-Extra: pyteomics
Requires-Dist: pyteomics>=4.7; extra == 'pyteomics'
Provides-Extra: pytorch
Requires-Dist: torch>=2.0; extra == 'pytorch'
Provides-Extra: rabbitmq
Requires-Dist: aio-pika>=9.0; extra == 'rabbitmq'
Provides-Extra: ray
Requires-Dist: ray>=2.9; extra == 'ray'
Provides-Extra: ray-data
Requires-Dist: ray[data]>=2.10; extra == 'ray-data'
Provides-Extra: redshift
Requires-Dist: asyncpg>=0.29; extra == 'redshift'
Provides-Extra: root
Requires-Dist: uproot>=5.7.4; extra == 'root'
Provides-Extra: rtf
Requires-Dist: striprtf>=0.0.27; extra == 'rtf'
Provides-Extra: s3
Requires-Dist: aioboto3>=12.0; extra == 's3'
Provides-Extra: safetensors
Requires-Dist: numpy>=1.26; extra == 'safetensors'
Requires-Dist: safetensors>=0.4; extra == 'safetensors'
Provides-Extra: salesforce
Requires-Dist: simple-salesforce>=1.12; extra == 'salesforce'
Provides-Extra: shapefile
Requires-Dist: pyshp>=2.3; extra == 'shapefile'
Provides-Extra: shopify
Requires-Dist: shopifyapi>=12.0; extra == 'shopify'
Provides-Extra: signal
Requires-Dist: librosa>=0.10; extra == 'signal'
Requires-Dist: pywavelets>=1.5; extra == 'signal'
Requires-Dist: scipy>=1.12; extra == 'signal'
Provides-Extra: snappy
Requires-Dist: python-snappy>=0.7; extra == 'snappy'
Provides-Extra: snowflake
Requires-Dist: snowflake-connector-python>=3.0; extra == 'snowflake'
Provides-Extra: spark
Requires-Dist: pyspark>=4.0; extra == 'spark'
Provides-Extra: spss
Requires-Dist: pyreadstat>=1.3.4; extra == 'spss'
Provides-Extra: sqlite
Requires-Dist: aiosqlite>=0.19; extra == 'sqlite'
Provides-Extra: stripe
Requires-Dist: stripe>=7.0; extra == 'stripe'
Provides-Extra: tensorflow
Requires-Dist: tensorflow>=2.21; (python_version < '3.14') and extra == 'tensorflow'
Provides-Extra: tflite
Requires-Dist: ai-edge-litert>=2.1; (python_version < '3.14') and extra == 'tflite'
Provides-Extra: tiff
Requires-Dist: imagecodecs>=2026.3.6; extra == 'tiff'
Requires-Dist: pillow>=10.0; extra == 'tiff'
Requires-Dist: tifffile>=2026.3.3; extra == 'tiff'
Provides-Extra: twilio
Requires-Dist: twilio>=8.0; extra == 'twilio'
Provides-Extra: valkey
Requires-Dist: valkey-glide>=2.3; extra == 'valkey'
Provides-Extra: xarray
Requires-Dist: xarray>=2024.0; extra == 'xarray'
Provides-Extra: xlsx
Requires-Dist: openpyxl>=3.1; extra == 'xlsx'
Requires-Dist: xlsxwriter>=3.2; extra == 'xlsx'
Provides-Extra: zarr
Requires-Dist: zarr>=2.16; extra == 'zarr'
Provides-Extra: zendesk
Requires-Dist: zenpy>=2.0; extra == 'zendesk'
Provides-Extra: zstd
Requires-Dist: zstandard>=0.22; extra == 'zstd'
Description-Content-Type: text/markdown

<p align="center">
  <img src="docs/pirn_logo.png" alt="pirn" width="120">
</p>

<h1 align="center">pirn</h1>
<p align="center">A pipeline framework where everything is a knot.</p>

`pirn` builds typed, async, observable data and computation pipelines. You wire
work into a *tapestry* of *knots*, run it, and get back a structured result —
including content-addressed lineage records you can join across runs.

```bash
pip install pirn  # not yet on PyPI; this repo is the source
```

Requires Python 3.11+.

## Quickstart

```python
import asyncio
from pirn import Tapestry, Parameter, KnotConfig, knot, RunRequest

@knot
async def double(x: int) -> int:
    return x * 2

@knot
async def add(a: int, b: int) -> int:
    return a + b

async def main():
    with Tapestry() as t:
        x = Parameter("x", int)
        d = double(x=x, _config=KnotConfig(id="d"))
        answer = add(a=x, b=d, _config=KnotConfig(id="answer"))

    result = await t.run(RunRequest(parameters={"x": 5}))
    print(result.outputs)  # {'param:x': 5, 'd': 10, 'answer': 15}

asyncio.run(main())
```

That's the whole shape: declare knots inside a `Tapestry()` context, wire them
by passing one knot as a kwarg of another, run.

## The constructor convention

When you construct a knot, pirn looks at every kwarg:

* If the value **is itself a knot**, it becomes a **parent** — this knot
  depends on the other knot's output.
* Otherwise, the value is **config** — a constant fed in at run time.

So `add(a=x, b=d, _config=KnotConfig(id="answer"))` makes `x` and `d` parents
of `answer`. There's no separate `parents={...}` dict to remember.

Framework metadata (the knot's id, error-handling policy, validation toggle)
goes in the reserved `_config=` kwarg, which keeps the framework's namespace
out of yours.

Every knot needs an explicit id — pirn doesn't auto-generate them, because
auto-generated ids make lineage records unreadable.

## Tapestry

A `Tapestry` is the workspace your knots live in. Constructing knots inside
`with Tapestry() as t:` auto-registers them. You can also pass `tapestry=`
explicitly, or hand a knot directly to `t.register(knot)`.

`t.run(request)` walks from the tapestry's terminal knots (those with no
downstream consumers) and runs the whole graph reachable from them. To run a
specific subset, pass `terminals=knot_or_list`.

A tapestry holds three backends:

| Backend         | What it stores                                  | Defaults / Phase 3 options                  |
|-----------------|-------------------------------------------------|---------------------------------------------|
| `TapestryStore` | the canonical knot definitions                  | `InMemoryStore`, `SQLiteStore`, `PostgresStore`, `ValKeyStore` |
| `RunHistory`    | run results and lineage records                 | `InMemoryHistory`, `SQLiteHistory`, `DuckDBHistory`, `PostgresHistory` |
| `DataStore`     | intermediate values, keyed by content hash      | `InMemoryDataStore`, `LocalDiskDataStore`, `ValKeyDataStore`, `S3DataStore` |

They're separate so each can be picked for its strength: Postgres for both
store and history when you want one durable database; SQLite store +
DuckDB history when you want OLAP-fast lineage queries; ValKey for the
data store where content-addressed values fit a key-value store
naturally; S3 when intermediate values are large or shared across many
workers.

Each backend lives behind an extra: `pip install pirn[sqlite]`,
`pirn[postgres]`, `pirn[valkey]`, `pirn[duckdb]`, `pirn[s3]`, or
`pirn[all]` for everything.

## Result is three-way

Every knot produces an `Ok`, an `Err`, or a `Skipped`:

* `Ok(value)` — success.
* `Err(record)` — failure; the record is a Pydantic `ExceptionRecord` with the
  type, message, traceback, and a stable id.
* `Skipped(reason)` — opted out, branch not selected, gate closed, parent
  failed under the default policy. Distinct from `Err` so downstream knots
  can react differently to "didn't run" vs "crashed".

By default, a knot whose parent produced `Err` or `Skipped` is itself
skipped (`SKIP_IF_PARENT_FAILED`). Other policies:

* `RECEIVE_ERRORS` — your `process()` is called with `Result` objects directly,
  so you handle errors yourself.
* `REQUIRE_ALL_PARENTS` — any failed/skipped parent makes this knot fail too.

Set per-knot via `_config=KnotConfig(id="...", error_policy=...)`.

## Optional knots

If you want an `Err` from a particular knot to behave like a `Skipped`
downstream, mix in `Optional`:

```python
from pirn import Optional, Knot

class FetchPrefs(Optional, Knot):
    async def process(self, user_id: str) -> dict:
        ...
```

`Optional` is a mixin, not a flag, so it composes cleanly with subclasses
that have their own behaviour.

## Lineage, content-addressed

Every knot execution produces a `KnotLineage` record:

```python
KnotLineage(
    run_id="run-abc",
    knot_id="answer",
    knot_class="my_pkg.knots.Add",
    knot_config_hash="sha256:…",       # the knot's config at run time
    parent_input_hashes={               # what it consumed
        "a": "sha256:…",
        "b": "sha256:…",
    },
    output_hash="sha256:…",            # what it produced
    outcome="ok",
    dispatcher="LocalDispatcher",
    started_at=…, finished_at=…,
)
```

Because hashes are content-addressed (sha256 of a stable canonicalisation),
the same value always hashes to the same string regardless of which run
produced it. This makes cross-run lineage queries trivial:

```python
# Did anything else in any past run produce this same output?
matches = await tapestry.history.query_lineage_by_output_hash(out_hash)

# Who else consumed this value as input?
consumers = await tapestry.history.query_lineage_by_input_hash(in_hash)

# What's this knot's run history?
records = await tapestry.history.query_lineage_by_knot_id("answer")
```

Lineage records reference values by hash; the `DataStore` holds the values.
You can scrub values from the data store (TTL, GDPR, whatever) without
losing the lineage graph.

## The node taxonomy

Beyond `Knot`, pirn ships a handful of specialised classes:

| Class              | Shape                                                              |
|--------------------|--------------------------------------------------------------------|
| `Source`           | zero parents → produces a value (file, DB query, fetch, …)         |
| `Sink`             | terminal consumer; output conventionally `None`                    |
| `Aggregator`       | N parents combined via a `combine` callable                        |
| `Branch`           | one input + selector → tagged path; non-selected paths are skipped |
| `Gate`             | one input + predicate → pass through or skip                       |
| `Map`              | fan a knot over every element of a parent's list                   |
| `ZipMap`           | fan a knot over multiple collections element-wise                  |
| `DictMap`          | fan a knot over the entries of a dict                              |
| `Reduce`           | folds a list parent into one value (whole-list or pairwise)        |
| `SubTapestry`      | a knot whose execution body is a complete inner tapestry           |
| `WithContinuation` | wraps a knot; spawns successors based on its output at runtime     |
| `LoopSubTapestry`  | iterative SubTapestry; iterations are knots in one extensible run  |

`Optional` is a mixin (not a node).

```python
from pirn import Map, Reduce, Aggregator, Gate, Branch

# Map an inner knot over a collection-producing parent.
users = Map(
    over=record_ids,
    each=enrich_record,
    bind="record_id",
    _config=KnotConfig(id="users"),
)

# Reduce a list to one value.
total = Reduce(of=users, combine=sum, _config=KnotConfig(id="total"))

# Branch on a selector.
b = Branch(
    input=msg,
    selector=lambda m: m["type"],
    branches=("tool", "response"),
    _config=KnotConfig(id="route"),
)
handle_tool(payload=b["tool"], _config=KnotConfig(id="t"))
handle_resp(payload=b["response"], _config=KnotConfig(id="r"))
```

## Dispatchers

The dispatcher decides where work runs.

* `LocalDispatcher` — runs in the current event loop. The default.
* `ThreadDispatcher(max_workers=...)` — offloads each knot to a global
  thread pool, useful for CPU-bound or sync-heavy work.
* `DaskDispatcher` — runs each knot on a Dask cluster
  (`pip install pirn[dask]`).
* `RayDispatcher` — runs each knot as a Ray task
  (`pip install pirn[ray]`).
* `CeleryDispatcher` — submits each knot through Celery
  (`pip install pirn[celery]`).

```python
from pirn import ThreadDispatcher
from pirn.engine.dask_dispatcher import DaskDispatcher

# In-process scaling.
with Tapestry(dispatcher=ThreadDispatcher(max_workers=8)) as t:
    ...

# Distributed scaling.
dispatcher = DaskDispatcher.local()  # or DaskDispatcher(scheduler="tcp://...")
with Tapestry(dispatcher=dispatcher) as t:
    ...
```

All dispatchers honor the same `Dispatcher` protocol; switching between
them doesn't change the rest of your pipeline.

## Triggers and emitters

A **trigger** starts a run when an external event arrives. An
**emitter** observes runs as they happen and fans events out to logs,
metrics, message buses, or traces.

### Triggers

```python
from pirn.triggers import CronTrigger, KafkaTrigger, WebhookTrigger, run_forever

# Run every five minutes.
trigger = CronTrigger(every_seconds=300)
await run_forever(trigger, tapestry)

# Run on each Kafka message.
trigger = KafkaTrigger(topic="orders", bootstrap_servers="kafka:9092")
await run_forever(trigger, tapestry)

# Run on each HTTP POST.  trigger.app is a Starlette ASGI app you mount on
# any ASGI server (uvicorn, hypercorn, FastAPI).
trigger = WebhookTrigger(path="/run")
import uvicorn
uvicorn.run(trigger.app, host="0.0.0.0", port=8080)
```

`ValKeyTrigger` (pubsub) is also available; full list in
`pirn.triggers`.

### Emitters

```python
from pirn import LogEmitter, KafkaEmitter, OpenTelemetryEmitter

# Stream structured logs.
log_emitter = LogEmitter(with_payload=False)

# Publish to Kafka for downstream observability tools.
kafka_emitter = KafkaEmitter(
    bootstrap_servers="kafka:9092",
    topic_status="pirn.status",
    topic_lineage="pirn.lineage",
    topic_result="pirn.result",
)

# OpenTelemetry trace spans per knot.
otel_emitter = OpenTelemetryEmitter()

with Tapestry(emitters=[log_emitter, kafka_emitter, otel_emitter]) as t:
    ...
```

`WebhookEmitter` and `ValKeyEmitter` are also available. A broken
emitter never breaks a run — exceptions inside emitters are isolated.

## Streaming sources

Triggers fire whole runs (request/response). **Streaming sources**
feed continuous data into a single long-running pipeline — ETL-style.

```python
from pirn.streaming import IterableSource, FileTailSource, run_stream

# Tail a log file forever.
source = FileTailSource("/var/log/app.log", parameter_name="line")
await run_stream(source, tapestry, on_result=handle)

# Wrap any iterable.
source = IterableSource([1, 2, 3], parameter_name="x")
await run_stream(source, tapestry)
```

`KafkaStreamingSource` is available too. If you want to drive
trigger-based machinery from a stream, wrap with
`StreamingSourceTrigger`.

## Mid-run extension and dynamic DAGs

For dynamic pipelines where a knot decides what comes next based on its
own output, opt into **extensible** runs:

```python
result = await tapestry.run(extensible=True)
```

Inside any knot's `process()`, call `get_current_store()` to register
successor knots into the running tapestry. The engine picks them up
between waves:

```python
from pirn.tapestry import get_current_store

class PlannerKnot(Knot):
    async def process(self, ctx: Context, **_) -> Context:
        store = get_current_store()
        if store is not None:
            for action in plan_actions(ctx):
                store.register(ActionKnot(ctx=self, action=action,
                                          _config=KnotConfig(id=action.id)))
        return ctx
```

Successor knots wired to `self` receive the planner's output as a real
data edge — the lineage reflects the true parent/child relationship, not
a shared state blob.

For continuation-style logic (deterministic next-steps attached to an
existing knot without modifying it), use `continues()`:

```python
from pirn.nodes.continuation import Next, continues

def router(result) -> list[Next]:
    if result.score > 0.8:
        return [Next("publish", {"data": result.content})]
    return [Next("review", {"data": result.content})]

continues(score_knot, fn=router, pool={"publish": PublishKnot, "review": ReviewKnot})
```

Requires `InMemoryStore` (the default). `SQLiteStore` and other
persistent stores do not yet support mid-run extension.

## Visualization

```python
from pirn import mermaid_for_tapestry, mermaid_for_run, html_for_run

# Mermaid for embedding in docs.
print(mermaid_for_tapestry(t))           # structure only
print(mermaid_for_run(result))           # structure + outcome colors

# Standalone HTML/SVG for browsing.
Path("run.html").write_text(html_for_run(result))
```

The HTML renderer produces a single self-contained file with hover
tooltips (knot id, class, outcome, hashes, duration), filter buttons
by outcome, and a longest-path layout — no server, no external assets.

## YAML pipelines

Pipelines can be declared in YAML and loaded with `load_pipeline`.

```yaml
name: simple
nodes:
  - id: x
    type: parameter
    type_: int
    has_default: true
    default: 5
  - id: doubled
    type: knot
    callable: double
    parents:
      x: x
  - id: answer
    type: knot
    callable: add
    parents:
      a: x
      b: doubled
```

```python
from pirn import load_pipeline, RunRequest

t = load_pipeline(
    yaml_text,
    known_callables={"double": double, "add": add},
)
result = await t.run(RunRequest())
```

Strict by default: every callable, predicate, selector, combine, or `each`
reference must be in `known_callables`. Set `allow_callable_refs: true` at
the top level to opt into dotted-path imports (loose mode).

## Security

pirn uses **pickle** to serialize intermediate values in the `S3DataStore`, `ValKeyDataStore`, and `LocalDiskDataStore` backends. Pickle is an arbitrary-code-execution primitive: only use these backends when the backing store is not writable by adversaries.

The `WebhookTrigger`'s built-in authentication is opt-in via the `auth_token=` constructor parameter. For defence-in-depth, also place an authenticating reverse proxy or middleware in front of it before exposing it to any network. See [docs/webhook-trigger-auth.md](docs/webhook-trigger-auth.md) for details.

Setting `allow_callable_refs: true` in a YAML pipeline **enables dynamic Python imports** from YAML content. Only use this with YAML authored by trusted developers — never with user-supplied YAML.

To report a vulnerability, see [SECURITY.md](SECURITY.md).

## Documentation

| Document | Contents |
|----------|----------|
| [docs/architecture.md](docs/architecture.md) | Full architecture and design reference: execution model, backend matrix, extension points, Mermaid diagrams |
| [docs/choosing-backends.md](docs/choosing-backends.md) | When to use each storage backend |
| [docs/deployment-sizing.md](docs/deployment-sizing.md) | Sizing guidance for different deployment scales |
| [docs/observability.md](docs/observability.md) | Emitters, OTel, Kafka, log structure |
| [docs/schema-migrations.md](docs/schema-migrations.md) | Database schema migration procedures |
| [docs/subscribable-stores.md](docs/subscribable-stores.md) | Mid-run extension and subscribable store protocol |
| [SECURITY.md](SECURITY.md) | Responsible disclosure policy |

## Domain libraries

pirn ships domain-specific knot libraries for common data engineering and ML workloads. All domain libraries live under `pirn/domains/` in the same package — dependencies are isolated via optional extras so you install only what your project uses.

| Domain | Description | Extra |
|--------|-------------|-------|
| Data | Tiered data-frame knots (pandas, Polars, Ibis, Spark, DuckDB), lakehouse adapters, tabular transforms | `pirn[data]` |
| Agents | LLM-backed knots, tool use, memory stores, planning, RAG, ReAct, multi-agent patterns | `pirn[agents]` |
| ML | Data prep, feature engineering, training, evaluation, deployment, feature stores | `pirn[ml]` |
| Health | DICOM, FHIR, HL7v2, EDF/BDF, NIfTI, FASTA/FASTQ, VCF — medical imaging, genomics, clinical data | `pirn[health]` |
| Signal | Time-series, DSP, audio (WAV/FLAC/MP3), EEG/BDF, wavelet transforms | `pirn[signal]` |
| Oil & Gas | SEG-Y seismic, LAS well-log, WITSML — subsurface data connectors | `pirn[oilgas]` |

### File format coverage

pirn ships approximately 98 file formats across 16 categories: universal tabular (CSV, Parquet, ORC, Avro, Feather), office documents (XLSX, ODS, DOCX, PPTX, PDF, RTF), scientific (HDF5, NetCDF, Zarr, MATLAB), image (PNG, JPEG, TIFF, WebP, HEIC), geospatial (GeoJSON, Shapefile, KML, GeoTIFF, GeoPackage), ML artifacts (ONNX, SafeTensors, Joblib, PyTorch, TF SavedModel, GGUF, TFLite), compression codecs (gzip, bzip2, zstd, snappy, lz4), archive formats (tar, zip), lakehouse table formats (Delta Lake, Apache Iceberg, Apache Hudi), healthcare (DICOM, FHIR, HL7v2, EDF/BDF, CDA, NIfTI), genomics (FASTA, FASTQ, VCF, BCF), markup (HTML, Markdown, ePub), and more. See [docs/connectors/index.md](docs/connectors/index.md) for the full format matrix.

### PHI safety

Healthcare formats (DICOM, FHIR, HL7v2, EDF/BDF, CDA) include built-in PHI redaction support. Sensitive fields — patient names, dates of birth, MRNs, and other HIPAA-defined identifiers — can be scrubbed or pseudonymised before records flow into downstream knots. Redaction is opt-in per format instance and is audited through pirn's standard content-addressed lineage so every scrub event is traceable.

### ML deserialization security

`JoblibFormat` and `PytorchFormat` use pickle internally. Both constructors refuse to proceed without either a `_Signer` instance (HMAC-SHA256 signs payloads before emission and verifies before deserialisation) or an explicit `allow_unsigned=True` acknowledgement (intended for single-tenant dev/test environments only). `SafetensorsFormat` is RCE-safe by design and requires no signer. See [docs/domains/ml.md](docs/domains/ml.md) for the full security property table.

## Philosophy

* **Declarative wiring, imperative bodies.** Wiring happens in `Tapestry`
  context blocks; bodies are normal Python `async` functions.
* **Three-way results from the start.** Skip is not failure; both deserve
  first-class handling.
* **Lineage by default, not as an add-on.** Every run produces structured,
  content-addressed records that join across runs.
* **Backends are protocols.** SQLite, Postgres, DuckDB, ValKey, S3, local
  disk — pick the shape that fits your deployment without API churn.
* **Optional deps stay optional.** Each backend, dispatcher, trigger, and
  emitter is gated behind a `[bracket]` extra; install only what you use.

## Status

Phase 3 (current). Public API stable: every protocol from Phase 2 still
works, and Phase 3 adds the networked backends, distributed dispatchers,
event-driven triggers and emitters, streaming sources, mid-run
extension, and visualization on top.

For testing real backends (Postgres, ValKey, Kafka, S3) end-to-end, see
[docs/guides/testing.md](docs/guides/testing.md).

Apache-2.0.
