Metadata-Version: 2.4
Name: disdat
Version: 1.1.5
Summary: Disdat: data versioning
Author-email: Ken Yocum <kyocum@gmail.com>
License: Apache License, version 2.0
Project-URL: Homepage, https://github.com/kyocum/disdat
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Operating System :: OS Independent
Classifier: Natural Language :: English
Requires-Python: <3.15,>=3.10
Description-Content-Type: text/x-rst
License-File: LICENSE.txt
License-File: NOTICE.txt
Requires-Dist: boto3<2.0,>=1.34
Requires-Dist: termcolor<3.0,>=2.0
Requires-Dist: pandas<3.0,>=2.0
Requires-Dist: numpy<3.0,>=1.24
Requires-Dist: sqlalchemy<3.0,>=2.0
Requires-Dist: protobuf<7.0,>=6.31.1
Requires-Dist: docutils>=0.18
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ipython; extra == "dev"
Requires-Dist: pylint; extra == "dev"
Requires-Dist: coverage; extra == "dev"
Requires-Dist: tox; extra == "dev"
Requires-Dist: moto; extra == "dev"
Requires-Dist: s3fs>=2024.2.0; extra == "dev"
Requires-Dist: pyarrow; extra == "dev"
Requires-Dist: grpcio-tools; extra == "dev"
Provides-Extra: rel
Requires-Dist: build; extra == "rel"
Requires-Dist: wheel; extra == "rel"
Dynamic: license-file

.. figure:: ./docs/DisdatTitleFig.jpg
   :alt: Disdat Logo
   :align: center

\

.. image:: https://badge.fury.io/py/disdat.svg
    :target: https://badge.fury.io/py/disdat

\
\

Note: As of version 1.0, Disdat no longer includes the instrumented form of Luigi.  Disdat-Luigi now resides `here <https://github.com/kyocum/disdat-luigi>`_.  To build versioned pipelines, ``pip install disdat-luigi``.  To use only the Disdat API, ``pip install disdat``.

Disdat is a Python (3.10+) package for data versioning that allows data scientists to create, share, and track data products.  Disdat organizes data into *bundles*: collections of literal values and files that form the unit at which data is versioned and shared.  Disdat provides an *API* for creating, finding, and publishing bundles to cloud storage (e.g., AWS S3).
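
As a rough illustration of the bundle idea, the sketch below models a bundle as an immutable, uniquely identified version of some named data, and a context as the place where versions accumulate.  It is a conceptual toy only: ``ToyBundle``, ``ToyContext``, and the ``s3://example-bucket`` paths are hypothetical names invented for this example and are not part of the Disdat API.

.. code-block:: python

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from uuid import uuid4


    @dataclass(frozen=True)
    class ToyBundle:
        """Hypothetical stand-in for a bundle: one immutable version of named data."""
        name: str
        data: dict  # literal values and file paths, keyed by a label
        uuid: str = field(default_factory=lambda: uuid4().hex)
        created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


    class ToyContext:
        """Hypothetical in-memory context that keeps every version of every bundle."""

        def __init__(self):
            self._versions = {}  # bundle name -> list of ToyBundle, oldest first

        def create(self, name, data):
            # Each call records a new version; earlier versions are never overwritten.
            bundle = ToyBundle(name=name, data=dict(data))
            self._versions.setdefault(name, []).append(bundle)
            return bundle

        def latest(self, name):
            # Return the most recently created version, or None if the name is unknown.
            versions = self._versions.get(name, [])
            return versions[-1] if versions else None

        def history(self, name):
            # Return a copy of all versions, oldest first.
            return list(self._versions.get(name, []))


    ctx = ToyContext()
    v1 = ctx.create("awesome_data", {"rows": 100, "file": "s3://example-bucket/v1.csv"})
    v2 = ctx.create("awesome_data", {"rows": 250, "file": "s3://example-bucket/v2.csv"})

Asking the context for the latest ``awesome_data`` returns ``v2``, while ``v1`` stays available in the history.  That is the property that makes a bundle a convenient unit for versioning and sharing.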

`Disdat-Luigi <https://github.com/kyocum/disdat-luigi>`_ uses this API to instrument Spotify's Luigi, so you can build pipelines that automatically create bundles, making it easy to share the latest outputs with other users and pipelines.  Instead of trading lengthy emails with multiple file attachments or searching through Slack for the most recent S3 file path, users can run ``dsdt pull awesome_data`` to get the latest 'awesome_data.'


Disdat's bundle API and pipelines provide:

* **Simplified pipelines** -- Users implement two functions per task: ``requires`` and ``run``.

* **Enhanced re-execution logic** -- Disdat re-runs processing steps when code or data changes.

* **Data versioning/lineage** -- Disdat records code and data versions for each output data set.

* **Share data sets** -- Users may push and pull data to remote contexts hosted in AWS S3.

* **Auto-docking** -- Disdat *dockerizes* pipelines so that they can run locally or in the cloud.
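
The re-execution and lineage points above can be pictured with a small fingerprinting sketch: if a task records a hash of its code version and the versions of its inputs, a later run can compare fingerprints to decide whether the step must run again.  This is a generic illustration of the idea, not Disdat's actual mechanism; ``fingerprint``, ``needs_rerun``, and the version strings are hypothetical.

.. code-block:: python

    import hashlib


    def fingerprint(code_version, input_uuids):
        """Hash a task's code version together with the versions of its inputs."""
        digest = hashlib.sha256()
        digest.update(code_version.encode("utf-8"))
        # Sort so that the order in which inputs are listed does not matter.
        for input_uuid in sorted(input_uuids):
            digest.update(b"\0")  # separator so adjacent values cannot run together
            digest.update(input_uuid.encode("utf-8"))
        return digest.hexdigest()


    def needs_rerun(recorded_fingerprint, code_version, input_uuids):
        """Return True when the code or any input differs from the recorded run."""
        return recorded_fingerprint != fingerprint(code_version, input_uuids)


    # Fingerprint stored alongside an earlier output (hypothetical identifiers).
    recorded = fingerprint("git:abc123", ["bundle-1", "bundle-2"])

With the same code version and inputs, ``needs_rerun`` returns ``False`` and the existing output can be reused; changing either the code version or any input identifier makes it return ``True``.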

Find our latest documentation on `gitbook here <https://disdat.gitbook.io>`_!


Authors
-------

Disdat could not have come to be without the support of `Human Longevity, Inc. <https://www.humanlongevity.com>`_  It
has also benefited from numerous discussions, code contributions, and emotional support from Sean Rowan, Ted Wong, Jonathon Lunt,
Jason Knight, Axel Bernel, and `Intuit, Inc. <https://www.intuit.com>`_
