Metadata-Version: 2.1
Name: ydata-fabric-sdk
Version: 1.0.3
Summary: YData SDK gives access to the *Data-Centric* tools from the YData ecosystem to accelerate AI development
Author-email: YData <opensource@ydata.ai>
License: MIT License
        
        Copyright (c) 2019 YData, Lda.
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Home, https://github.com/ydataai/ydata-fabric-sdk
Classifier: License :: OSI Approved :: MIT License
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Telecommunications Industry
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <3.13,>=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx==0.23.3
Requires-Dist: ydata-core>=0.2.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: prettytable==3.6.0
Requires-Dist: pydantic==1.10.9
Requires-Dist: typeguard==2.13.3
Requires-Dist: ydata-datascience
Provides-Extra: dev
Requires-Dist: twine; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: pylint==2.17.7; extra == "dev"
Requires-Dist: black==22.12.0; extra == "dev"
Requires-Dist: flake8==6.1.0; extra == "dev"
Requires-Dist: isort==5.12.0; extra == "dev"
Requires-Dist: pre-commit==2.21.0; extra == "dev"
Requires-Dist: pyc-wheel==1.2.7; extra == "dev"
Requires-Dist: mypy==1.4.1; extra == "dev"
Provides-Extra: doc
Requires-Dist: mkdocs<1.7.0,>=1.6.0; extra == "doc"
Requires-Dist: mkdocs-material<10.0.0,>=9.0.12; extra == "doc"
Requires-Dist: mkdocs-table-reader-plugin<=2.2.0; extra == "doc"
Requires-Dist: mike<2.2.0,>=2.1.1; extra == "doc"
Requires-Dist: mkdocstrings[python]<1.0.0,>=0.20.0; extra == "doc"
Requires-Dist: mkdocs-badges; extra == "doc"
Provides-Extra: test
Requires-Dist: pytest==6.2.5; extra == "test"
Requires-Dist: pytest-bdd==4.0.*; extra == "test"
Requires-Dist: pytest-cov==2.11.*; extra == "test"
Requires-Dist: pytest-xdist==2.2.*; extra == "test"
Requires-Dist: pytest-mccabe<3.0.0,>=2.0.0; extra == "test"

# YData Fabric SDK

[![pypi](https://img.shields.io/pypi/v/ydata-fabric-sdk)](https://pypi.org/project/ydata-fabric-sdk)
![Python version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)
[![downloads](https://pepy.tech/badge/ydata-fabric-sdk/month)](https://pepy.tech/project/ydata-fabric-sdk)

---
🚀 YData Fabric SDK 🎉
Fabric's platform capabilities at the distance of a Python command!


*ydata-fabric-sdk* is here! Create a [YData Fabric account](https://ydata.ai/register) so you can start using it today!

YData Fabric SDK empowers developers with easy access to state-of-the-art data quality tools and generative AI capabilities. Stay tuned for more updates and new features!

---

<p align="center">
  <a href="https://docs.fabric.ydata.ai/latest/sdk/">Documentation</a>
  |
  <a href="https://ydata.ai">More on YData</a>
</p>


## Overview

The Fabric SDK is an ecosystem of methods that lets users adopt a Data-Centric approach to AI development through a Python interface. The solution includes a set of integrated components for data ingestion, standardized data quality evaluation, and data improvement, such as synthetic data generation, allowing iterative improvement of the datasets used in high-impact business applications.

Synthetic data can be used as a Machine Learning performance enhancer, to augment real data or mitigate the presence of bias in it. Furthermore, it can be used as a Privacy Enhancing Technology, to enable data-sharing initiatives or even to fuel testing environments.

Under the Fabric SDK hood, you can find a set of algorithms and metrics, based on statistics and deep learning techniques, that will help you accelerate your data preparation.

### What you can expect:

Fabric SDK is composed of the following main modules:

- **Datasources**
  - Fabric’s SDK includes several connectors for easy integration with existing data sources. It supports several storage types, such as filesystems and RDBMS. Check the list of connectors in the [documentation](https://docs.fabric.ydata.ai/latest/sdk/).
  - Fabric SDK’s Datasources run on top of Dask, which allows them to handle not only small workloads but also larger volumes of data.

- **Synthesizers**
  - A simplified interface to train a generative model that learns, in a data-driven manner, the behavior, patterns, and distribution of the original data. Optimize your model for privacy or utility use cases.
  - From a trained synthesizer, you can generate synthetic samples as needed and parametrise the number of records to generate.

- **Synthetic data quality report** *Coming soon*
  - An extensive synthetic data quality report that measures 3 dimensions: privacy, utility and fidelity of the generated data. The report can be downloaded in PDF format for ease of sharing and compliance purposes or as a JSON to enable the integration in data flows.

- **Profiling** *Coming soon*
  - A set of metrics and algorithms that summarize dataset quality in three main dimensions: warnings, univariate analysis, and a multivariate perspective.
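
The train-then-sample workflow described under **Synthesizers** can be illustrated with a toy, standard-library-only sketch. `ToySynthesizer` and its `fit` and `sample` methods are hypothetical names chosen for this illustration; they are not the Fabric SDK API, and the toy model samples each column independently, whereas a real generative model also learns the relationships between columns.

```python
import random
from typing import Dict, List, Sequence

Record = Dict[str, object]


class ToySynthesizer:
    """Hypothetical stand-in for illustration only, NOT the Fabric SDK API.

    It memorises the observed values of each column and samples every
    column independently from its empirical distribution.
    """

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._columns: Dict[str, List[object]] = {}

    def fit(self, records: Sequence[Record]) -> "ToySynthesizer":
        """Learn the per-column empirical distributions from real records."""
        if not records:
            raise ValueError("cannot fit on an empty dataset")
        names = list(records[0])
        self._columns = {name: [row[name] for row in records] for name in names}
        return self

    def sample(self, n_samples: int) -> List[Record]:
        """Generate the requested number of synthetic records."""
        if not self._columns:
            raise RuntimeError("call fit() before sample()")
        return [
            {name: self._rng.choice(values) for name, values in self._columns.items()}
            for _ in range(n_samples)
        ]


# Made-up example data, used only to exercise the sketch.
real = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "premium"},
    {"age": 29, "plan": "basic"},
]

synth = ToySynthesizer(seed=42).fit(real)
synthetic = synth.sample(n_samples=5)  # the number of records is a parameter
print(len(synthetic))
```

The point of the sketch is the shape of the interface: the model is fitted once, and the number of generated records is chosen at sampling time, independently of the size of the training data.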

### Supported data formats

- **Tabular**
  - The **RegularSynthesizer** is designed to synthesize high-dimensional, time-independent data with high-quality results.
- **Time-Series**
  - The **TimeSeriesSynthesizer** is designed to synthesize both regularly and irregularly spaced time series, from smart-sensor readings to stock data.
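
To make the distinction between regularly and irregularly spaced time series concrete, here is a minimal standard-library sketch. The helper names `sampling_intervals` and `is_regularly_spaced` and the example timestamps are assumptions made for this illustration; they are not part of the Fabric SDK.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def sampling_intervals(timestamps: Sequence[datetime]) -> List[timedelta]:
    """Return the gaps between consecutive timestamps, in time order."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def is_regularly_spaced(timestamps: Sequence[datetime]) -> bool:
    """A series is regular when every consecutive gap is identical."""
    return len(set(sampling_intervals(timestamps))) <= 1


start = datetime(2024, 1, 1)

# A sensor-like series: one reading every 5 minutes.
sensor = [start + timedelta(minutes=5 * i) for i in range(4)]

# An event-like series: observations arrive at uneven offsets (in seconds).
trades = [start + timedelta(seconds=s) for s in (0, 3, 4, 60)]

print(is_regularly_spaced(sensor))
print(is_regularly_spaced(trades))
```

In this sketch the sensor-like series has a single constant gap, while the event-like series has gaps of different lengths, which is what "irregularly spaced" refers to.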
