Metadata-Version: 2.3
Name: MedDataKit
Version: 0.0.8
Summary: Medical Public Data Kit
Author: Sitao Min
Author-email: sitaomin1994@gmail.com
Requires-Python: >=3.10,<3.13
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: kaggle (>=1.6.17,<2.0.0)
Requires-Dist: liac-arff (>=2.5.0,<3.0.0)
Requires-Dist: lifelines (>=0.30.0,<0.31.0)
Requires-Dist: matplotlib (>=3.9.2,<4.0.0)
Requires-Dist: openpyxl (>=3.1.5,<4.0.0)
Requires-Dist: pandas (>=2.2.2,<3.0.0)
Requires-Dist: patool (>=2.4.0,<3.0.0)
Requires-Dist: pyreadr (>=0.5.2,<0.6.0)
Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
Requires-Dist: pyunpack (>=0.3,<0.4)
Requires-Dist: rdata (>=0.11.2,<0.12.0)
Requires-Dist: scikit-learn (>=1.5.2,<2.0.0)
Description-Content-Type: text/markdown

# MedDataKit

## Design

1. Downloader: downloads data from a specific source, e.g. UCIML or a local directory
    - Functionalities:
        - download data files from the provided URLs, paths, and other source information
        - save the downloaded files, checking whether each file already exists before downloading it
    - Information produced:
        - raw_data_files: the original downloaded data files (file names and paths, number of files, file sizes, etc.)
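
The Downloader behaviour described above can be sketched roughly as follows. This is an illustration only, not the package's actual API: the names `RawDataFile` and `fetch` are hypothetical, and the sketch assumes a source is either an `http(s)` URL or a local file path whose last path component is the file name.

```python
import shutil
import urllib.request
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RawDataFile:
    """Hypothetical record describing one entry of raw_data_files."""
    name: str
    path: Path
    size_bytes: int


def fetch(source: str, dest_dir: Path) -> RawDataFile:
    """Place `source` in `dest_dir`, skipping the transfer if the file exists."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the file name is the last component of the URL or path.
    target = dest_dir / Path(source).name
    if not target.exists():  # check before downloading
        if source.startswith(("http://", "https://")):
            urllib.request.urlretrieve(source, str(target))
        else:
            shutil.copy(source, target)  # local directory source
    return RawDataFile(target.name, target, target.stat().st_size)
```

Calling `fetch` a second time with the same source returns the same record without transferring the file again, which is the "check if the file exists" step in the list above.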

2. DataLoader: load data into memory
    - Functionalities:
        - load downloaded data into memory
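
One possible shape for the loading step is a small dispatch on file extension. The sketch below is an assumption, not the package's implementation: `LOADERS` and `load` are hypothetical names, and it uses only the standard-library `csv` module so that it runs on its own. Since `pandas` is a declared dependency, a real loader may well return DataFrames instead.

```python
import csv
from pathlib import Path
from typing import Callable

Rows = list[dict[str, str]]


def _load_csv(path: Path) -> Rows:
    """Read a CSV file into a list of row dictionaries keyed by header."""
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


# Hypothetical registry mapping file suffixes to loader functions.
LOADERS: dict[str, Callable[[Path], Rows]] = {".csv": _load_csv}


def load(path: Path) -> Rows:
    """Load one downloaded file into memory, chosen by its suffix."""
    try:
        loader = LOADERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no loader registered for {path.suffix!r}") from None
    return loader(path)
```

Supporting another format would then mean adding one entry to the registry rather than changing `load` itself.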


Data flow: raw_data_files (original downloaded data files) -> raw_data (centralized or federated data) -> ML-ready data (data ready for ML tasks)

