Metadata-Version: 2.1
Name: spark-pipeline
Version: 0.0.4
Summary: Data Science oriented tools, mostly for Apache Spark
Home-page: https://github.com/dllllb/spark-pipeline
Author: Dmitri Babaev
Author-email: dmitri.babaev@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: pyspark (>=2)
Requires-Dist: pandas (>=0.23)

Data Science oriented tools, mostly for Apache Spark

- A pipeline for using Python ML models together with Apache Spark
- Command-line tools (see [readme](bin/README.md))
- *demo*: usage demos in the form of Jupyter notebooks
  - model inference on cluster: [demo/score-sklearn.ipynb](demo/score-sklearn.ipynb)
  - quick detection of dataset distribution changes: [demo/datadiff.ipynb](demo/datadiff.ipynb)
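
To give a rough idea of what model inference on a cluster involves, here is a minimal sketch of partition-wise batch scoring. It is not this package's API: `score_partition`, `SumModel`, and `batch_size` are hypothetical names chosen for illustration, and the only assumption about the model is that it exposes a `predict()` method taking a list of rows. See the notebook above for the actual usage.

```python
from itertools import islice


def score_partition(rows, model, batch_size=1000):
    """Score one partition's rows in batches with a model exposing predict()."""
    rows = iter(rows)
    while True:
        # Pull at most batch_size rows so the whole partition is never held in memory.
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        # One predict() call per batch amortizes the per-call overhead.
        for row, score in zip(batch, model.predict(batch)):
            yield row, score


class SumModel:
    """Hypothetical stand-in for a fitted model: predicts the sum of each row's features."""

    def predict(self, batch):
        return [sum(features) for features in batch]


# Local check without a cluster: treat a plain list as a single partition.
scored = list(score_partition([(1, 2), (3, 4), (5, 6)], SumModel(), batch_size=2))
```

On a cluster, assuming an RDD of feature tuples and a picklable model, the same function could be applied per partition with `rdd.mapPartitions(lambda part: score_partition(part, model))`.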


