Metadata-Version: 2.2
Name: sematic
Version: 0.41.0
Summary: Sematic ML orchestration tool
Requires-Python: <3.14,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ipython==8.2.0
Requires-Dist: setuptools==58.1.0
Requires-Dist: SQLAlchemy>=2.0
Requires-Dist: psycopg2-binary>=2.9.5
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: python-magic>=0.4.27
Requires-Dist: git-python>=1.0.3
Requires-Dist: docker>=6.0.0
Requires-Dist: websocket-client>=1.5.1
Requires-Dist: python-socketio>=5.7.2
Requires-Dist: flask>=2.2.2
Requires-Dist: flask-cors>=3.0.10
Requires-Dist: cloudpickle>=2.2.1
Requires-Dist: requests>=2.28.2
Requires-Dist: werkzeug>=2.2.3
Requires-Dist: python-dateutil>=2.8.2
Requires-Dist: starlette>=0.25.0
Requires-Dist: google-auth>=2.16.0
Requires-Dist: uvicorn[standard]>=0.20.0
Requires-Dist: asgiref>=3.7.2
Requires-Dist: click>=8.1.3
Requires-Dist: kubernetes>=25.3.0
Requires-Dist: boto3>=1.26.82
Requires-Dist: google-cloud-storage>=2.10.0
Requires-Dist: types-google-cloud-ndb>=2.2.0.0
Provides-Extra: examples
Requires-Dist: snowflake-connector-python==3.12.4; python_version < "3.13" and extra == "examples"
Requires-Dist: pyOpenSSL>=23.0.0; extra == "examples"
Requires-Dist: pyarrow>=12.0.0; python_version < "3.13" and extra == "examples"
Requires-Dist: python-magic==0.4.27; extra == "examples"
Requires-Dist: torch>=1.13.1; python_version < "3.13" and extra == "examples"
Requires-Dist: torchvision>=0.14.1; python_version < "3.13" and extra == "examples"
Requires-Dist: pytorch-lightning>=1.6.5; python_version < "3.13" and extra == "examples"
Requires-Dist: ray-lightning>=0.3.0; python_version < "3.13" and extra == "examples"
Requires-Dist: plotly==5.13.0; python_version < "3.13" and extra == "examples"
Requires-Dist: pandas>=1.5.3; python_version < "3.13" and extra == "examples"
Requires-Dist: seaborn>=0.12.2; python_version < "3.13" and extra == "examples"
Requires-Dist: matplotlib>=3.7.0; python_version < "3.13" and extra == "examples"
Requires-Dist: statsmodels>=0.13.5; python_version < "3.13" and extra == "examples"
Requires-Dist: scikit-learn>=1.2.1; python_version < "3.13" and extra == "examples"
Requires-Dist: numpy>=1.24.0; python_version < "3.13" and extra == "examples"
Requires-Dist: xgboost>=1.7.3; python_version < "3.13" and extra == "examples"
Requires-Dist: accelerate==0.19.0; python_version < "3.13" and extra == "examples"
Requires-Dist: datasets>=2.12.0; python_version < "3.13" and extra == "examples"
Requires-Dist: huggingface-hub>=0.14.1; extra == "examples"
Requires-Dist: peft>=0.3.0; python_version < "3.13" and extra == "examples"
Requires-Dist: transformers>=4.29.2; python_version < "3.13" and extra == "examples"
Requires-Dist: gradio>=3.35.2; python_version < "3.13" and extra == "examples"
Requires-Dist: trafilatura>=1.6.0; extra == "examples"
Requires-Dist: cohere>=4.9.0; extra == "examples"
Requires-Dist: openai>=0.27.8; extra == "examples"
Provides-Extra: ray
Requires-Dist: ray[air,default]>=2.3.0; python_version < "3.13" and extra == "ray"
Provides-Extra: all
Requires-Dist: ray[air,default]>=2.3.0; python_version < "3.13" and extra == "all"

![Sematic Logo](https://raw.githubusercontent.com/sematic-ai/sematic/main/docs/images/Logo_README.png)



![PyPI](https://img.shields.io/pypi/v/sematic/0.41.0?style=for-the-badge)
[![CircleCI](https://img.shields.io/circleci/build/github/sematic-ai/sematic/main?label=CircleCI&style=for-the-badge&token=60d1953bfee5b6bf8201f8e84a10eaa5bf5622fe)](https://app.circleci.com/pipelines/github/sematic-ai/sematic?branch=main&filter=all)
![PyPI - License](https://img.shields.io/pypi/l/sematic?style=for-the-badge)
[![Python 3.9](https://img.shields.io/badge/Python-3.9-blue?style=for-the-badge&logo=none)](https://python.org)
[![Python 3.10](https://img.shields.io/badge/Python-3.10-blue?style=for-the-badge&logo=none)](https://python.org)
[![Python 3.11](https://img.shields.io/badge/Python-3.11-blue?style=for-the-badge&logo=none)](https://python.org)
[![Python 3.12](https://img.shields.io/badge/Python-3.12-blue?style=for-the-badge&logo=none)](https://python.org)
[![Python 3.13](https://img.shields.io/badge/Python-3.13-blue?style=for-the-badge&logo=none)](https://python.org)
![Discord](https://img.shields.io/discord/983789877927747714?label=DISCORD&style=for-the-badge)
[![Made By Sematic](https://img.shields.io/badge/Made_by-Sematic_🦊-E19632?style=for-the-badge&logo=none)](https://sematic.dev)
![PyPI - Downloads](https://img.shields.io/pypi/dm/sematic?style=for-the-badge)

![Sematic Screenshot](https://raw.githubusercontent.com/sematic-ai/sematic/main/docs/images/Screenshot_README_2.png)

[Sematic](https://sematic.dev) is an open-source ML development platform. It
lets ML Engineers and Data Scientists write arbitrarily complex end-to-end
pipelines with simple Python and execute them on their local machine, in a cloud
VM, or on a Kubernetes cluster to leverage cloud resources.

Sematic is based on learnings gathered at top self-driving car companies. It
enables chaining data processing jobs (e.g. Apache Spark) with model training
(e.g. PyTorch, TensorFlow), or any other arbitrary Python business logic into
type-safe, traceable, reproducible end-to-end pipelines that can be monitored
and visualized in a modern web dashboard.

Read our [documentation](https://docs.sematic.dev) and join our [Discord
channel](https://discord.gg/4KZJ6kYVax).

## Why Sematic

- **Easy onboarding** – no deployment or infrastructure is needed to get
  started; simply install Sematic locally and start exploring.
- **Local-to-cloud parity** – run the same code on your laptop and on your
  Kubernetes cluster.
- **End-to-end traceability** – all pipeline artifacts are persisted, tracked,
  and visualizable in a web dashboard.
- **Access heterogeneous compute** – customize required resources for each
  pipeline step to optimize your performance and cloud footprint (CPUs, memory,
  GPUs, Spark cluster, etc.).
- **Reproducibility** – rerun your pipelines from the UI with guaranteed
  reproducibility of results.

## Getting Started

To get started locally, simply install Sematic in your Python environment:

```shell
$ pip install sematic
```
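If the install does not resolve, one thing worth checking is the interpreter version. The package metadata declares support for Python 3.9 up to, but not including, 3.14, and the `ray` extra plus most of the `examples` dependencies are only installed on versions below 3.13. The snippet below is a small sketch of that check; `check_interpreter` is a hypothetical helper written for illustration and is not part of Sematic.

```python
import sys

# Bounds copied from this package's metadata: Requires-Python >=3.9,<3.14.
MIN_SUPPORTED = (3, 9)
MAX_EXCLUSIVE = (3, 14)
# The "ray" extra and most "examples" dependencies carry a
# python_version < "3.13" marker in the same metadata.
EXTRAS_MAX_EXCLUSIVE = (3, 13)


def check_interpreter(version=None):
    """Return (core_supported, extras_complete) for a (major, minor) version.

    Hypothetical helper for illustration; it is not part of Sematic.
    With no argument, the running interpreter is checked.
    """
    major_minor = tuple(sys.version_info[:2] if version is None else version[:2])
    core_supported = MIN_SUPPORTED <= major_minor < MAX_EXCLUSIVE
    extras_complete = MIN_SUPPORTED <= major_minor < EXTRAS_MAX_EXCLUSIVE
    return core_supported, extras_complete


if __name__ == "__main__":
    core, extras = check_interpreter()
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor}: "
        f"core supported={core}, extras fully installable={extras}"
    )
```

On Python 3.13 the core package is within the declared range, but the dependencies marked `python_version < "3.13"` are skipped by the installer.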

Start the local web dashboard:

```shell
$ sematic start
```

Run an example pipeline:

```shell
$ sematic run examples/mnist/pytorch
```

Create a new boilerplate project:

```shell
$ sematic new my_new_project
```

Or create one from an existing example:

```shell
$ sematic new my_new_project --from examples/mnist/pytorch
```

Then run it with:

```shell
$ python3 -m my_new_project
```

To deploy Sematic to Kubernetes and leverage cloud resources, see our
[documentation](https://docs.sematic.dev).

## Features

- **Lightweight Python SDK** – define arbitrarily complex end-to-end pipelines
- **Pipeline nesting** – arbitrarily nest pipelines into larger pipelines
- **Dynamic graphs** – Python-defined graphs allow for iterations, conditional
  branching, etc.
- **Lineage tracking** – all inputs and outputs of all steps are persisted and
  tracked
- **Runtime type checking** – fail early with runtime type checking
- **Web dashboard** – monitor, track, and visualize pipelines in a modern web UI
- **Artifact visualization** – visualize all inputs and outputs of all steps in
  the web dashboard
- **Local execution** – run pipelines on your local machine without any
  deployment necessary
- **Cloud orchestration** – run pipelines on Kubernetes to access GPUs and other
  cloud resources
- **Heterogeneous compute resources** – run different steps on different
  machines (e.g. CPUs, memory, GPUs, Spark)
- **Helm chart deployment** – install Sematic on your Kubernetes cluster
- **Pipeline reruns** – rerun pipelines from the UI from an arbitrary point in
  the graph
- **Step caching** – cache expensive pipeline steps for faster iteration
- **Step retry** – recover from transient failures with step retries
- **Metadata and collaboration** – tags, source code visualization, docstrings,
  notes, etc.
- **Numerous integrations** – see the Integrations section below
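
As an illustration of what failing early with runtime type checking can look like, here is a minimal sketch in plain Python. It is not Sematic's implementation: the `checked` decorator and the `normalize` function are hypothetical names, and only plain-class annotations are validated.

```python
import functools
import inspect
import typing


def checked(fn):
    """Hypothetical decorator: validate arguments and the return value
    against the function's type annotations at call time."""
    hints = typing.get_type_hints(fn)
    signature = inspect.signature(fn)

    def check(name, value):
        expected = hints.get(name)
        # Only plain classes are checked; parameterized generics such as
        # list[int] and unannotated names are skipped in this sketch.
        if (
            isinstance(expected, type)
            and typing.get_origin(expected) is None
            and not isinstance(value, expected)
        ):
            raise TypeError(
                f"{fn.__name__}: {name!r} expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        # Fail early: inputs are validated before the function body runs.
        for name, value in bound.arguments.items():
            check(name, value)
        result = fn(*args, **kwargs)
        check("return", result)
        return result

    return wrapper


@checked
def normalize(value: float, scale: float) -> float:
    return value / scale
```

Calling `normalize(3.0, 2.0)` goes through, while `normalize("3", 2.0)` raises a `TypeError` before the division is attempted. The check in this sketch is strict, so an `int` passed where `float` is declared is also rejected.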

## Integrations

- **Apache Spark** – on-demand in-cluster Spark cluster
- **Ray** – on-demand in-cluster Ray resources
- **Snowflake** – easily query your data warehouse (other warehouses are
  supported too)
- **Plotly, Matplotlib** – visualize plot artifacts in the web dashboard
- **Pandas** – visualize dataframe artifacts in the dashboard
- **Grafana** – embed Grafana panels in the web dashboard
- **Bazel** – integrate with your Bazel build system
- **Helm chart** – deploy to Kubernetes with our Helm chart
- **Git** – track git information in the web dashboard

## Community and resources

Learn more about Sematic and get in touch with the following resources:

- [Sematic landing page](https://sematic.dev)
- [Documentation](https://docs.sematic.dev)
- [Discord channel](https://discord.gg/4KZJ6kYVax)
- [YouTube channel](https://www.youtube.com/@sematic-ai)
- [Our Blog](https://sematic.dev/blog)

## Contribute!

To contribute to Sematic, check out [open issues tagged "good first
issue"](https://github.com/sematic-ai/sematic/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22),
and get in touch with us on [Discord](https://discord.gg/4KZJ6kYVax).
You can find instructions on how to get your development environment set up
in our [developer docs](./developer-docs/README.md). If you'd like to add
an example, you may also find
[this guide](https://docs.sematic.dev/project/contributor-guide/contribute-example)
helpful.

![scarf pixel](https://static.scarf.sh/a.png?x-pxid=80c3593f-25a0-4b06-90a1-0b670a6567d4)
