Metadata-Version: 2.4
Name: cogflow
Version: 2.0.1b1
Summary: COGFlow — modular machine learning workflow management system
Author-email: Sai Kireeti <sai.kireeti@hiro-microdatacenters.nl>
License-Expression: MIT
Project-URL: Homepage, https://github.com/HIRO-MicroDataCenters-BV/cogflow
Project-URL: Issues, https://github.com/HIRO-MicroDataCenters-BV/cogflow/issues
Project-URL: Documentation, https://hiro-microdatacenters-bv.github.io/cogflow
Keywords: ml,workflow,kfp,federated-learning,cogflow,mlops
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: mlflow==2.22.0
Requires-Dist: kfp==1.8.22
Requires-Dist: pyyaml<7,>=6.0.2
Requires-Dist: pydantic<2.0.0
Requires-Dist: scikit-learn==1.7.0
Requires-Dist: tenacity
Requires-Dist: kubernetes
Requires-Dist: kubernetes_asyncio
Requires-Dist: minio>=7.2.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-mock>=3.12.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: coverage; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.6.0; extra == "docs"
Requires-Dist: mkdocs-material>=9.4.0; extra == "docs"
Requires-Dist: pdoc3>=0.11.0; extra == "docs"
Dynamic: license-file

# CogFlow

**CogFlow** is a modular, SDK-first machine learning workflow management system
built on **Kubeflow Pipelines**, **MLflow**, **Kubernetes**, and **MinIO**.

It provides a clean Python API for:
- building production-grade ML pipelines
- managing datasets and components
- orchestrating federated learning workflows
- enforcing consistent error handling and validation

CogFlow is designed for **real infrastructure**, not just notebooks.

---

## Why CogFlow?

Modern ML platforms are powerful but fragmented:

- Kubeflow Pipelines → great orchestration, weak ergonomics
- MLflow → experiment tracking, limited workflow control
- Kubernetes → powerful, but verbose and error-prone
- Federated learning → no standard orchestration layer

**CogFlow bridges these gaps** by providing:

- a stable Python SDK
- safe lazy-loading of heavy dependencies
- unified error handling
- infrastructure-aware abstractions
- zero circular imports

---

## Core Features

### 🧩 Pipeline Orchestration
- Lazy-loaded Kubeflow Pipelines client
- Safe pipeline compilation and submission
- Runtime environment injection
- Kubernetes service lifecycle management

### 📦 Component Management
- YAML-based component registry
- MinIO-backed component storage
- Automatic component registration
- Runtime-safe component loading

### 📊 Dataset Management
- Dataset registration and metadata retrieval
- Secure dataset downloads
- Silent deletes with strict error semantics
- Pluggable storage backends
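
One possible reading of "silent deletes with strict error semantics" combined with pluggable backends is sketched below: deleting a missing dataset is a no-op, while genuine backend failures still raise. All names here (`StorageBackend`, `InMemoryBackend`, `StorageError`) are hypothetical and only illustrate the idea; they are not CogFlow's API.

```python
from typing import Dict, Protocol


class StorageError(Exception):
    """Raised for real backend failures (hypothetical name)."""


class StorageBackend(Protocol):
    """Interface any backend would implement in this sketch."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def delete(self, key: str) -> None: ...


class InMemoryBackend:
    """Dict-backed stand-in for an object store such as MinIO."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self.read_only = False

    def put(self, key: str, data: bytes) -> None:
        if self.read_only:
            raise StorageError("backend is read-only")
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        try:
            return self._objects[key]
        except KeyError:
            raise StorageError(f"dataset {key!r} not found") from None

    def delete(self, key: str) -> None:
        if self.read_only:
            # A real failure is never swallowed.
            raise StorageError("backend is read-only")
        # A missing key is ignored, so deleting twice is not an error.
        self._objects.pop(key, None)
```

Because `StorageBackend` is a `Protocol`, any class with matching methods satisfies it without inheriting from it, which keeps backends swappable.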

### 🤝 Federated Learning
- Auto-generated FL pipelines
- Dynamic pipeline signatures
- Connector-based and dataspace-based workflows
- Region-aware scheduling with node selectors
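
To make "dynamic pipeline signatures" concrete, the sketch below builds a function whose keyword parameters are decided at runtime, using the standard `inspect` module. The helper `make_pipeline_func` and the parameter names `rounds` and `region` are assumptions for illustration; how CogFlow generates its pipelines is not described here.

```python
import inspect
from typing import Any, Callable, Dict


def make_pipeline_func(param_defaults: Dict[str, Any]) -> Callable[..., Dict[str, Any]]:
    """Build a function whose keyword-only parameters come from a dict."""
    parameters = [
        inspect.Parameter(name, inspect.Parameter.KEYWORD_ONLY, default=default)
        for name, default in param_defaults.items()
    ]
    signature = inspect.Signature(parameters)

    def pipeline(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        # bind() rejects unknown arguments, just like a normal signature.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        return dict(bound.arguments)

    # inspect.signature() now reports the generated parameters.
    pipeline.__signature__ = signature
    return pipeline


# Hypothetical federated learning parameters.
fl_pipeline = make_pipeline_func({"rounds": 3, "region": "eu-west"})
```

Calling `fl_pipeline(rounds=5)` fills in the remaining defaults, and passing an undeclared argument raises `TypeError`.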

### 🧠 Unified Error Handling
- Strongly typed error hierarchy
- Context-aware exception wrapping
- API-ready error serialization
- Zero silent failures
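
The ideas above (a typed hierarchy, context-aware wrapping, and serialization) could look roughly like the following sketch. The class names `SketchError` and `DatasetError` and the function `load_dataset` are hypothetical and do not reflect the actual contents of CogFlow's exception module.

```python
from typing import Any, Dict


class SketchError(Exception):
    """Hypothetical base class carrying a code and structured context."""

    code = "error"

    def __init__(self, message: str, **context: Any) -> None:
        super().__init__(message)
        self.message = message
        self.context = context

    def to_dict(self) -> Dict[str, Any]:
        """Serialize into a plain dict suitable for a JSON API response."""
        cause = self.__cause__
        return {
            "code": self.code,
            "message": self.message,
            "context": self.context,
            "cause": repr(cause) if cause is not None else None,
        }


class DatasetError(SketchError):
    code = "dataset_error"


def load_dataset(name: str, catalog: Dict[str, str]) -> str:
    try:
        return catalog[name]
    except KeyError as exc:
        # Wrap the low-level error and keep it as __cause__.
        raise DatasetError("dataset not found", dataset=name) from exc
```

Raising with `from exc` preserves the original exception, so the serialized payload can still report the underlying cause.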

---

## Architecture Overview

CogFlow follows a **layered SDK architecture**:

    cogflow/
    ├── core/
    │   ├── pipelines/     # Kubeflow orchestration & FL pipelines
    │   ├── datasets/      # Dataset lifecycle management
    │   ├── components/    # Component registry & YAML handling
    │   └── models/        # (future extension)
    │
    ├── utils/
    │   ├── common.py      # UUIDs, paths, Kubernetes helpers
    │   ├── network.py     # HTTP utilities with retry & streaming
    │   ├── storage.py     # MinIO client abstraction
    │   └── exceptions.py  # Unified error framework
    │
    ├── config.py
    └── api.py
