Metadata-Version: 2.4
Name: jet-k8s
Version: 0.1.7
Summary: A CLI Job Execution Toolkit (Jet) for Kubernetes
Author-email: Manideep Kolla <manideep2510@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/manideep2510/jet-k8s
Project-URL: Issues, https://github.com/manideep2510/jet-k8s/issues
Keywords: kubernetes,k8s,cli,job execution,job management
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8.1
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyYAML>=6.0
Requires-Dist: kr8s>=0.19.0
Requires-Dist: httpx>=0.23.0
Requires-Dist: textual>=2.0.0
Requires-Dist: tabulate>=0.9.0
Dynamic: license-file

# Jet: A CLI Job Execution Toolkit (Jet) for Kubernetes

Skip the YAML. A lightweight command-line Job Execution Toolkit (Jet) for Kubernetes that simplifies batch job management with a focus on ML workloads.

[![PyPI version](https://badge.fury.io/py/jet-k8s.svg)](https://badge.fury.io/py/jet-k8s) [![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

## Features

- 🚀 **Simplified Job Submission** - Define and submit Kubernetes jobs directly from the command line without writing YAML files manually.
- 📊 **Easy Monitoring** - Track and manage batch jobs with a fast and responsive Terminal User Interface (TUI).
- 📈 **Resource Management** - View available cluster resources (CPU, memory, GPU) with `kube-state-metrics` integration.
- 📄 **Work with Templates** - Save custom job templates to standardize and simplify job configurations, making your experiments reproducible.
- 📓 **Jupyter Integration** - Launch Jupyter notebooks on Kubernetes with automatic port forwarding.
- 🐛 **Debug Sessions** - Spin up interactive debug pods for quick troubleshooting.
- 🛠️ **Creating Services** - Easily create and manage Kubernetes services to expose your applications.
- 🤖 **ML Focused** - Designed with Python machine learning workloads and data processing tasks in mind.

## Overview

Jet eliminates the complexity of Kubernetes YAML configuration files, providing a streamlined CLI experience for:
- Defining and submitting batch jobs
- Monitoring job status and logs with a lightweight and fast Terminal User Interface (TUI) inspired by [`k9s`](https://k9scli.io/).
- Viewing cluster resource availability (CPU, memory, GPU) with integration to `kube-state-metrics`.
- Running interactive Jupyter notebook sessions on Kubernetes with automatic port forwarding.
- Creating interactive shell debug environments for troubleshooting and debugging.
- Automatic job cleanup for Jupyter and debug sessions.
- Easily create and manage Kubernetes services to expose your applications.
Perfect for ML engineers and researchers who want to leverage Kubernetes for ML training, inference and experimentation jobs without the YAML overhead.

## Installation

### Dependencies

1. Python 3.8.1 or higher.

2. `kubectl` installed and configured on your local machine. Refer to the [official Kubernetes documentation](https://kubernetes.io/docs/tasks/tools/) for installation instructions.

3. A running Kubernetes cluster, with kubeconfig properly set up to access the cluster from your local machine.

4. For the `jet resources` command, `kube-state-metrics` must be installed in your cluster to fetch resource metrics. Currently tested with `kube-state-metrics` versions 7.0.0 and 7.1.0.
 
### Install Jet
Jet can be installed using pip from PyPI:

```bash
pip install jet-k8s
```

Installing as a `uv` tool:

```bash
uv tool install jet-k8s
```

Verify the installation:

```bash
jet --version

jet --help
```

## Usage
After installation, you can use the `jet` command in your terminal. Here are some basic commands:

Please refer to the following sections for detailed user guides.

- [Submitting Jobs](https://github.com/manideep2510/jet-k8s/blob/main/docs/submitting-jobs.md)
- [Starting Jupyter Notebook Sessions](https://github.com/manideep2510/jet-k8s/blob/main/docs/jupyter-notebooks.md)
- [Starting Debug Sessions](https://github.com/manideep2510/jet-k8s/blob/main/docs/debug-sessions.md)
- [Creating Services](https://github.com/manideep2510/jet-k8s/blob/main/docs/services.md)
- [Using Job Templates](https://github.com/manideep2510/jet-k8s/blob/main/docs/templates.md)
- [Monitoring Jobs](https://github.com/manideep2510/jet-k8s/blob/main/docs/monitoring-jobs.md)
- [Other Commands](https://github.com/manideep2510/jet-k8s/blob/main/docs/other-commands.md)

## Demos 

> [!TIP]
> Click the GIFs to view the demos on Asciinema player with media controls.

### Submitting Jobs
[![til](https://github.com/manideep2510/jet-k8s/raw/main/assets/job-launch.gif)](https://asciinema.org/a/WxvEBtyK4D02BvaLI9CC5a4G4)

### Monitoring Jobs with TUI
[![tui](https://github.com/manideep2510/jet-k8s/raw/main/assets/tui.gif)](https://asciinema.org/a/b1OGpHXfY2RZXieT1EnbVPBgu)

### Starting Jupyter Notebook Sessions and Auto Port-forwarding
[![jupyter](https://github.com/manideep2510/jet-k8s/raw/main/assets/jupyter-launch.gif)](https://asciinema.org/a/y5BqZj8wB79aQTBUnHibyeAY7)

### Starting Interactive Debug Sessions
[![debug](https://github.com/manideep2510/jet-k8s/raw/main/assets/debug-launch.gif)](https://asciinema.org/a/O3V6kAflQpTaczlKsO7kVAGK2)

### Saving and Using Job Templates
[![templates](https://github.com/manideep2510/jet-k8s/raw/main/assets/job-templates.gif)](https://asciinema.org/a/ECWaEYWkT2fJfQHy4zXkow1tP)

## Why Jobs?

Some key reasons for using Kubernetes Jobs for ML workloads:

1. **Batch Workloads**: Jobs are designed for batch processing tasks, which aligns well with ML training and data processing workloads that are typically non-interactive and run to completion.
2. **Automatic Retry**: Jobs have built-in retry mechanisms for failed tasks, which is beneficial for long-running ML jobs that may encounter transient failures.
3. **Resource Management**: Jobs can be scheduled and managed more effectively with schedulers such as KAI-scheduler. For example, pods within jobs can be prempted and automatically rescheduled on different nodes if a high priority job needs resources or to organize pods to optimize cluster resource utilization.
4. **Completion Tracking**: Jobs provide a clear way to track the completion status of tasks, making it easier to manage and monitor ML workloads.

## Notes

1. Jet currently supports some of the frequently used job and pod spec configurations through command-line arguments. If you have specific requirements not currently supported, please raise an issue or contribute a PR to add the necessary features.

2. Jet currently supports only Kubernetes clusters with NVIDIA GPU nodes.
   
3. Pod's `restartPolicy` is set to `Never` for all jobs types by default and job's themselves have `backoffLimit` set to None (so defaults to Kubernetes defaults of 6). This configuration is to ensure that when the containers in pods fail, they are not restarted indefinitely on the same resources, but instead rescheduled on different resources by the job controller. You can override this using the `--restart-policy` argument.

4. The argument `--gpu-type` is implemented using node selectors. Ensure that your cluster nodes are labeled appropriately for the GPU types you intend to use.
For example, to label a node with an A100 GPU, you can use:
   ```bash
   kubectl label nodes <node-name> gpu-type=a100
   ```

5. The pod security context is set to run containers with the same user and group ID as the user executing the `jet` command. This is to ensure proper file permission handling when mounting host directories or volumes. If your use case requires running containers with different user/group IDs, please raise an issue or contribute a PR to make this configurable.

6. The `--pyenv` argument mounts a Python virtual environment from the host into the container at the same path and updates the container's `PATH` accordingly.

   - **Requirements:**

      - **Shared storage**: The env directory must be accessible at the same path on the node where the pod runs. This works automatically with single-node clusters or shared storage (NFS, BeeGFS), but may not work on multi-node clusters without shared storage.

      - **Python compatibility**: The env's Python executable (read from `pyvenv.cfg`) must be available inside the container. This works if:
         - The container image has Python installed at the same path (e.g., `/usr/bin/python3.x` for system Python envs), or
         - The env includes its own Python rather than system Python (e.g., envs created with `uv` or `conda` using a specific Python version).

## TODOs:

- [ ] Add support for fractional GPUs using HAMi plugin (In dev: [KAI-scheduler #60](https://github.com/NVIDIA/KAI-Scheduler/pull/60)).
- [ ] Add support for other accelerator types such as AMDs and TPUs.
- [ ] Evaluate support for other kubernetes schedulers such as Volcano.
- [ ] Ability to submit jobs with parallism and gang scheduling for usecases such as multi-node training jobs.
- [ ] Add support for job dependencies and chaining.
- [ ] Add TUI support for port forwarding.
- [ ] Add TUI support to change namespaces and contexts.
