Metadata-Version: 2.4
Name: cm-cluster-on-demand
Version: 11.33.0
Summary: NVIDIA Base Command Manager Cluster on Demand
Author-email: Base Command Manager Cloud Team <sw-bright-cloud-team@nvidia.onmicrosoft.com>
License-Expression: Apache-2.0
Project-URL: Documentation, https://docs.nvidia.com/base-command-manager/
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Clustering
Classifier: Topic :: System :: Installation/Setup
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: attrs>=25.4.0
Requires-Dist: filelock>=2.0.8
Requires-Dist: mako>=0.8.1
Requires-Dist: netaddr>=0.8.0
Requires-Dist: passlib>=1.7.4
Requires-Dist: PrettyTable>=3.4.0
Requires-Dist: python_dateutil>=1.5
Requires-Dist: pytz>=2022.5
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.31.0
Requires-Dist: rich>=13.7.0
Requires-Dist: tenacity>=8.1.0
Requires-Dist: urllib3>=2.5.0
Provides-Extra: aws
Requires-Dist: boto3>=1.42.46; extra == "aws"
Requires-Dist: xmltodict; extra == "aws"
Provides-Extra: azure
Requires-Dist: aiohttp>=3.8.3; extra == "azure"
Requires-Dist: azure-identity~=1.19.0; extra == "azure"
Requires-Dist: azure-mgmt-authorization~=4.0.0; extra == "azure"
Requires-Dist: azure-mgmt-compute~=33.1.0; extra == "azure"
Requires-Dist: azure-mgmt-marketplaceordering==1.1.0; extra == "azure"
Requires-Dist: azure-mgmt-network~=28.1.0; extra == "azure"
Requires-Dist: azure-mgmt-privatedns~=1.2.0; extra == "azure"
Requires-Dist: azure-mgmt-resource~=23.2.0; extra == "azure"
Requires-Dist: azure-mgmt-storage~=21.2.1; extra == "azure"
Requires-Dist: azure-storage-blob~=12.23.1; extra == "azure"
Provides-Extra: gcp
Requires-Dist: google-api-core>=2.30.0; extra == "gcp"
Requires-Dist: google-cloud-asset>=4.2.0; extra == "gcp"
Requires-Dist: google-cloud-compute>=1.46.0; extra == "gcp"
Requires-Dist: google-cloud-filestore>=1.15.0; extra == "gcp"
Requires-Dist: google-cloud-iam>=2.21.0; extra == "gcp"
Requires-Dist: google-cloud-quotas>=0.6.0; extra == "gcp"
Requires-Dist: google-cloud-resource-manager>=1.17.0; extra == "gcp"
Requires-Dist: google-cloud-storage>=3.10.1; extra == "gcp"
Provides-Extra: oci
Requires-Dist: oci>=2.118.2; extra == "oci"
Dynamic: license-file

Cluster on Demand (COD) is a command-line tool for provisioning and managing
[NVIDIA Base Command Manager](https://docs.nvidia.com/base-command-manager/) (BCM) clusters on
major cloud providers. BCM is a cluster management platform that handles node provisioning,
workload scheduling, and software stack deployment across HPC and AI environments.

COD supports managing clusters in AWS, Azure, GCP, and OCI.

# Installation

Install COD with the extra for your cloud provider (COD requires Python 3.12 or later):

```
pip install 'cm-cluster-on-demand[aws]'
pip install 'cm-cluster-on-demand[azure]'
pip install 'cm-cluster-on-demand[gcp]'
pip install 'cm-cluster-on-demand[oci]'
```

To install support for multiple providers at once:

```
pip install 'cm-cluster-on-demand[aws,azure,gcp,oci]'
```
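
If you script the installation, the requirement string can be assembled from a list of providers. The following is a minimal sketch, not part of COD: `cod_requirement` is a hypothetical helper name, and the set of accepted extras (`aws`, `azure`, `gcp`, `oci`) is taken from the extras this package declares.

```shell
# Hypothetical helper (not shipped with COD): print a pip requirement
# specifier for the given provider extras, rejecting unknown names.
cod_requirement() {
    extras=""
    for provider in "$@"; do
        case "$provider" in
            aws|azure|gcp|oci) ;;
            *)
                echo "unknown provider extra: $provider" >&2
                return 1
                ;;
        esac
        # Join the accepted names with commas.
        extras="${extras:+$extras,}$provider"
    done
    if [ -z "$extras" ]; then
        echo "usage: cod_requirement PROVIDER..." >&2
        return 1
    fi
    printf 'cm-cluster-on-demand[%s]\n' "$extras"
}
```

A possible use is `pip install "$(cod_requirement aws gcp)"`. The double quotes keep shells that treat square brackets as glob patterns from rewriting the specifier.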

# Usage

Each provider has its own CLI entry point. Use `--help` to explore all available options:

```
cm-cod-aws --help
cm-cod-aws cluster create --help
```

Each provider CLI supports the following commands:

| Command | Description |
|---|---|
| `cluster create` | Create a new BCM cluster |
| `cluster list` | List existing clusters |
| `cluster delete` | Delete a cluster and its cloud resources |
| `cluster start` | Start head node instances of a stopped cluster |
| `cluster stop` | Stop all instances of a cluster |
| `image list` | List available BCM head node images |
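
To review the options of every command in the table for one provider, the help invocations can be generated rather than typed by hand. This is a small sketch under two assumptions: `cod_help_commands` is a hypothetical helper, and each subcommand accepts `--help` in the same way `cluster create` does above. The function only prints the commands; it does not run them.

```shell
# Hypothetical helper (not shipped with COD): print the --help invocation
# for each command in the table, for the provider given as the first argument.
cod_help_commands() {
    provider="$1"
    for cmd in "cluster create" "cluster list" "cluster delete" \
               "cluster start" "cluster stop" "image list"; do
        printf 'cm-cod-%s %s --help\n' "$provider" "$cmd"
    done
}
```

For example, `cod_help_commands gcp` lists six commands starting with `cm-cod-gcp cluster create --help`, which you can copy or pipe to a shell once the GCP extra is installed.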

The following examples each create a cluster with 5 compute nodes and 1 head node.

## AWS

```
cm-cod-aws cluster create \
    --on-error 'cleanup' \
    --aws-region 'us-east-1' \
    --wlm 'slurm' \
    --nodes '5' \
    --aws-access-key-id <AWS_ACCESS_KEY_ID> \
    --aws-secret-key <AWS_SECRET_KEY> \
    --cluster-password <CLUSTER_PASSWORD> \
    --license-product-key <LICENSE_PRODUCT_KEY> \
    --name mycluster
```

## Azure

```
cm-cod-azure cluster create \
    --on-error 'cleanup' \
    --wlm 'slurm' \
    --nodes '5' \
    --cluster-password <CLUSTER_PASSWORD> \
    --license-product-key <LICENSE_PRODUCT_KEY> \
    --name mycluster
```

## GCP

```
cm-cod-gcp cluster create \
    --on-error 'cleanup' \
    --head-node-zone 'europe-west4-c' \
    --wlm 'slurm' \
    --nodes '5' \
    --project-id <PROJECT_ID> \
    --cluster-password <CLUSTER_PASSWORD> \
    --license-product-key <LICENSE_PRODUCT_KEY> \
    --name mycluster
```

## OCI

```
cm-cod-oci cluster create \
    --on-error 'cleanup' \
    --oci-region 'eu-amsterdam-1' \
    --wlm 'slurm' \
    --nodes '5' \
    --oci-tenancy <OCI_TENANCY> \
    --oci-user <OCI_USER> \
    --oci-fingerprint <OCI_FINGERPRINT> \
    --oci-key-file <OCI_KEY_FILE> \
    --cluster-password <CLUSTER_PASSWORD> \
    --license-product-key <LICENSE_PRODUCT_KEY> \
    --name mycluster
```
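
The four examples share the `--on-error`, `--wlm`, `--nodes`, `--cluster-password`, `--license-product-key`, and `--name` flags and differ only in the provider-specific ones. One way to keep the shared part in a single place is a small wrapper. The sketch below is illustrative and not part of COD: the function name `cod_create` and the shell variables `CLUSTER_PASSWORD`, `LICENSE_PRODUCT_KEY`, and `COD_DRY_RUN` are conventions of this example, not settings that COD itself reads.

```shell
# Hypothetical wrapper (not shipped with COD).
# Usage: cod_create PROVIDER NAME [PROVIDER_FLAGS...]
cod_create() {
    if [ "$#" -lt 2 ]; then
        echo "usage: cod_create PROVIDER NAME [PROVIDER_FLAGS...]" >&2
        return 2
    fi
    provider="$1"
    name="$2"
    shift 2
    # The two secrets come from variables defined by this sketch.
    if [ -z "${CLUSTER_PASSWORD:-}" ] || [ -z "${LICENSE_PRODUCT_KEY:-}" ]; then
        echo "set CLUSTER_PASSWORD and LICENSE_PRODUCT_KEY first" >&2
        return 2
    fi
    if [ -n "${COD_DRY_RUN:-}" ]; then
        # Dry run: print the command with the secrets masked, do not run it.
        extra=""
        if [ "$#" -gt 0 ]; then
            extra=" $*"
        fi
        echo "cm-cod-$provider cluster create --on-error cleanup --wlm slurm" \
             "--nodes 5 --cluster-password *** --license-product-key ***" \
             "--name $name$extra"
        return 0
    fi
    # Shared flags first, then whatever provider-specific flags were passed.
    "cm-cod-$provider" cluster create \
        --on-error cleanup --wlm slurm --nodes 5 \
        --cluster-password "$CLUSTER_PASSWORD" \
        --license-product-key "$LICENSE_PRODUCT_KEY" \
        --name "$name" "$@"
}
```

With both variables set, `cod_create oci mycluster --oci-region eu-amsterdam-1 ...` would run the same command as the OCI example above once the remaining `--oci-*` flags are supplied. Setting `COD_DRY_RUN=1` prints the command with the secrets masked so you can check it first.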

# Documentation

For full setup and usage documentation, see the
[Cloudbursting Manual](https://docs.nvidia.com/base-command-manager/manuals/11/cloudbursting-manual.pdf).
