Metadata-Version: 2.4
Name: brr-cli
Version: 0.8.0
Summary: Research infrastructure management tooling.
Project-URL: Homepage, https://github.com/joncarter1/brr
Project-URL: Repository, https://github.com/joncarter1/brr
Project-URL: Issues, https://github.com/joncarter1/brr/issues
Author: Jon Carter
License-Expression: MIT
License-File: LICENSE
Keywords: aws,cloud,cluster,gpu,infrastructure,ray
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: System :: Clustering
Requires-Python: >=3.11
Requires-Dist: boto3
Requires-Dist: click
Requires-Dist: inquirerpy
Requires-Dist: nebius
Requires-Dist: pyyaml
Requires-Dist: ray[default]
Requires-Dist: rich
Description-Content-Type: text/markdown

# ❄️ brr ❄️

Opinionated research infrastructure tooling. Launch clusters, get SSH access, start building.

## Features
- **Shared filesystem** — Nodes can share `$HOME` via EFS (AWS) or virtiofs (Nebius).
- **Coding tools** — Install Claude Code, Codex, or Gemini. Connect with e.g. `brr attach aws:cluster claude`
- **Autoscaling** — Ray-based cluster scaling with cached instances.
- **Project-based workflows** — Per-repo cluster configs and project-specific dependencies.
- **Auto-shutdown** — Monitors CPU, GPU, and SSH activity. Shuts down idle instances to save costs.
- **Dotfiles integration** — Take your dev environment (vim, tmux, shell config) to every cluster node.

## Prerequisites

- [uv](https://docs.astral.sh/uv/) (for installation)

## Quick Start

```sh
# Install
uv tool install brr-cli

# Configure (interactive wizard)
brr configure      # or: brr configure nebius

# Launch a GPU instance
brr up aws:l4

# brr up nebius:h100

# Connect
brr attach aws:l4                # SSH
brr attach aws:l4 claude         # Claude Code on the cluster
brr vscode aws:l4                # VS Code remote
```

All templates use `provider:name` syntax (e.g. `aws:l4`, `aws:dev`). Inside a [project](#projects), project templates are resolved first.

Supported clouds: [AWS](#aws-setup) · [Nebius](#nebius-setup)

## Projects

For per-repo cluster configs, initialize a project:

```sh
cd my-research-repo/
brr init
```

This creates:

```
.brr/
  setup.sh          # Project-specific dependencies (shared across providers)
  aws/
    dev.yaml        # Single GPU for development
    cluster.yaml    # CPU head + GPU workers
```

Templates are Ray cluster YAML — edit them or add your own. Inside a project:

```sh
brr up aws:dev          # launches .brr/aws/dev.yaml
brr up aws:cluster      # launches .brr/aws/cluster.yaml
brr attach aws:dev      # SSH into dev cluster
brr down aws:dev        # tear down
```

On first deploy, `brr up` clones the project repo to `~/code/{repo}/` on the head node.

If your project uses `uv`, `brr init` generates templates that use `uv run ray start` from your project directory. Add cluster dependencies to your project first: `uv add 'ray[default]' boto3`.

All global config lives in `~/.brr/config.env`.

## Templates

See [docs/templates.md](docs/templates.md) for the full template reference (placeholders, injection, overrides, Nebius fields).

### Built-in templates

| Template | Instance | GPU | Workers |
| :--- | :--- | :--- | :--- |
| `aws:cpu` | t3.2xlarge | — | 0-2 |
| `aws:l4` | gr6.4xlarge | 1x L4 | — |
| `aws:h100` | p5.4xlarge | 1x H100 | — |
| `aws:cpu-l4` | t3.2xlarge + g6.4xlarge | 1x L4 | 0-4 |
| `nebius:cpu` | 8vcpu-32gb | — | 0-2 |
| `nebius:h100` | 1gpu-16vcpu-200gb | 1x H100 | — |
| `nebius:cpu-h100s` | 8vcpu-32gb + 8gpu-128vcpu-1600gb | 8x H100 | 0-4 |

### Overrides

Override template values inline:

```sh
brr up aws:cpu instance_type=t3.xlarge max_workers=4
brr up aws:l4 spot=true
brr up aws:dev region=us-west-2
```

Preview the rendered config without launching:

```sh
brr up aws:dev --dry-run
```

See available overrides for a template:

```sh
brr templates show aws:dev
```

### Multi-provider

Both providers can run simultaneously:

```sh
brr up aws:l4
brr up nebius:h100
brr attach nebius:h100
brr down nebius:h100
```

## Customization

### Node setup

The built-in `setup.sh` runs on every node boot. It installs packages, mounts shared storage, sets up Python/Ray, GitHub SSH keys, AI coding tools, dotfiles, and the idle shutdown daemon. It updates automatically when you upgrade brr.

Project-specific dependencies go in `.brr/setup.sh` (created by `brr init`), which runs after the global setup.

### uv integration

brr wraps the `uv` binary to route virtual environments away from the shared EFS home directory:

| Environment variable | Value | Purpose |
| :--- | :--- | :--- |
| `UV_CACHE_DIR` | `/tmp/uv` | Download cache (per-instance) |
| `UV_PYTHON_INSTALL_DIR` | `/tmp/uv/python` | Managed Python builds (per-instance) |
| `UV_PROJECT_ENVIRONMENT` | `/tmp/venvs/{project}` | Project venvs (per-instance) |

The wrapper lives at `~/.local/bin/uv` and delegates to the real binary at `~/.local/lib/uv`. Both persist on EFS so new instances reuse them without reinstalling. Only caches, Python builds, and venvs are per-instance (rebuilt on boot from lockfiles).

For uv-managed projects, Ray runs via `uv run ray start` from the project directory — add `ray[default]` and your cloud SDK (e.g. `boto3`) to your project's dependencies. For non-uv clusters, Ray runs from a standalone venv at `/tmp/brr/venv`.

### AI coding tools

Install AI coding assistants on every cluster node:

```sh
brr configure tools    # select Claude Code, Codex, and/or Gemini CLI
```

Then connect and start coding:

```sh
brr up aws:dev
brr attach aws:dev claude
```

### Dotfiles

Set a dotfiles repo to sync your dev environment to every node:

```sh
brr config set DOTFILES_REPO "https://github.com/user/dotfiles"
```

The repo is cloned to `~/dotfiles` and installed via `install.sh` (if present) or GNU Stow.

### Image baking

Bake the global setup into AMIs/images for fast boot:

```sh
brr bake aws          # bake both CPU + GPU AMIs
brr bake status       # check if baked images are up to date
```

After baking, clusters boot from the pre-built image. Only project-specific deps need to install. `brr up` warns when setup.sh has changed since the last bake.

### Idle shutdown

A systemd daemon monitors CPU, GPU, and SSH activity. When all signals are idle for the configured timeout, the instance shuts down.

Configure in `~/.brr/config.env`:

```
IDLE_SHUTDOWN_ENABLED="true"
IDLE_SHUTDOWN_TIMEOUT_MIN="30"
IDLE_SHUTDOWN_CPU_THRESHOLD="10"
IDLE_SHUTDOWN_GRACE_MIN="15"
```

The grace period prevents shutdown during initial setup. Monitor on a node with `journalctl -u idle-shutdown -f`.

### Node caching

By default, Nebius nodes are **deleted** on scale-down. Unlike AWS, stopped Nebius instances still incur disk charges, so deleting is cheaper.

To keep nodes stopped instead (faster restart, but you pay for disks while idle), enable caching in your template's provider config:

```yaml
provider:
  cache_stopped_nodes: true
```

AWS nodes are cached (stopped) by default.

## Commands

| Command | Description |
| :--- | :--- |
| `brr up TEMPLATE [OVERRIDES...]` | Launch or update a cluster (`aws:l4`, `aws:dev`, or `path.yaml`) |
| `brr up TEMPLATE --dry-run` | Preview rendered config without launching |
| `brr down TEMPLATE` | Stop a cluster (instances preserved for fast restart) |
| `brr down TEMPLATE --delete` | Terminate all instances and remove staging files |
| `brr attach TEMPLATE [COMMAND]` | SSH into head node, optionally run a command (e.g. `claude`) |
| `brr list [--all]` | List clusters (project-scoped by default, `--all` for everything) |
| `brr clean [TEMPLATE]` | Terminate stopped (cached) instances |
| `brr vscode TEMPLATE` | Open VS Code on a running cluster |
| `brr templates list` | List built-in templates |
| `brr templates show TEMPLATE` | Show template config and overrides |
| `brr init` | Initialize a project (interactive provider selection) |
| `brr configure [cloud\|tools\|general]` | Interactive setup (cloud provider, AI tools, settings) |
| `brr config [list\|get\|set\|path]` | View and manage configuration |
| `brr bake [aws\|nebius]` | Bake setup into cloud images |
| `brr bake status` | Check if baked images are up to date |
| `brr completion [bash\|zsh\|fish]` | Shell completion (`--install` to add to shell rc) |
| `brr nuke [aws\|nebius]` | Tear down all cloud resources |

## Cloud Setup

### AWS Setup

1. Attach the [IAM policy](brr/aws/iam-policy.json) to your IAM user
2. Install the [AWS CLI](https://aws.amazon.com/cli/) and run `aws configure`
3. *(Optional)* For GitHub SSH access on clusters, authenticate the [GitHub CLI](https://cli.github.com/):
   ```sh
   gh auth login
   gh auth refresh -h github.com -s admin:public_key
   ```
4. Run the setup wizard:
   ```sh
   brr configure aws
   ```

### Nebius Setup

1. Install the [Nebius CLI](https://docs.nebius.com/cli/install) and run `nebius init`
2. Create a service account with editor permissions:
   ```sh
   TENANT_ID="<your-tenant-id>"  # from console.nebius.com → Administration

   SA_ID=$(nebius iam service-account create \
     --name brr-cluster --format json | jq -r '.metadata.id')

   EDITORS_GROUP_ID=$(nebius iam group get-by-name \
     --name editors --parent-id $TENANT_ID --format json | jq -r '.metadata.id')

   nebius iam group-membership create \
     --parent-id $EDITORS_GROUP_ID --member-id $SA_ID
   ```
3. Generate credentials:
   ```sh
   mkdir -p ~/.nebius
   nebius iam auth-public-key generate \
     --service-account-id $SA_ID --output ~/.nebius/credentials.json
   ```
4. Run the setup wizard:
   ```sh
   brr configure nebius
   ```

## Acknowledgments

This project started as a fork of [aws_wiz](https://github.com/besarthoxhaj/aws_wiz) by [Bes](https://github.com/besarthoxhaj) and has been inspired by discussions with colleagues from the [Encode: AI for Science Fellowship](https://encode.pillar.vc/).
