Metadata-Version: 2.4
Name: cjm-torch-plugin-utils
Version: 0.0.1
Summary: PyTorch helpers for cjm-plugin-system plugins: GPU memory release, typed CUDA-OOM handling, and device selection.
Author-email: "Christian J. Mills" <9126128+cj-mills@users.noreply.github.com>
License: Apache-2.0
Project-URL: Repository, https://github.com/cj-mills/cjm-torch-plugin-utils
Project-URL: Documentation, https://cj-mills.github.io/cjm-torch-plugin-utils/
Keywords: nbdev
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cjm-plugin-system>=0.0.37
Requires-Dist: numpy
Requires-Dist: torch
Dynamic: license-file

# cjm-torch-plugin-utils


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install

``` bash
pip install cjm_torch_plugin_utils
```

## Project Structure

    nbs/
    ├── device.ipynb # Resolve a device spec ("auto" / "cpu" / "cuda" / "cuda:N") to a concrete torch device string.
    ├── memory.ipynb # Robust move-to-CPU + drop-references + gc + CUDA-cache cleanup for releasing models, factored out of the per-plugin reimplementations.
    └── oom.ipynb    # Convert torch CUDA out-of-memory exceptions into the substrate's typed `PluginResourceError` (SG-47 Track B) so CR-7 reactive retry can evict and reload.

Total: 3 notebooks

## Module Dependencies

``` mermaid
graph LR
    device["device<br/>Device resolution"]
    memory["memory<br/>GPU model release"]
    oom["oom<br/>CUDA OOM handling"]
```

No cross-module dependencies detected.

## CLI Reference

No CLI commands found in this project.

## Module Overview

Detailed documentation for each module in the project:

### Device resolution (`device.ipynb`)

> Resolve a device spec (“auto” / “cpu” / “cuda” / “cuda:N”) to a
> concrete torch device string.

#### Import

``` python
from cjm_torch_plugin_utils.device import (
    resolve_torch_device
)
```

#### Functions

``` python
def resolve_torch_device(
    spec: str = "auto",  # Requested device: "auto", "cpu", "cuda", or "cuda:N"
) -> str:                # Concrete device string
    """
    Resolve a device spec to a concrete torch device string.
    
    `"auto"` resolves to `"cuda"` when CUDA is available, else `"cpu"`. Any
    explicit spec (`"cpu"`, `"cuda"`, `"cuda:0"`, ...) is returned unchanged.
    """
```

### GPU model release (`memory.ipynb`)

> Robust move-to-CPU + drop-references + gc + CUDA-cache cleanup for
> releasing models, factored out of the per-plugin reimplementations.

#### Import

``` python
from cjm_torch_plugin_utils.memory import (
    release_model
)
```

#### Functions

``` python
def release_model(
    obj: Any,                     # The plugin instance holding the model attribute(s)
    model_attr_names: List[str],  # Names of the attributes to release, in release order
    device: str = "cuda",         # Device the model is on; gates the CUDA-specific cleanup
    *,
    logger: logging.Logger,       # Logger for best-effort failure reporting
) -> None
    """
    Release one or more model objects: move to CPU, drop references, gc, free CUDA cache.
    
    For each name in `model_attr_names`, if `obj` has a non-None attribute:
      1. when on CUDA, best-effort `.to('cpu')` (frees GPU tensors; skipped for
         objects without a `.to` method, e.g. processors/tokenizers),
      2. `setattr(obj, name, None)` and drop the local reference.
    Then a single `gc.collect()` and — on CUDA — `empty_cache()` + `synchronize()`.
    
    Best-effort throughout: failures are logged and swallowed. Missing or
    already-None attributes are skipped, so the call is idempotent.
    """
```

### CUDA OOM handling (`oom.ipynb`)

> Convert torch CUDA out-of-memory exceptions into the substrate’s typed
> `PluginResourceError` (SG-47 Track B) so CR-7 reactive retry can evict
> and reload.

#### Import

``` python
from cjm_torch_plugin_utils.oom import (
    cuda_oom_to_plugin_resource_error
)
```

#### Functions

``` python
def cuda_oom_to_plugin_resource_error(
    exc: BaseException,          # The caught CUDA OOM exception (e.g. torch.cuda.OutOfMemoryError)
    *,
    label: str,                  # Context for the message, e.g. "loading model 'X'" or "inference"
    headroom_mb: float = 100.0,  # Best-effort margin added to `available` to estimate `needed`
) -> PluginResourceError:        # Typed error for the substrate's CR-7 reactive-retry path
    """
    Convert a CUDA out-of-memory exception into a substrate-typed `PluginResourceError`.
    
    SG-47 Track B: a plugin's GPU inference / model-load site catches
    `torch.cuda.OutOfMemoryError` and re-raises the result of this helper so the
    substrate sees a typed resource error (evict + reload + retry via CR-7)
    instead of an opaque crash.
    
    `needed` is a best-effort estimate (`available + headroom_mb`): the true
    required VRAM is unknowable from the exception, and CR-7 triggers eviction
    regardless of magnitude, so an approximation above `available` is sufficient.
    
    The caller raises the returned error, preserving the original cause:
    
        try:
            model = Model.from_pretrained(repo_id, ...)
        except torch.cuda.OutOfMemoryError as e:
            raise cuda_oom_to_plugin_resource_error(e, label=f"loading {repo_id!r}") from e
    """
```
