Metadata-Version: 2.4
Name: torchada
Version: 0.1.0
Summary: Adapter package for torch_musa to act exactly like PyTorch CUDA
Author: torchada contributors
License: MIT
Project-URL: Homepage, https://github.com/yeahdongcn/torchada
Project-URL: Repository, https://github.com/yeahdongcn/torchada
Keywords: pytorch,cuda,musa,moore-threads,gpu,adapter
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch
Provides-Extra: musa
Requires-Dist: torch_musa; extra == "musa"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Dynamic: license-file

# torchada

**Adapter package for torch_musa to act exactly like PyTorch CUDA**

torchada provides a unified interface that works transparently on both NVIDIA GPUs (CUDA) and Moore Threads GPUs (MUSA). Write your code once against the standard PyTorch CUDA APIs, add a single `import torchada`, and the same code runs on MUSA hardware.

## Features

- **Zero Code Changes**: Just `import torchada` once, then use standard `torch.cuda.*` APIs
- **Automatic Platform Detection**: Detects whether you're running on CUDA or MUSA
- **Transparent Device Mapping**: `tensor.cuda()` and `tensor.to("cuda")` work on MUSA
- **Extension Building**: Standard `torch.utils.cpp_extension` works on MUSA after importing torchada
- **Source Code Porting**: Automatic CUDA → MUSA symbol mapping for C++/CUDA extensions

## Installation

```bash
pip install torchada

# Or install from source
git clone https://github.com/yeahdongcn/torchada.git
cd torchada
pip install -e .
```

## Quick Start

### Basic Usage

```python
import torchada  # Import once to apply patches - that's it!
import torch

# Use standard torch.cuda APIs - they work on both CUDA and MUSA:
if torch.cuda.is_available():
    device = torch.device("cuda")
    tensor = torch.randn(10, 10).cuda()
    model = torch.nn.Linear(10, 10).cuda()  # any nn.Module works the same way

    # All torch.cuda.* APIs work transparently
    print(f"Device count: {torch.cuda.device_count()}")
    print(f"Device name: {torch.cuda.get_device_name()}")
    torch.cuda.synchronize()
```

### Building C++ Extensions

```python
# setup.py - Use standard torch imports!
import torchada  # Import first to apply patches
from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension, CUDA_HOME

print(f"Building with CUDA/MUSA home: {CUDA_HOME}")

ext_modules = [
    CUDAExtension(
        name="my_extension",
        sources=[
            "my_extension.cpp",
            "my_extension_kernel.cu",
        ],
        extra_compile_args={
            "cxx": ["-O3"],
            "nvcc": ["-O3"],  # Automatically mapped to mcc on MUSA
        },
    ),
]

setup(
    name="my_package",
    ext_modules=ext_modules,
    cmdclass={"build_ext": BuildExtension.with_options(use_ninja=True)},
)
```

### JIT Compilation

```python
import torchada  # Import first to apply patches
from torch.utils.cpp_extension import load

# Load extension at runtime (works on both CUDA and MUSA)
my_extension = load(
    name="my_extension",
    sources=["my_extension.cpp", "my_extension_kernel.cu"],
    verbose=True,
)
```

### Mixed Precision Training

```python
import torchada  # Import first to apply patches
import torch

# MyModel, criterion, and dataloader are placeholders for your own objects
model = MyModel().cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()

for data, target in dataloader:
    data, target = data.cuda(), target.cuda()

    with torch.cuda.amp.autocast():
        output = model(data)
        loss = criterion(output, target)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad()
```

### Distributed Training

```python
import torchada  # Import first to apply patches
import torch.distributed as dist

# Use 'nccl' backend as usual - torchada maps it to 'mccl' on MUSA
dist.init_process_group(backend='nccl')
```

### CUDA Graphs

```python
import torchada  # Import first to apply patches
import torch

# Use standard torch.cuda.CUDAGraph - works on MUSA too
# (model and x are placeholders; both must already live on the device)
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    y = model(x)
```

## Platform Detection

torchada detects the platform automatically, and you can query the result directly:

```python
import torchada
from torchada import detect_platform, Platform

platform = detect_platform()
if platform == Platform.MUSA:
    print("Running on Moore Threads GPU")
elif platform == Platform.CUDA:
    print("Running on NVIDIA GPU")

# Or use convenience functions
if torchada.is_musa_platform():
    print("MUSA platform detected")
```
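
For intuition, here is a minimal sketch of one way such detection could be implemented. It is not torchada's actual code: `PlatformSketch` and `guess_platform` are hypothetical names, and the rule "MUSA if `torch_musa` is importable, otherwise CUDA if `torch.cuda.is_available()`, otherwise CPU" is an assumption made for illustration.

```python
import enum
import importlib.util


class PlatformSketch(enum.Enum):
    """Hypothetical stand-in for torchada.Platform."""
    CUDA = "cuda"
    MUSA = "musa"
    CPU = "cpu"


def _has_module(name: str) -> bool:
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(name) is not None


def guess_platform(has_module=_has_module, cuda_available=None) -> PlatformSketch:
    """Guess the platform; both arguments exist so the logic can be tested."""
    # Assumption: an importable torch_musa means we are on MUSA hardware.
    if has_module("torch_musa"):
        return PlatformSketch.MUSA
    if cuda_available is None:
        if has_module("torch"):
            import torch
            cuda_available = torch.cuda.is_available()
        else:
            cuda_available = False
    return PlatformSketch.CUDA if cuda_available else PlatformSketch.CPU


print(guess_platform())
```

The injectable `has_module` and `cuda_available` parameters are only there to make the decision order easy to exercise without any GPU present.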

## What Gets Patched

After `import torchada`, the following standard PyTorch APIs work on MUSA:

| Standard Import | Works On MUSA |
|----------------|---------------|
| `torch.cuda.*` | ✅ All APIs |
| `torch.cuda.amp.*` | ✅ autocast, GradScaler |
| `torch.cuda.CUDAGraph` | ✅ Maps to MUSAGraph |
| `torch.distributed` (backend='nccl') | ✅ Uses MCCL |
| `torch.utils.cpp_extension.*` | ✅ CUDAExtension, BuildExtension |
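
To make the idea of transparent device mapping concrete, the sketch below shows how a `"cuda"` device string could be rewritten to its MUSA counterpart. `translate_device` is a hypothetical helper written for this README, not a torchada function, and torchada's real patching may work quite differently.

```python
def translate_device(device: str, target: str = "musa") -> str:
    """Rewrite 'cuda' or 'cuda:N' to the target backend; leave others as-is."""
    kind, sep, index = device.partition(":")
    if kind != "cuda":
        return device
    # Keep the optional ':N' suffix so the device index is preserved.
    return f"{target}{sep}{index}"


for name in ("cuda", "cuda:1", "cpu"):
    print(name, "->", translate_device(name))
```

With torchada imported you never call anything like this yourself; the sketch only illustrates what "`tensor.to("cuda")` works on MUSA" implies for device names.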

## API Reference

### torchada

| Function | Description |
|----------|-------------|
| `detect_platform()` | Returns the detected platform (CUDA, MUSA, or CPU) |
| `is_musa_platform()` | Returns `True` when running on MUSA |
| `is_cuda_platform()` | Returns `True` when running on CUDA |
| `get_device_name()` | Get device name string ("cuda", "musa", or "cpu") |
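
As a usage sketch, the helper below wraps `get_device_name()` so that a script still runs on machines where torchada is not installed. The wrapper `pick_device_name` and its fallback to `"cpu"` are assumptions made for this example, not part of torchada.

```python
def pick_device_name() -> str:
    """Return "cuda", "musa", or "cpu"; fall back to "cpu" without torchada."""
    try:
        import torchada  # importing applies the patches
    except ImportError:
        # Assumption for this sketch: no torchada means CPU-only execution.
        return "cpu"
    return torchada.get_device_name()


print(f"Selected device: {pick_device_name()}")
```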

### torch.cuda (after importing torchada)

All standard `torch.cuda` APIs work, including:
- `is_available()`, `device_count()`, `current_device()`, `set_device()`
- `memory_allocated()`, `memory_reserved()`, `empty_cache()`
- `synchronize()`, `Stream`, `Event`, `CUDAGraph`
- `amp.autocast()`, `amp.GradScaler()`

### torch.utils.cpp_extension (after importing torchada)

| Symbol | Description |
|--------|-------------|
| `CUDAExtension` | Creates CUDA or MUSA extension based on platform |
| `CppExtension` | Creates C++ extension (no GPU code) |
| `BuildExtension` | Build command for extensions |
| `CUDA_HOME` | Path to CUDA/MUSA installation |
| `load()` | JIT compile and load extension |

## Symbol Mapping

torchada automatically maps CUDA symbols to MUSA equivalents when building extensions:

| CUDA | MUSA |
|------|------|
| `cudaMalloc` | `musaMalloc` |
| `cudaMemcpy` | `musaMemcpy` |
| `cudaStream_t` | `musaStream_t` |
| `cublasHandle_t` | `mublasHandle_t` |
| `curandState` | `murandState` |
| `at::cuda` | `at::musa` |
| `c10::cuda` | `c10::musa` |
| ... | ... |

See `src/torchada/_mapping.py` for the complete mapping table.
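
The sketch below illustrates how a table like this could be applied to extension source text. It is an assumption-laden illustration, not torchada's implementation: the dictionary holds only the rows shown above, and `port_source` is a hypothetical helper that replaces whole identifiers with a regular expression.

```python
import re

# Illustrative subset copied from the table above; the complete mapping
# lives in src/torchada/_mapping.py and may be structured differently.
CUDA_TO_MUSA = {
    "cudaMalloc": "musaMalloc",
    "cudaMemcpy": "musaMemcpy",
    "cudaStream_t": "musaStream_t",
    "cublasHandle_t": "mublasHandle_t",
    "curandState": "murandState",
    "at::cuda": "at::musa",
    "c10::cuda": "c10::musa",
}

# Try longer symbols first and require non-identifier characters on both
# sides, so cudaMallocHost or my_cudaMalloc are not rewritten by accident.
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])("
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_MUSA, key=len, reverse=True))
    + r")(?![A-Za-z0-9_])"
)


def port_source(text: str) -> str:
    """Replace every known CUDA symbol in text with its MUSA equivalent."""
    return _PATTERN.sub(lambda m: CUDA_TO_MUSA[m.group(1)], text)


print(port_source("cudaStream_t s; cudaMalloc(&ptr, n);"))
```

Symbols that are not in the dictionary are left untouched, which is why the full mapping table matters for real extensions.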

## License

MIT License
