Metadata-Version: 2.3
Name: torch-rbln
Version: 0.1.7
Summary: Rebellions Extension for PyTorch
Author: Rebellions Inc.
Author-email: client_support@rebellions.ai
Requires-Python: >=3.10,<3.14
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: PyYAML (>=6,<7)
Requires-Dist: libcst (>=1.2.0)
Requires-Dist: rebel-compiler (>=0.10.1,<0.20.0)
Requires-Dist: scipy (>=1.14.0)
Requires-Dist: torch (==2.9.0+cpu)
Description-Content-Type: text/markdown

# PyTorch for Rebellions' NPU

This package provides PyTorch integration for Rebellions' NPU.

## Getting Started (`torch`: Python package, `rebel-compiler`: Python package)

### Prerequisites

- Python 3.10 or later
- Git
- CMake 3.18 or later
- Ninja build system
- LDAP credentials for Rebellions' package repository

### Update Git submodules

Initialize and update the Git submodules. This downloads required third-party
libraries, such as the Rebel Compiler headers in `third_party/rebel_compiler`.

    git submodule update --init ./

### Create Python Virtual Environment

Create a Python virtual environment. This creates a directory named `.venv` in
the current directory.

    python3 -m venv ./.venv && source ./.venv/bin/activate

### Install Dependencies

Install the Python package manager `poetry`, which manages dependencies,
building, packaging, and installation.

    pip3 install poetry==2.0.1

Save credentials for `https://gate-keeper.rebellions.in`. Authorization for
`https://pypi.rbln.in` is also required. Note that this is not a secure way to
store credentials: they are written in plain text to
`~/.config/pypoetry/auth.toml`.

    export LDAP_USERNAME=daekyeong.kim     # Put your username
    export LDAP_PASSWORD=mysecretpassword  # Put your password
    poetry config keyring.enabled false    # Optional, if building freezes while auth
    poetry config http-basic.rbln-internal $LDAP_USERNAME $LDAP_PASSWORD

**NOTE** During development, use `rbln-internal` instead of `rbln`. If you
want to download `rebel-compiler` from `rbln` (Rebellions' external PyPI
server), do the following.

    poetry config http-basic.rbln <rbln username> <rbln password>

Install the dependencies pinned in `poetry.lock` using `poetry`, excluding the
root package `torch-rbln`. Be careful: the command below uninstalls any
packages that are not listed in `poetry.lock`.

    poetry sync --no-root

### Choose Build Type (Optional)

Choose the build type as shown below. The default is `Release`.

    export RBLN_BUILD_TYPE=Debug

### Install Editable Package

Build the C++ project and install the `torch-rbln` package in editable mode.

    poetry install --only-root


### Logging

`torch-rbln` provides structured logging via `spdlog` to help diagnose runtime behavior, including CPU fallback operations and device execution traces.

#### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `TORCH_RBLN_LOG_LEVEL` | Controls log verbosity | `WARNING` |
| `TORCH_RBLN_LOG_PATH` | Log file path (debug builds only) | `./torch_rbln.log` |

```bash
export TORCH_RBLN_LOG_LEVEL=INFO
export TORCH_RBLN_LOG_PATH=./torch_rbln.log
```

A log file is always created in debug builds. Its path can be configured via the `TORCH_RBLN_LOG_PATH` environment variable.

#### Log Levels

| Level | Description | Use Case |
|-------|---------|----------|
| `DEBUG` | Detailed internal states, function entry/exit, parameter values | Deep debugging during development (debug builds only) |
| `INFO` | Runtime information, CPU fallback notifications | General development and troubleshooting |
| `WARNING` (default) | Important warnings that may affect execution | Production monitoring |
| `ERROR` | Errors and critical failures | Error tracking and alerting |
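The level ordering above can be modeled as a simple severity filter. This is a sketch only: the names mirror the documented levels, and `should_emit` is a hypothetical helper, not part of torch-rbln's internal logging API.

```python
# Sketch of the documented severity ordering; not torch-rbln internals.
LEVELS = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3}

def should_emit(message_level: str, configured_level: str = "WARNING") -> bool:
    """A message is emitted only at or above the configured verbosity."""
    return LEVELS[message_level] >= LEVELS[configured_level]

print(should_emit("INFO"))           # False: filtered under the default WARNING level
print(should_emit("ERROR", "INFO"))  # True: at or above INFO
```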

#### Debug vs Release Builds

| Feature | Debug Build | Release Build |
|---------|-------------|---------------|
| Minimum log level | `DEBUG` | `INFO` |
| Log file | ✅ Written to `TORCH_RBLN_LOG_PATH` | ❌ Not available |
| Source location | ✅ Included | ❌ Omitted |
| Thread ID | ✅ Included | ❌ Omitted |


### Performance Optimization Flag (Optional)

To reduce runtime overhead (e.g., skipping unnecessary NaN/Inf checks), set the following environment variable:

```bash
export TORCH_RBLN_DEPLOY=ON
```
This enables lightweight execution for deployment scenarios.
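To illustrate what this flag trades off, the hypothetical helper below skips a NaN/Inf scan when `TORCH_RBLN_DEPLOY=ON`. It is a sketch of the documented intent, not torch-rbln's actual check.

```python
import math
import os

def maybe_check_finite(values):
    """Hypothetical sketch: skip the costly NaN/Inf scan in deployment mode."""
    if os.environ.get("TORCH_RBLN_DEPLOY", "OFF").upper() == "ON":
        return  # lightweight execution: the check is skipped entirely
    for v in values:
        if math.isnan(v) or math.isinf(v):
            raise RuntimeError("non-finite value detected")

maybe_check_finite([1.0, 2.0])  # passes: all values are finite
```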

### Device Mapping Configuration

By default, each physical NPU device is mapped to a logical device with a 1:1 relationship (equivalent to `RBLN_NPUS_PER_DEVICE=1`). This is called **Direct Mapping** and provides the standard PyTorch device usage experience.

You can configure device mapping using the following environment variables to enable **Aggregated Mapping**, which groups multiple physical NPUs into a single logical device for RSD (Rebellions Scalable Design) functionality.

#### RBLN_NPUS_PER_DEVICE

Groups physical NPUs together to create logical devices. Each logical device will contain the specified number of physical NPUs. This is designed for **Normal Users** who want simple configuration.

**Constraint**: Must be one of the supported sizes: `1`, `2`, `4`, `8`, `16`, or `32`. These values match the `base_sizes` defined in `rebel/core/compilation/_impl.py` for production environments.

```bash
export RBLN_NPUS_PER_DEVICE=2
```

**Examples:**

With 4 physical devices (`RBLN_DEVICES=0,1,2,3` or default):
- `RBLN_NPUS_PER_DEVICE=2` → `rbln:0` maps to NPUs [0, 1], `rbln:1` maps to NPUs [2, 3]
- `RBLN_NPUS_PER_DEVICE=4` → `rbln:0` maps to NPUs [0, 1, 2, 3] (full aggregation)

With 6 physical devices and `RBLN_NPUS_PER_DEVICE=4`:
- `rbln:0` maps to NPUs [0, 1, 2, 3]
- NPUs [4, 5] remain unused (warning will be displayed)
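The grouping rules above can be sketched in plain Python. `group_devices` is a hypothetical helper written for illustration, not part of the torch-rbln API.

```python
SUPPORTED_SIZES = {1, 2, 4, 8, 16, 32}

def group_devices(physical_ids, npus_per_device):
    """Chunk physical NPU ids into logical devices of a fixed group size."""
    if npus_per_device not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported group size: {npus_per_device}")
    usable = len(physical_ids) - len(physical_ids) % npus_per_device
    groups = [physical_ids[i:i + npus_per_device]
              for i in range(0, usable, npus_per_device)]
    leftover = physical_ids[usable:]
    if leftover:
        print(f"warning: NPUs {leftover} remain unused")
    return groups

print(group_devices([0, 1, 2, 3], 2))        # [[0, 1], [2, 3]]
print(group_devices([0, 1, 2, 3, 4, 5], 4))  # [[0, 1, 2, 3]] plus a warning
```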

#### RBLN_DEVICE_MAP

Provides explicit mapping between logical devices and physical NPU IDs. This is designed for **Advanced Users** who need fine-grained control over device topology.

**Constraint**: Each device group must contain one of the supported sizes: `1`, `2`, `4`, `8`, `16`, or `32` devices.

```bash
export RBLN_DEVICE_MAP="[0,1],[2,3,4,5]"
```

**Format:** Comma-separated groups of NPU IDs, each group enclosed in square brackets.

**Example:** With 6 physical devices:
- `RBLN_DEVICE_MAP="[0,1],[2,3,4,5]"` → `rbln:0` maps to NPUs [0, 1], `rbln:1` maps to NPUs [2, 3, 4, 5]
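The bracket format above is simple enough to parse with a regular expression. The sketch below is illustrative only; `parse_device_map` is a hypothetical helper, not the parser torch-rbln actually uses.

```python
import re

SUPPORTED_SIZES = {1, 2, 4, 8, 16, 32}

def parse_device_map(spec):
    """Parse an RBLN_DEVICE_MAP-style string into per-logical-device id lists."""
    groups = [[int(n) for n in body.split(",")]
              for body in re.findall(r"\[([^\]]+)\]", spec)]
    for group in groups:
        if len(group) not in SUPPORTED_SIZES:
            raise ValueError(f"unsupported group size: {len(group)}")
    return groups

print(parse_device_map("[0,1],[2,3,4,5]"))  # [[0, 1], [2, 3, 4, 5]]
```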

#### Configuration Priority and Conflict Resolution

**Priority order:** `RBLN_DEVICE_MAP` > `RBLN_NPUS_PER_DEVICE` > default (1:1 mapping)
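The priority order can be expressed as a short chain of checks. `resolve_mapping_source` below is a hypothetical illustration of the documented precedence, not torch-rbln code.

```python
def resolve_mapping_source(env):
    """Report which configuration source wins under the documented priority."""
    if env.get("RBLN_DEVICE_MAP"):
        return "RBLN_DEVICE_MAP"
    if env.get("RBLN_NPUS_PER_DEVICE"):
        return "RBLN_NPUS_PER_DEVICE"
    return "default (1:1 mapping)"

# When both variables are set, the explicit map takes precedence.
print(resolve_mapping_source({"RBLN_DEVICE_MAP": "[0,1]",
                              "RBLN_NPUS_PER_DEVICE": "2"}))
print(resolve_mapping_source({}))  # default (1:1 mapping)
```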


#### Viewing Device Topology

You can view the current device topology using `torch.rbln.device_summary()`:

```python
import torch
import torch_rbln  # importing torch_rbln makes the rbln device available

torch.rbln.device_summary()
```

**Example output:**

```
[RBLN] Device Topology Initialized:
+-------------------+-------------------+----------------------+
| Logical Device    | Physical NPU IDs  | Status               |
+-------------------+-------------------+----------------------+
| rbln:0            | [ 0, 1 ]          | Active (Aggregated)  |
| rbln:1            | [ 2, 3 ]          | Active (Aggregated)  |
+-------------------+-------------------+----------------------+
```

### Tensor Parallel Configuration

The following environment variables control tensor parallel behavior for `torch.compile` operations and eager mode ops.

#### TORCH_RBLN_USE_TP_FAILOVER

Enables automatic tensor parallel failover. When a RuntimeError occurs during execution with `tensor_parallel_size > 1`, the system automatically retries with `tp_size=1` on the root NPU of the device group.

This is useful for models that don't support tensor parallelism, allowing them to run on a single NPU within an aggregated device group without manual intervention.

```bash
export TORCH_RBLN_USE_TP_FAILOVER=ON   # enable
export TORCH_RBLN_USE_TP_FAILOVER=OFF  # disable (default: OFF)
```

**Behavior:**
- When set to ON and a RuntimeError occurs with `tp > 1`:
  1. The system logs a warning message indicating the failover attempt
  2. The model is recompiled with `tensor_parallel_size=1`
  3. Execution continues on the root NPU of the device group
- When set to OFF or unset (default), RuntimeErrors are propagated as-is

**Example scenario:**
With `RBLN_NPUS_PER_DEVICE=4` (4 NPUs per logical device):
- Initial compilation attempts `tp=4`
- If the model doesn't support TP, a RuntimeError occurs
- With failover enabled, the system retries with `tp=1` on NPU 0
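The retry path described above can be sketched as a plain try/except wrapper. `run_with_failover` and `model_without_tp` are hypothetical stand-ins for illustration; the real failover happens inside torch-rbln's compilation pipeline.

```python
def run_with_failover(run, tp_size, failover_enabled):
    """Hypothetical sketch of the retry path behind TORCH_RBLN_USE_TP_FAILOVER."""
    try:
        return run(tp_size)
    except RuntimeError:
        if failover_enabled and tp_size > 1:
            print(f"warning: tp={tp_size} failed, retrying with tp=1 on the root NPU")
            return run(1)  # stands in for recompiling with tensor_parallel_size=1
        raise

def model_without_tp(tp_size):
    # Stand-in for a model that cannot be sharded across NPUs.
    if tp_size > 1:
        raise RuntimeError("tensor parallelism unsupported")
    return "ok"

print(run_with_failover(model_without_tp, 4, failover_enabled=True))  # ok
```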

#### TORCH_RBLN_USE_DEVICE_TP

Controls whether eager mode operations use the device group's tensor parallel size instead of `tp_size=1`.

By default, eager mode ops (operations outside of `torch.compile`) use `tp_size=1`. When this environment variable is set to ON, eager mode ops will follow the logical device size defined by `RBLN_NPUS_PER_DEVICE` or `RBLN_DEVICE_MAP`, matching the behavior of `torch.compile` operations.

```bash
export TORCH_RBLN_USE_DEVICE_TP=ON   # use device group tp size
export TORCH_RBLN_USE_DEVICE_TP=OFF  # use tp_size=1 for eager ops (default: OFF)
```

**Behavior:**
- When set to ON: Eager mode ops use the device group's tensor parallel size (e.g., `tp=4` with `RBLN_NPUS_PER_DEVICE=4`)
- When set to OFF or unset (default): Eager mode ops use `tp_size=1`

**Use case:**
This is useful when you want consistent tensor parallel behavior across both eager and compiled operations, particularly in mixed execution scenarios.
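The selection rule can be sketched as a small helper. `eager_tp_size` is a hypothetical illustration of the documented behavior, not the function torch-rbln uses internally.

```python
def eager_tp_size(device_group_size, env):
    """Hypothetical sketch: pick the tp size used for an eager-mode op."""
    if env.get("TORCH_RBLN_USE_DEVICE_TP", "OFF").upper() == "ON":
        return device_group_size  # follow the logical device size
    return 1  # default: eager ops run with tp_size=1

print(eager_tp_size(4, {"TORCH_RBLN_USE_DEVICE_TP": "ON"}))  # 4
print(eager_tp_size(4, {}))                                  # 1
```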


### Install Wheel Package (Optional)

To build a `*.whl` and install it, run the commands below.

    poetry build
    pip install ./dist/torch_rbln*.whl

After changing C++ or Python source code, simply rerun
`Install Editable Package` or `Install Wheel Package`.


## Apply Custom `rebel-compiler`

You have two choices:
- Use built-in one
- Use external one

### Use `torch-rbln` built-in `rebel-compiler` (`torch`: Python package, `rebel-compiler`: `third_party/rebel_compiler`)

This approach is strongly recommended. The steps below are the same as in `Getting Started`.

    git submodule update --init ./
    python3 -m venv ./.venv && source ./.venv/bin/activate
    pip3 install poetry==2.0.1
    export LDAP_USERNAME=daekyeong.kim     # Put your username
    export LDAP_PASSWORD=mysecretpassword  # Put your password
    poetry config http-basic.rbln $LDAP_USERNAME $LDAP_PASSWORD

Without running `poetry sync`, check out `rebel-compiler` in
`./third_party/rebel_compiler` to your custom branch.

    pushd ./third_party/rebel_compiler
      git checkout my_custom_branch
    popd

The following script builds a package and installs it into your environment, syncing dependencies.

    ./tools/apply-custom-rebel.sh

The script above edits the `pyproject.toml` and `poetry.lock` files. If you only
want to apply a custom `rebel-compiler` temporarily, keep an eye on those files.

(Optional) You can choose the build type as shown below.

    RBLN_BUILD_TYPE=Debug ./tools/apply-custom-rebel.sh

Then you can build or install the `torch-rbln` package against the custom
`rebel-compiler` package.

    poetry install --only-root

### Use external `rebel-compiler` (for rebel-compiler developers)

**Prereqs**
* You’ve already built `rebel-compiler`.
* `${REBEL_HOME}` points to the `rebel-compiler` repo root.

#### Method 1: Automated Script (Recommended)

> **⚠️ Warning:** Do not use `build-with-external-rebel.sh` together with `apply-custom-rebel.sh`.
> Both scripts modify `pyproject.toml` and may cause environment conflicts.
> Use only one method at a time.

Use the `build-with-external-rebel.sh` script for automated build:

**gcc-13 mode (default):** Uses PyTorch from PyPI
```bash
cd /path/to/torch-rbln
export REBEL_HOME=/path/to/rebel_compiler
./tools/build-with-external-rebel.sh --clean
```

**gcc-12 mode:** Requires pre-built torch wheel
```bash
cd /path/to/torch-rbln
export REBEL_HOME=/path/to/rebel_compiler
export RBLN_GCC_VERSION=12
export TORCH_WHEEL_PATH=/path/to/torch-2.8.0-cp310-cp310-linux_x86_64.whl
./tools/build-with-external-rebel.sh --clean
```

**Options:**
- `--clean`: Clean build artifacts before building
- `--clean-only`: Only clean build artifacts, do not build

**Environment Variables:**
- `REBEL_HOME`: Path to rebel-compiler (REQUIRED)
- `RBLN_GCC_VERSION`: GCC version to use (`12` or `13`, default: `13`)
- `TORCH_WHEEL_PATH`: Path to pre-built torch wheel (REQUIRED for gcc-12, ignored for gcc-13)
- `RBLN_BUILD_TYPE`: Build type (`Release` or `Debug`, default: `Release`)
- `RBLN_VENV_PATH`: Virtual environment path (default: `.venv-rebel`)

The script will:
1. Check Python version compatibility with rebel-compiler
2. Create virtual environment
3. Install dependencies
4. Configure pyproject.toml for external rebel-compiler
5. Build and install torch-rbln
6. Verify installation with import tests

**After build:**
```bash
source .venv-rebel/bin/activate
# activate_rebel is auto-sourced, setting REBEL_HOME, PYTHONPATH, LD_LIBRARY_PATH
python -c "import torch; import rebel; import torch_rbln; print('OK')"
```

#### Method 2: Manual Setup

#### 1) Create and activate a virtualenv

```bash
python3 -m venv .venv
source .venv/bin/activate
```

#### 2) Add your local `rebel-compiler` in editable mode

```bash
poetry add --editable "${REBEL_HOME}/python"
```

#### 3) Install this project, using the external compiler

```bash
RBLN_USE_EXTERNAL_REBEL_COMPILER=1 poetry install --only-root
```


## Create Git Commit

A Git `pre-commit` hook is installed, so linting is triggered when you create a
commit. To prepare for linting, you MUST first initialize `lintrunner`.

    source ./.venv/bin/activate

    lintrunner init

Once `lintrunner` is initialized, there is no need to initialize it again. You can commit now.

    git commit

Some failures can be fixed automatically. Run the command below for auto-fixing.

    lintrunner -m main -a


## Run Tests

The following assumes you are inside the Python virtual environment and have
installed the `torch-rbln` package successfully.

### C++ Tests

Packaging runs in a new isolated environment, so even though
`poetry install --only-root` builds the C++ project, `CTest` cannot find that
build directory. For `CTest`, you MUST build the C++ project manually.

    ./tools/build-libtorch-rbln.sh

    ctest --test-dir ./build


### Python Tests

    pytest ./test