Metadata-Version: 2.4
Name: supermariobrosnes-turbo
Version: 0.2.2
Requires-Dist: gymnasium>=0.29
Requires-Dist: numpy>=1.26
Requires-Dist: huggingface-hub>=1.8.0 ; extra == 'dev'
Requires-Dist: maturin>=1.7,<2 ; extra == 'dev'
Requires-Dist: pytest>=8 ; extra == 'dev'
Requires-Dist: stable-baselines3>=2.7.1 ; extra == 'dev'
Requires-Dist: stable-retro-turbo==1.0.0.post23 ; python_full_version == '3.14.*' and extra == 'dev'
Provides-Extra: dev
Summary: Blazing fast SuperMarioBros-Nes environment for RL research.
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

<div align="center">
  <img src="https://raw.githubusercontent.com/tsilva/SuperMarioBros-Nes-turbo/main/logo.png" alt="SuperMarioBros-Nes-turbo logo" width="320" />

  **🚀 Blazing fast SuperMarioBros-Nes environment for RL research 🍄**
</div>

`SuperMarioBros-Nes-turbo` is a blazing-fast vectorized Super Mario Bros NES environment for reinforcement-learning research. It uses a custom Rust NES emulator specialized for SuperMarioBros-Nes mapper 0/NROM, with vectorized stepping on the Rust side so Python crosses into Rust once per batched step. Game-specific preprocessing, including frame skip, grayscale or RGB rendering, cropping, resizing, frame stacking, reward extraction, termination checks, and observation-buffer writes, happens before data returns to Python. It follows the same throughput-first direction as [stable-retro-turbo](https://github.com/tsilva/stable-retro-turbo), but drops broad stable-retro compatibility so the emulator and batch API can specialize on Super Mario Bros NES.

## Why it is fast

Compared with upstream Stable Retro, this package does not run many Python
`RetroEnv` instances through `SubprocVecEnv`, `DummyVecEnv`, or wrapper stacks
for frame skip, resize, grayscale, frame stack, reward, and termination logic.
Compared with `stable-retro-turbo`, it keeps the same native-vector philosophy
but gives up the general Stable Retro compatibility layer, arbitrary game/core
support, and generic emulator contracts. The speed comes from these current fast
paths:

- **SMB/NROM-only Rust emulator**: the core supports the Super Mario Bros NES
  mapper 0/NROM shape directly instead of routing every access through a
  general multi-console emulator interface.
- **Fixed cartridge memory paths**: PRG/CHR reads use precomputed power-of-two
  masks, direct PRG ROM instruction fetches, fixed nametable mirroring, and
  direct CPU memory paths for RAM, PPU registers, controllers, and PRG ROM.
- **One Python call per vector step**: `reset_into()`, `step_into()`, and
  `info_into()` mutate caller-owned NumPy arrays, release the GIL, and avoid
  creating new observations, rewards, done arrays, and scalar info arrays on
  every step.
- **Rust-side batch execution**: vector lanes step in Rust with Rayon when the
  batch is large enough, so the Python side only submits action arrays and reads
  already-filled result buffers.
- **`step_fast()` info bypass**: training and benchmark loops can skip per-env
  Python `info` dictionaries and keep x-position, score, lives, level,
  timer, and scroll values in typed arrays.
- **Fused RL preprocessing**: frame skip, optional max-pool, reward accumulation,
  termination checks, grayscale/RGB rendering, crop, area resize, and frame-stack
  writes happen in the native step loop before data returns to Python.
- **Observation buffer as frame-stack state**: the returned observation buffer is
  also the persistent stack buffer; old frames shift in place and only the newest
  processed frame is written into the final stack slot.
- **Direct grayscale renderer**: the common pixel path renders SMB background
  tile rows and sprite overlays directly to grayscale from NES palette values,
  instead of first materializing RGB and then converting it in Python.
- **Precomputed area resize plan**: resize bins are built once per env
  configuration, then reused for every frame and every lane.
- **Deterministic lane sharing**: identical reset lanes, and repeated saved-state
  groups such as the default `Level1-1` through `Level1-4` round-robin benchmark,
  can share one emulator state while actions remain uniform; mixed actions
  materialize independent lane states before stepping, preserving the public
  vector-env contract.
- **SMB routine fast-forwards**: the emulator recognizes exact Super Mario Bros
  ROM byte signatures for the idle loop, sprite-0 polling loop, and OAM clear
  helper, then advances equivalent CPU/PPU cycles without interpreting every
  repeated 6502 instruction.
- **Rust-side reward and terminal rules**: x-position reward, flag completion,
  life-loss/level-change style `done_on_info` rules, terminal observation
  capture, and autoreset bookkeeping stay in the Rust/Python fast-env boundary
  rather than in wrapper chains.
- **Scoped compatibility paths**: RGB, uncropped rendering, Gymnasium/SB3-style
  `info` dictionaries, terminal observations, sticky actions, random no-op
  starts, and multi-state curricula are still available, but the benchmark path
  keeps them on explicit typed/native routes instead of paying broad Stable Retro
  overhead unconditionally.
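
The "observation buffer as frame-stack state" idea from the list above can be illustrated in plain NumPy. This is only a sketch of the technique, not the Rust implementation: the helper name `push_frame` is hypothetical, and it assumes a `(num_envs, frame_stack, height, width)` buffer as in the `"chw"` layout.

```python
import numpy as np


def push_frame(stack: np.ndarray, frame: np.ndarray) -> None:
    """Shift a persistent frame stack in place and write the newest frame last.

    Hypothetical helper for illustration only.
    stack: (num_envs, frame_stack, height, width) buffer owned by the caller.
    frame: (num_envs, height, width) newest processed frame per lane.
    """
    # Slide frames 1..N-1 down into slots 0..N-2. NumPy resolves the
    # overlapping source/destination views safely.
    stack[:, :-1] = stack[:, 1:]
    # Only the final slot receives new pixel data.
    stack[:, -1] = frame


# The same buffer object is reused on every step; nothing is reallocated.
obs = np.zeros((2, 4, 84, 84), dtype=np.uint8)
push_frame(obs, np.full((2, 84, 84), 255, dtype=np.uint8))
```

The caller keeps a single array alive for the whole run, which is why the returned observation can double as the stack state.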

## Install

```bash
git clone https://github.com/tsilva/SuperMarioBros-Nes-turbo.git
cd SuperMarioBros-Nes-turbo
uv sync --extra dev
uv run maturin develop --release
```

ROM files are not included in this repository. Pass `--rom-path` to scripts, set `SMB_ROM_PATH`, or provide `rom_path=` when constructing environments. Expected SHA-256 for the supported Super Mario Bros NES ROM:

```text
f61548fdf1670cffefcc4f0b7bdcdd9eaba0c226e3b74f8666071496988248de
```
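
One way to check a local ROM against that digest before running anything is a small standard-library script. The helper names `sha256_of` and `rom_matches` are illustrative and are not part of this package.

```python
import hashlib

# Digest quoted in this README for the supported ROM.
EXPECTED_SHA256 = "f61548fdf1670cffefcc4f0b7bdcdd9eaba0c226e3b74f8666071496988248de"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rom_matches(path: str) -> bool:
    """True when the file at `path` has the expected ROM digest."""
    return sha256_of(path) == EXPECTED_SHA256
```

For example, `rom_matches("/path/to/SuperMarioBros.nes")` returns `False` for any file whose bytes differ from the supported ROM.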

Import the package as `supermariobrosnes_turbo`:

```python
import numpy as np

from supermariobrosnes_turbo import Actions, SuperMarioBrosNesTurboVecEnv

env = SuperMarioBrosNesTurboVecEnv(
    "SuperMarioBros-Nes-v0",
    rom_path="/path/to/SuperMarioBros.nes",
    num_envs=64,
    use_restricted_actions=Actions.ALL,
    frame_skip=4,
    obs_grayscale=True,
    frame_stack=4,
    obs_crop=(32, 0, 0, 0),
    obs_resize=(84, 84),
    obs_layout="chw",
)

obs = env.reset()
actions = np.zeros((env.num_envs, env.num_buttons), dtype=np.uint8)
env.step_async(actions)
obs, rewards, dones, infos = env.step_wait()
```

`step_wait()` follows the Stable Baselines3 `VecEnv` contract: it calls the Rust `SuperMarioBrosNesTurboVecEnv` once for the whole batch and returns `(obs, rewards, dones, infos)` from reusable NumPy arrays. Use `step_fast()` when you do not need per-env `info` dictionaries, or `step_wait_gymnasium()` when you need separate `terminated` and `truncated` arrays.
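
A short random-action loop built on that contract might look like the sketch below. It uses only the attributes and methods shown in the example above (`num_envs`, `num_buttons`, `reset()`, `step_async()`, `step_wait()`) and assumes the `Actions.ALL` configuration, where each action row is a per-button 0/1 mask. The function name `random_rollout` is hypothetical.

```python
import numpy as np


def random_rollout(env, steps: int, seed: int = 0):
    """Step a vector env with random button masks and tally rewards.

    Returns (per-lane reward totals, number of `done` flags observed).
    """
    rng = np.random.default_rng(seed)
    env.reset()
    totals = np.zeros(env.num_envs, dtype=np.float64)
    done_count = 0
    for _ in range(steps):
        # One uint8 mask per lane, one column per button.
        actions = rng.integers(
            0, 2, size=(env.num_envs, env.num_buttons), dtype=np.uint8
        )
        env.step_async(actions)
        _obs, rewards, dones, _infos = env.step_wait()
        totals += rewards
        done_count += int(np.count_nonzero(dones))
    return totals, done_count
```

Because the returned arrays are described as reusable, the loop accumulates into its own `totals` array instead of keeping references to `rewards` across steps.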

Initial states can be a single stable-retro state, one state per env slot, or a weighted mapping sampled independently for each lane on reset:

```python
env = SuperMarioBrosNesTurboVecEnv(
    "SuperMarioBros-Nes-v0",
    rom_path="/path/to/SuperMarioBros.nes",
    num_envs=16,
    state={"Level1-1": 0.5, "Level1-4": 0.5},
    done_on={
        "life_loss": ("lives", "decrease"),
        "level_change": (("levelHi", "levelLo"), "change"),
    },
)
env.seed(123)

obs = env.reset()
sampled_states = env.active_states()
```
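
To make "sampled independently for each lane" concrete, the following NumPy sketch draws one state per lane from a weighted mapping. It is an illustration only: the package's own random number generator and draw order are not documented here, so the result will not necessarily match `env.active_states()` for the same seed. The function name `sample_lane_states` is hypothetical.

```python
import numpy as np


def sample_lane_states(weights: dict, num_envs: int, seed: int):
    """Draw one state name per lane from a {name: weight} mapping."""
    names = list(weights)
    probs = np.array([weights[name] for name in names], dtype=np.float64)
    probs /= probs.sum()  # tolerate weights that do not sum to 1
    rng = np.random.default_rng(seed)
    # Each lane gets its own independent draw.
    indices = rng.choice(len(names), size=num_envs, p=probs)
    return indices, [names[i] for i in indices]


indices, states = sample_lane_states(
    {"Level1-1": 0.5, "Level1-4": 0.5}, num_envs=16, seed=123
)
```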

## Commands

```bash
uv sync --extra dev                 # install Python dev dependencies
uv run maturin develop --release    # build and install the Rust extension

make test                           # Rust tests + HF policy completion/parity oracle

uv run python scripts/smoke_smb.py --rom-path /path/to/SuperMarioBros.nes  # quick ROM/emulator smoke check
uv run python scripts/benchmark_sps.py --rom-path /path/to/SuperMarioBros.nes --num-envs 16 --steps 500 --repeats 3

uv run python scripts/play.py --rom-path /path/to/SuperMarioBros.nes --mode external      # raw SDL2 play view
uv run python scripts/play.py --rom-path /path/to/SuperMarioBros.nes --mode external --view preprocessed --scale 4
uv run python scripts/play_policy.py https://huggingface.co/tsilva/SuperMarioBros-NES_Level1 --rom-path /path/to/SuperMarioBros.nes
```

## Release

Release tags drive the GitHub Actions wheel build. From a clean, synced branch
with the release environment installed, create the next minor release with:

```bash
uv sync --extra dev --group dev
make release
```

Use `scripts/release.py --part patch`, `--part major`, or `--to 0.2.0` for
other release shapes. The script refuses to run unless the current branch is
clean and synced with its upstream. It verifies the target version is not already
on PyPI, bumps `pyproject.toml` and `Cargo.toml`, refreshes lockfiles, runs local
gates, commits `Release v<version>`, creates the matching tag, and pushes the
branch plus tag. The pushed tag triggers the release workflow, which builds,
audits, and publishes the wheels to PyPI via trusted publishing.
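
As a rough illustration of what `--part` selects, the sketch below bumps a three-part version string using the conventional semantic-versioning reset rules. This is an assumption about the intended behavior; `scripts/release.py` remains the source of truth, and `bump_version` is a hypothetical name.

```python
def bump_version(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH string (assumed semantic-versioning rules)."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```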

## Fixed-host benchmark target

Use `stable-retro-turbo==1.0.1.post1` as the Stable Retro PyPI oracle for new benchmarks and comparisons. Rerun the PyPI oracle baseline before quoting a current speedup, so the comparison uses the same `SuperMarioBros-Nes-v0` ROM, saved-state set, frame skip, frame stack, grayscale/crop/resize preprocessing, and `16` vector envs on the fixed `beast-3` CPU host.

Historical fixed-host results:

| Environment | Version / Ref | Official median env steps/sec | Mean invocation-median env steps/sec | Run-median CV | Notes |
| --- | --- | ---: | ---: | ---: | --- |
| `SuperMarioBros-Nes-turbo` | `main` | `47,611.14` | `47,605.89` | `0.28%` | Full official fixed-host run; all validity gates passed. |
| `stable-retro-turbo` PyPI oracle | `1.0.0.post23` | `7,437.65` | `7,440.04` | `0.44%` | Historical only; superseded by `1.0.1.post1` for new comparisons. Statistical gates passed, but the post-run host-load gate failed because the 1-minute load was sampled immediately after the benchmark's own CPU-heavy timing. |
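
The column names suggest per-invocation medians that are then aggregated. The sketch below shows one plausible reading of those columns using only the standard library. It is an assumption, not the project's benchmark code: the official aggregation is whatever the benchmark scripts write to `aggregate.json`, and `summarize_runs` is a hypothetical name.

```python
import statistics


def summarize_runs(invocations):
    """Aggregate env-steps/sec samples, one inner list per benchmark invocation.

    Assumed definitions (not confirmed by the project):
      - invocation median: median of the samples within one invocation
      - CV: sample standard deviation of invocation medians over their mean
    """
    medians = [statistics.median(samples) for samples in invocations]
    mean_of_medians = statistics.fmean(medians)
    spread = statistics.stdev(medians) if len(medians) > 1 else 0.0
    return {
        "median": statistics.median(medians),
        "mean_invocation_median": mean_of_medians,
        "cv_percent": 100.0 * spread / mean_of_medians,
    }
```

A low CV means the invocation medians agree closely, which is what makes a single quoted median meaningful.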

Local benchmark artifact paths:

- `artifacts/benchmarks/host-results/host-single-2026-07-02-123806-R17c60e1eb88e/aggregate.json`
- `artifacts/benchmarks/host-results/pypi-stable-retro-turbo/1.0.0.post23/0bcebd32669e8e46/aggregate.json`

## Notes

- Python `>=3.9` and a Rust toolchain are required to build the Maturin extension.
- The current emulator scope is SuperMarioBros-Nes mapper 0 NROM.
- The Python package exposes `SuperMarioBrosNesTurboVecEnv`, `Actions`, `ACTION_MEANINGS`, `CORE_ACTION_MEANINGS`, and `ACTION_SETS`. `SuperMarioBrosNesTurboVecEnv` subclasses Stable Baselines3 `VecEnv` when SB3 is installed and follows the `stable-retro-turbo` `RetroVecEnv` constructor shape.
- `use_restricted_actions=Actions.ALL` and `Actions.FILTERED` consume per-button `MultiBinary` masks; `Actions.DISCRETE` consumes Stable Retro's 36-way discrete action encoding.
- `scripts/play_policy.py` loads Stable Baselines3 PPO checkpoints from a local `.zip`, a Hugging Face repo id, or a `https://huggingface.co/...` URL and displays raw RGB gameplay in the SDL2 GUI while feeding the model its preprocessed observation stack. It defaults to a Stable Retro playback backend so public SB3/Hugging Face checkpoints use the preprocessing they were trained with; pass `--view preprocessed` to inspect the model input or `--backend native` when checking this repo's fast-env parity. The SB3, PyTorch, and Hugging Face Hub dependencies are included in the repo's `uv` dev environment.
- By default, `scripts/benchmark_sps.py` starts lanes from `Level1-1`, `Level1-2`, `Level1-3`, and `Level1-4`, repeated round-robin. Use `--state Level1-1` or another packaged stable-retro state to start every lane from one saved level state, or `--states ...` to choose a different round-robin state list. If needed, pass `--state-dir` or set `SUPERMARIOBROSNES_FASTENV_STATE_DIR`. The package includes the stable-retro Super Mario Bros NES states from `Level1-1` through `Level8-4`, plus `Level1-1-99lives`, `Level2-1-clouds`, and `Level2-1-clouds-easy`. In Python, `state=` accepts a single state name/path/bytes value, a sequence with exactly one state per env, or a weighted mapping such as `{"Level1-1": 0.5, "Level1-4": 0.5}`. After reset, `active_state_indices()` and `active_states()` report the sampled state for each lane.
- For `SuperMarioBrosNesTurboVecEnv`, `done_on_info` accepts named terminal rules like `{"life_loss": ("lives", "decrease")}`. Supported ops are `change`, `increase`, and `decrease`; keys are drawn from `INFO_KEYS`. Fired rules are reported in `info["done_on_info"]` with `op`, `keys`, `prev`, and `next`.
- Stable Retro oracle/playback tooling targets `stable-retro-turbo==1.0.1.post1` for new benchmarks and comparisons, and constructs the upstream vector env with the current flat keyword names: `maxpool_last_two`, `noop_reset_max`, `sticky_action_prob`, `info_filter`, `obs_copy`, and `done_on`. Runtime fired terminal rules are still read from `info["done_on_info"]`.
- Benchmark JSON can be written with `scripts/benchmark_sps.py --output-json ...`.
- Play mode uses the native SDL2 library. If SDL2 is not installed or discoverable, `scripts/play.py` exits with an SDL backend error.
- ROM files are not included in the repository; use the SHA-256 digest above to confirm test inputs when needed.
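
The named terminal rules described in the notes above (`change`, `increase`, `decrease`) can be pictured with a small pure-Python evaluator. This is a sketch of the rule semantics, not the native implementation: the function name `fired_rules` is hypothetical, the exact value shapes in the real `info["done_on_info"]` may differ, and comparing multi-key rules as tuples is an assumption.

```python
def fired_rules(rules: dict, prev_info: dict, next_info: dict) -> dict:
    """Return the rules that fire between two info snapshots.

    rules maps a rule name to (key_or_keys, op), for example
    {"life_loss": ("lives", "decrease")}.
    """
    fired = {}
    for name, (keys, op) in rules.items():
        key_tuple = (keys,) if isinstance(keys, str) else tuple(keys)
        prev = tuple(prev_info[key] for key in key_tuple)
        nxt = tuple(next_info[key] for key in key_tuple)
        if op == "change":
            hit = nxt != prev
        elif op == "increase":
            hit = nxt > prev  # assumption: tuples compare lexicographically
        elif op == "decrease":
            hit = nxt < prev
        else:
            raise ValueError(f"unsupported op: {op!r}")
        if hit:
            fired[name] = {"op": op, "keys": key_tuple, "prev": prev, "next": nxt}
    return fired


rules = {
    "life_loss": ("lives", "decrease"),
    "level_change": (("levelHi", "levelLo"), "change"),
}
before = {"lives": 2, "levelHi": 0, "levelLo": 0}
after = {"lives": 1, "levelHi": 0, "levelLo": 0}
result = fired_rules(rules, before, after)  # only "life_loss" fires here
```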

## Architecture

![SuperMarioBros-Nes-turbo architecture diagram](https://raw.githubusercontent.com/tsilva/SuperMarioBros-Nes-turbo/main/architecture.png)

## License

MIT, as declared in [pyproject.toml](https://github.com/tsilva/SuperMarioBros-Nes-turbo/blob/main/pyproject.toml) and [Cargo.toml](https://github.com/tsilva/SuperMarioBros-Nes-turbo/blob/main/Cargo.toml).

