Metadata-Version: 2.4
Name: vllm_plugin_meralion2
Version: 0.3.0
Summary: A vLLM plugin to register the MERaLiON-2-10B model architecture with vLLM’s plugin system.
Author: MERaLiON Team
Project-URL: Modelpage, https://huggingface.co/MERaLiON/MERaLiON-2-10B
Project-URL: Homepage, https://github.com/YingxuH/vllm_plugin
Project-URL: Documentation, https://github.com/YingxuH/vllm_plugin/blob/main/readme.md
Project-URL: Changelog, https://github.com/YingxuH/vllm_plugin/blob/main/CHANGELOG.md
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: librosa
Requires-Dist: packaging

## MERaLiON2 vLLM Plugin

[![Security (Bandit)](https://github.com/YingxuH/vllm_plugin/actions/workflows/security.yml/badge.svg)](https://github.com/YingxuH/vllm_plugin/actions/workflows/security.yml)
[![Dependency Audit (pip-audit)](https://github.com/YingxuH/vllm_plugin/actions/workflows/dependency-audit.yml/badge.svg)](https://github.com/YingxuH/vllm_plugin/actions/workflows/dependency-audit.yml)
[![Dependency Review](https://github.com/YingxuH/vllm_plugin/actions/workflows/dependency-review.yml/badge.svg)](https://github.com/YingxuH/vllm_plugin/actions/workflows/dependency-review.yml)
[![CodeQL](https://github.com/YingxuH/vllm_plugin/actions/workflows/codeql.yml/badge.svg)](https://github.com/YingxuH/vllm_plugin/actions/workflows/codeql.yml)
[![Publish](https://github.com/YingxuH/vllm_plugin/actions/workflows/publish.yml/badge.svg)](https://github.com/YingxuH/vllm_plugin/actions/workflows/publish.yml)


### Licence

[MERaLiON-Public-Licence-v3](https://huggingface.co/datasets/MERaLiON/MERaLiON_Public_Licence/blob/main/MERaLiON-Public-Licence-v3.pdf)

### Set up Environment

This plugin family has three release lines:

- `v0.1.x`: compatibility lane for vLLM version `0.6.5` ~ `0.7.3` (V0 engine), and `0.8.5` ~ `0.8.5.post1` (V1 engine).
- `v0.2.x`: compatibility lane for `vLLM >=0.8.5,<=0.10.0`.
- `v0.3.x`: compatibility lane for `vLLM >=0.12.0,<0.17.0`. Targets the V1 engine and handles all internal API changes across the 0.12–0.16 minor series.

See [scripts/compatibility/](https://github.com/YingxuH/vllm_plugin/tree/main/scripts/compatibility) for the automated version-matrix runner and detailed vLLM + transformers compatibility results.

Install by your vLLM version:

```bash
# For vLLM 0.6.5~0.7.3, 0.8.5.
pip install "vllm-plugin-meralion2<0.2"

# For vLLM 0.8.5 ~ 0.10.0
pip install "vllm-plugin-meralion2>=0.2,<0.3"

# For vLLM 0.12.0 ~ 0.16.x
pip install "vllm-plugin-meralion2>=0.3,<0.4"
```

**Attention backend:** MERaLiON-2 uses Gemma2's attention logit softcapping which requires the FlashInfer backend. The serve example handles this automatically. See [openai_serve_example.sh](https://github.com/YingxuH/vllm_plugin/blob/main/example_scripts/openai_serve_example.sh) for details.

### Offline Inference

Refer to [offline_example.py](https://github.com/YingxuH/vllm_plugin/blob/main/example_scripts/offline_example.py) for offline inference example.

### OpenAI-compatible Serving

Refer to [openai_serve_example.sh](https://github.com/YingxuH/vllm_plugin/blob/main/example_scripts/openai_serve_example.sh) for OpenAI-compatible serving example.

To call the server, you can refer to [openai_client_example.py](https://github.com/YingxuH/vllm_plugin/blob/main/example_scripts/openai_client_example.py).

Alternatively, you can try calling the server with curl, refer to [openai_client_curl.sh](https://github.com/YingxuH/vllm_plugin/blob/main/example_scripts/openai_client_curl.sh).

### Full release history 

See [CHANGELOG.md](https://github.com/YingxuH/vllm_plugin/blob/main/CHANGELOG.md).


### vLLM + transformers compatibility (v0.3.x)

Tested with `transformers==4.57.6` on H100 (TP=1). Each cell covers install, unit tests, and full-dataset ASR evaluation.

| vLLM | transformers | install | tests | ASR eval | overall |
|------|-------------|---------|-------|----------|---------|
| 0.12.0 | 4.57.6 | PASS | PASS | PASS | PASS |
| 0.13.0 | 4.57.6 | PASS | PASS | PASS | PASS |
| 0.14.0 | 4.57.6 | PASS | PASS | PASS | PASS |
| 0.15.0 | 4.57.6 | PASS | PASS | PASS | PASS |
| 0.15.1 | 4.57.6 | PASS | PASS | PASS | PASS |
| 0.16.0 | 4.57.6 | PASS | PASS | PASS | PASS |

To reproduce or extend this matrix, see the [compatibility matrix runner](https://github.com/YingxuH/vllm_plugin/tree/main/scripts/compatibility).

### Security and dependency scanning

The repository uses separate workflows so each scan has a clear purpose:

- `Security (Bandit SAST)` (`.github/workflows/security.yml`): static security linting of project Python source (`bandit -r src`).
- `CodeQL` (`.github/workflows/codeql.yml`): semantic code scanning for Python + GitHub Actions security issues.
- `Dependency Audit (pip-audit)` (`.github/workflows/dependency-audit.yml`): installed dependency vulnerability scanning.
- `Dependency Review (PR)` (`.github/workflows/dependency-review.yml`): checks dependency changes in pull requests and fails on `moderate`+ severity vulnerabilities.
