Metadata-Version: 2.4
Name: qwerky-vllm-models
Version: 0.2.4
Summary: vLLM plugin for Qwerky AI MambaInLlama hybrid models
Author-email: Qwerky AI <contact@qwerky.ai>
License: Apache-2.0
Project-URL: Homepage, https://github.com/qwerkyai/qwerky-vllm-models
Project-URL: Repository, https://github.com/qwerkyai/qwerky-vllm-models
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.40.0
Requires-Dist: vllm>=0.14.0
Requires-Dist: einops>=0.7.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"

# Qwerky vLLM Models

A vLLM plugin for serving Qwerky AI's MambaInLlama hybrid models without the `--trust-remote-code` flag.

## Installation

```bash
pip install vllm qwerky-vllm-models
```

## Usage

After installing, serve Qwerky models with vLLM:

```bash
vllm serve QwerkyAI/Qwerky-Llama3.2-Mamba-3B-Llama3.3-70B-base-distill --max-model-len 4096
```

No extra flags or imports are needed: when vLLM starts, it loads the plugin through its entry point, and the plugin registers the model architecture.
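Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The following is a minimal sketch using only the standard library. It assumes the server listens on vLLM's default address, `http://localhost:8000`, and it uses the `/v1/completions` endpoint on the assumption that the checkpoint is a base (non-chat) model, as its name suggests. The helper names `build_completion_request` and `complete` are illustrative and not part of this package.

```python
import json
import urllib.request

MODEL = "QwerkyAI/Qwerky-Llama3.2-Mamba-3B-Llama3.3-70B-base-distill"


def build_completion_request(prompt, base_url="http://localhost:8000", max_tokens=64):
    """Build a POST request for the OpenAI-compatible completions endpoint.

    base_url is an assumption: it matches vLLM's default host and port.
    """
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server from the command above running, `print(complete("The capital of France is"))` should print a continuation of the prompt. Adjust `base_url` if the server was started with a different `--host` or `--port`.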

## Supported Models

- `QwerkyAI/Qwerky-Llama3.2-Mamba-3B-Llama3.3-70B-base-distill`

## How It Works

This package uses vLLM's plugin system (`vllm.general_plugins` entry point) to register the MambaInLlama model architecture. This means:

- No fork of vLLM required
- No `--trust-remote-code` flag needed
- Works with a standard vLLM installation
- Uses vLLM's native Triton-accelerated Mamba kernels

## Requirements

- Python >= 3.10
- vLLM >= 0.14.0
- PyTorch >= 2.0.0
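
As a quick sanity check, the requirements above can be verified from Python. This is a minimal sketch with hypothetical helper names (`release_tuple`, `check_requirements`). It compares only the numeric release part of each version, so pre-release and local suffixes such as `rc1` or `+cu121` are ignored.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the list above, keyed by distribution name.
MINIMUMS = {"vllm": "0.14.0", "torch": "2.0.0"}


def release_tuple(v):
    """Leading numeric components padded to three, e.g. '2.3.0+cu121' -> (2, 3, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    parts = [int(p) for p in m.group(0).split(".")] if m else []
    return tuple((parts + [0, 0, 0])[:3])


def check_requirements(minimums=MINIMUMS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python >= 3.10 is required")
    for pkg, minimum in minimums.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg} is not installed (need >= {minimum})")
            continue
        if release_tuple(installed) < release_tuple(minimum):
            problems.append(f"{pkg} {installed} is older than {minimum}")
    return problems
```

Calling `check_requirements()` in the environment that runs vLLM returns an empty list when every requirement is met.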

## License

Apache 2.0
