Metadata-Version: 2.4
Name: vllm-cpu-avx512vnni
Version: 0.10.0.post2
Summary: vLLM CPU inference engine (AVX512 + VNNI optimized)
Author-email: Mekayel Anik <mekayel.anik@gmail.com>
Maintainer-email: Mekayel Anik <mekayel.anik@gmail.com>
License: GPL-3.0-only
Project-URL: Bug Tracker, https://github.com/MekayelAnik/vllm-cpu/issues
Project-URL: Homepage, https://github.com/vllm-project/vllm
Project-URL: Documentation, https://docs.vllm.ai/en/latest/
Project-URL: Slack, https://slack.vllm.ai/
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: <3.13,>=3.9
Description-Content-Type: text/markdown
Requires-Dist: regex
Requires-Dist: cachetools
Requires-Dist: psutil
Requires-Dist: sentencepiece
Requires-Dist: numpy
Requires-Dist: requests>=2.26.0
Requires-Dist: tqdm
Requires-Dist: blake3
Requires-Dist: py-cpuinfo
Requires-Dist: transformers>=4.53.2
Requires-Dist: huggingface-hub[hf_xet]>=0.33.0
Requires-Dist: tokenizers>=0.21.1
Requires-Dist: protobuf
Requires-Dist: fastapi[standard]>=0.115.0
Requires-Dist: aiohttp
Requires-Dist: openai<=1.90.0,>=1.87.0
Requires-Dist: pydantic>=2.10
Requires-Dist: prometheus_client>=0.18.0
Requires-Dist: pillow
Requires-Dist: prometheus-fastapi-instrumentator>=7.0.0
Requires-Dist: tiktoken>=0.6.0
Requires-Dist: lm-format-enforcer<0.11,>=0.10.11
Requires-Dist: llguidance<0.8.0,>=0.7.11; platform_machine == "x86_64" or platform_machine == "arm64" or platform_machine == "aarch64"
Requires-Dist: outlines_core==0.2.10
Requires-Dist: diskcache==5.6.3
Requires-Dist: lark==1.2.2
Requires-Dist: xgrammar==0.1.21; platform_machine == "x86_64" or platform_machine == "aarch64" or platform_machine == "arm64"
Requires-Dist: typing_extensions>=4.10
Requires-Dist: filelock>=3.16.1
Requires-Dist: partial-json-parser
Requires-Dist: pyzmq>=25.0.0
Requires-Dist: msgspec
Requires-Dist: gguf>=0.13.0
Requires-Dist: importlib_metadata; python_version < "3.10"
Requires-Dist: mistral_common[audio,image]>=1.8.2
Requires-Dist: opencv-python-headless>=4.11.0
Requires-Dist: pyyaml
Requires-Dist: six>=1.16.0; python_version > "3.11"
Requires-Dist: setuptools<80,>=77.0.3; python_version > "3.11"
Requires-Dist: einops
Requires-Dist: compressed-tensors==0.10.2
Requires-Dist: depyf==0.19.0
Requires-Dist: cloudpickle
Requires-Dist: watchfiles
Requires-Dist: python-json-logger
Requires-Dist: scipy
Requires-Dist: ninja
Requires-Dist: pybase64
Requires-Dist: cbor2
Requires-Dist: numba==0.60.0; python_version == "3.9"
Requires-Dist: numba==0.61.2; python_version > "3.9"
Requires-Dist: packaging>=24.2
Requires-Dist: setuptools<80.0.0,>=77.0.3
Requires-Dist: torch==2.6.0+cpu; platform_machine == "x86_64"
Requires-Dist: torch==2.7.0; platform_system == "Darwin"
Requires-Dist: torch==2.7.0; platform_machine == "ppc64le" or platform_machine == "aarch64"
Requires-Dist: torchaudio; platform_machine != "ppc64le" and platform_machine != "s390x"
Requires-Dist: torchaudio==2.7.0; platform_machine == "ppc64le"
Requires-Dist: torchvision; platform_machine != "ppc64le" and platform_machine != "s390x"
Requires-Dist: torchvision==0.22.0; platform_machine == "ppc64le"
Requires-Dist: datasets
Requires-Dist: intel-openmp==2024.2.1; platform_machine == "x86_64"
Requires-Dist: intel_extension_for_pytorch==2.6.0; platform_machine == "x86_64"
Requires-Dist: triton==3.2.0; platform_machine == "x86_64"
Provides-Extra: bench
Requires-Dist: pandas; extra == "bench"
Requires-Dist: datasets; extra == "bench"
Provides-Extra: tensorizer
Requires-Dist: tensorizer==2.10.1; extra == "tensorizer"
Provides-Extra: fastsafetensors
Requires-Dist: fastsafetensors>=0.1.10; extra == "fastsafetensors"
Provides-Extra: runai
Requires-Dist: runai-model-streamer>=0.13.3; extra == "runai"
Requires-Dist: runai-model-streamer-s3; extra == "runai"
Requires-Dist: boto3; extra == "runai"
Provides-Extra: audio
Requires-Dist: librosa; extra == "audio"
Requires-Dist: soundfile; extra == "audio"
Requires-Dist: mistral_common[audio]; extra == "audio"
Provides-Extra: video
Dynamic: provides-extra
Dynamic: requires-dist

<!-- markdownlint-disable MD001 MD041 -->
<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/vllm-project/vllm/main/docs/assets/logos/vllm-logo-text-dark.png">
    <img alt="vLLM" src="https://raw.githubusercontent.com/vllm-project/vllm/main/docs/assets/logos/vllm-logo-text-light.png" width=55%>
  </picture>
</p>

<h3 align="center">
Easy, fast, and cheap LLM serving for everyone
</h3>

<p align="center">
  <a href="https://github.com/MekayelAnik/vllm-cpu/stargazers">
    <img src="https://img.shields.io/github/stars/MekayelAnik/vllm-cpu?style=for-the-badge&logo=github&logoColor=white&labelColor=2b3137&color=f0c14b" alt="GitHub Stars">
  </a>
  <a href="https://github.com/MekayelAnik/vllm-cpu/network/members">
    <img src="https://img.shields.io/github/forks/MekayelAnik/vllm-cpu?style=for-the-badge&logo=github&logoColor=white&labelColor=2b3137&color=6cc644" alt="GitHub Forks">
  </a>
  <a href="https://github.com/MekayelAnik/vllm-cpu/issues">
    <img src="https://img.shields.io/github/issues/MekayelAnik/vllm-cpu?style=for-the-badge&logo=github&logoColor=white&labelColor=2b3137&color=d73a49" alt="GitHub Issues">
  </a>
  <a href="https://github.com/MekayelAnik/vllm-cpu/pulls">
    <img src="https://img.shields.io/github/issues-pr/MekayelAnik/vllm-cpu?style=for-the-badge&logo=github&logoColor=white&labelColor=2b3137&color=2188ff" alt="GitHub PRs">
  </a>
</p>

<p align="center">
  <a href="https://pypi.org/project/vllm-cpu-avx512vnni/">
    <img src="https://img.shields.io/pypi/v/vllm-cpu-avx512vnni?style=for-the-badge&logo=pypi&logoColor=white&labelColor=2b3137&color=3775a9" alt="PyPI Version">
  </a>
  <a href="https://pypi.org/project/vllm-cpu-avx512vnni/">
    <img src="https://img.shields.io/pypi/dm/vllm-cpu-avx512vnni?style=for-the-badge&logo=pypi&logoColor=white&labelColor=2b3137&color=9c27b0" alt="PyPI Downloads">
  </a>
  <a href="https://github.com/MekayelAnik/vllm-cpu/blob/main/LICENSE">
    <img src="https://img.shields.io/github/license/MekayelAnik/vllm-cpu?style=for-the-badge&logo=gnu&logoColor=white&labelColor=2b3137&color=a32d2a" alt="License">
  </a>
</p>

<p align="center">
  <a href="https://hub.docker.com/r/mekayelanik/vllm-cpu">
    <img src="https://img.shields.io/docker/pulls/mekayelanik/vllm-cpu?style=for-the-badge&logo=docker&logoColor=white&labelColor=2b3137&color=0db7ed" alt="Docker Pulls">
  </a>
  <a href="https://hub.docker.com/r/mekayelanik/vllm-cpu">
    <img src="https://img.shields.io/docker/stars/mekayelanik/vllm-cpu?style=for-the-badge&logo=docker&logoColor=white&labelColor=2b3137&color=f0c14b" alt="Docker Stars">
  </a>
  <a href="https://hub.docker.com/r/mekayelanik/vllm-cpu">
    <img src="https://img.shields.io/docker/v/mekayelanik/vllm-cpu?style=for-the-badge&logo=docker&logoColor=white&labelColor=2b3137&color=6cc644&label=version" alt="Docker Version">
  </a>
  <a href="https://hub.docker.com/r/mekayelanik/vllm-cpu">
    <img src="https://img.shields.io/docker/image-size/mekayelanik/vllm-cpu?style=for-the-badge&logo=docker&logoColor=white&labelColor=2b3137&color=9c27b0" alt="Docker Image Size">
  </a>
</p>

<p align="center">
  <a href="https://github.com/MekayelAnik/vllm-cpu/commits/main">
    <img src="https://img.shields.io/github/last-commit/MekayelAnik/vllm-cpu?style=for-the-badge&logo=git&logoColor=white&labelColor=2b3137&color=ff6f00" alt="Last Commit">
  </a>
  <a href="https://github.com/MekayelAnik/vllm-cpu/graphs/contributors">
    <img src="https://img.shields.io/github/contributors/MekayelAnik/vllm-cpu?style=for-the-badge&logo=github&logoColor=white&labelColor=2b3137&color=00bcd4" alt="Contributors">
  </a>
  <a href="https://github.com/MekayelAnik/vllm-cpu">
    <img src="https://img.shields.io/github/repo-size/MekayelAnik/vllm-cpu?style=for-the-badge&logo=github&logoColor=white&labelColor=2b3137&color=607d8b" alt="Repo Size">
  </a>
</p>

---

<div align="center">

## Buy Me a Coffee

**Your support encourages me to keep creating/supporting my open-source projects.** If you found value in this project, you can buy me a coffee to keep me up all the sleepless nights.

<a href="https://07mekayel07.gumroad.com/coffee" target="_blank">
<img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="217" height="60">
</a>

</div>

---

## About

vLLM is a fast and easy-to-use library for LLM inference and serving. <b>This PyPI package is built with AVX512 + VNNI optimizations for inference on supported CPUs.</b>

Originally developed in the [Sky Computing Lab](https://sky.cs.berkeley.edu) at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

vLLM is fast with:

- State-of-the-art serving throughput
- Efficient management of attention key and value memory with [**PagedAttention**](https://blog.vllm.ai/2023/06/20/vllm.html)
- Continuous batching of incoming requests
- Fast model execution with <b>VNNI on supported CPUs</b>. Use this package <b>ONLY IF</b> your CPU supports <b>avx512vnni</b> or newer instruction sets
- Quantizations: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [AutoRound](https://arxiv.org/abs/2309.05516), INT4, INT8, and FP8
- Optimized CPU kernels, including integration with FlashAttention and FlashInfer
- Speculative decoding
- Chunked prefill

vLLM is flexible and easy to use with:

- Seamless integration with popular Hugging Face models
- High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
- Tensor, pipeline, data and expert parallelism support for distributed inference
- Streaming outputs
- OpenAI-compatible API server
- Support for x86_64, PowerPC, Arm, and Apple Silicon CPUs (CPU inference only). <b>This package does not support GPU inference.</b> For GPU inference, use the official [vLLM PyPI package](https://pypi.org/project/vllm/)
- Prefix caching support
- Multi-LoRA support

vLLM seamlessly supports most popular open-source models on HuggingFace, including:

- Transformer-like LLMs (e.g., Llama)
- Mixture-of-Expert LLMs (e.g., Mixtral, Deepseek-V2 and V3)
- Embedding Models (e.g., E5-Mistral)
- Multi-modal LLMs (e.g., LLaVA)

Find the full list of supported models [here](https://docs.vllm.ai/en/latest/models/supported_models.html).

## Important Notes

- Install this package in a Linux environment only. On Windows, use WSL2
- A container image for Docker, Podman, and compatible runtimes is available on [Docker Hub](https://hub.docker.com/r/mekayelanik/vllm-cpu)
- [Apache License of the main vLLM project](https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/LICENSE)
- [GPL License of this CPU-specific vLLM package](https://raw.githubusercontent.com/MekayelAnik/vllm-cpu/refs/heads/main/LICENSE)
- **For versions 0.8.5–0.12.0, use `.post2` releases** (e.g., `pip install vllm-cpu-avx512vnni==0.12.0.post2`), which include a critical CPU platform detection fix

## Platform Detection Fix (versions 0.8.5 - 0.12.0)

If you encounter `RuntimeError: Failed to infer device type` or see `UnspecifiedPlatform` warnings with versions 0.8.5 to 0.12.0, run this one-time fix after installation:

```python
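import os
import sys
import importlib.metadata as m

# Find the version of the installed vllm-cpu* distribution, if any.
v = next((d.metadata['Version'] for d in m.distributions()
          if (d.metadata['Name'] or '').startswith('vllm-cpu')), None)
# Use the first existing site-packages directory on sys.path.
p = next((p for p in sys.path if 'site-packages' in p and os.path.isdir(p)), None)
if v and p:
    # Write a minimal dist-info so that `vllm` resolves to a "+cpu" version.
    d = os.path.join(p, 'vllm-0.0.0.dist-info')
    os.makedirs(d, exist_ok=True)
    with open(os.path.join(d, 'METADATA'), 'w') as f:
        f.write(f'Metadata-Version: 2.1\nName: vllm\nVersion: {v}+cpu\n')
    print(f'Fixed: vllm version set to {v}+cpu')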
```

This creates a package alias so that vLLM detects the CPU platform correctly. It is needed only once per environment. Versions 0.8.5.post2+ and 0.12.0+ include this fix automatically.
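
To confirm that the alias is in place, a check along the following lines may help. It is a minimal sketch that assumes platform detection depends on the `vllm` distribution reporting a version with a `+cpu` local suffix, as the fix above writes; the helper names `is_cpu_build` and `check_vllm_alias` are hypothetical.

```python
import importlib.metadata as m


def is_cpu_build(version: str) -> bool:
    """Return True if the version's local segment (after '+') is 'cpu'."""
    _, sep, local = version.partition("+")
    return bool(sep) and local == "cpu"


def check_vllm_alias() -> str:
    """Describe what the `vllm` distribution name currently resolves to."""
    try:
        version = m.version("vllm")
    except m.PackageNotFoundError:
        return "vllm metadata not found; the alias has not been created"
    if is_cpu_build(version):
        return f"OK: vllm resolves to {version}"
    return f"vllm resolves to {version}, which lacks the +cpu suffix"
```

Running `print(check_vllm_alias())` in the target environment reports which of the three cases applies.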

## Getting Started

Install vLLM with a single command:

```bash
pip install vllm-cpu-avx512vnni --index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pypi.org/simple
```

This installs vllm-cpu-avx512vnni with CPU-optimized PyTorch (no CUDA dependencies).

### Alternative: Using uv (faster)

```bash
uv pip install vllm-cpu-avx512vnni --index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pypi.org/simple
```

Install `uv` on Linux:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
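
Once the package is installed, offline generation can be sketched as follows. This is a minimal example built on vLLM's `LLM` and `SamplingParams` classes; the model name `facebook/opt-125m`, the sampling values, and the helper name `generate_texts` are illustrative assumptions rather than recommendations from this package.

```python
# Example prompts; replace them with your own.
DEFAULT_PROMPTS = [
    "Hello, my name is",
    "The capital of France is",
]


def generate_texts(prompts, model="facebook/opt-125m"):
    """Generate one completion per prompt on the CPU backend."""
    # Imported lazily so this module can be loaded before vLLM is installed.
    from vllm import LLM, SamplingParams

    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)
    llm = LLM(model=model)  # downloads the model from Hugging Face on first use
    outputs = llm.generate(prompts, params)
    # Each result holds one or more completions; keep the first one.
    return [out.outputs[0].text for out in outputs]
```

When running this as a script, it is safer to call `generate_texts(DEFAULT_PROMPTS)` under an `if __name__ == "__main__":` guard, since vLLM may start worker processes.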

## vllm-cpu
This CPU-specific vLLM distribution provides **5 optimized wheel packages** built from the upstream vLLM source code:

| Package | Optimizations | Target CPUs |
|---------|--------------|-------------|
| [`vllm-cpu`](https://pypi.org/project/vllm-cpu/) | Baseline (no AVX512) | All x86_64 and ARM64 CPUs |
| [`vllm-cpu-avx512`](https://pypi.org/project/vllm-cpu-avx512/) | AVX512 | Intel Skylake-X and newer |
| [`vllm-cpu-avx512vnni`](https://pypi.org/project/vllm-cpu-avx512vnni/) | AVX512 + VNNI | Intel Cascade Lake and newer |
| [`vllm-cpu-avx512bf16`](https://pypi.org/project/vllm-cpu-avx512bf16/) | AVX512 + VNNI + BF16 | Intel Cooper Lake and newer |
| [`vllm-cpu-amxbf16`](https://pypi.org/project/vllm-cpu-amxbf16/) | AVX512 + VNNI + BF16 + AMX | Intel Sapphire Rapids (4th gen Xeon+) |

Each package is compiled with specific CPU instruction set flags for optimal inference performance.
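
The table above can be read as a lookup from CPU flags to the most specific package. The sketch below illustrates that reading; it assumes the Linux `/proc/cpuinfo` flag names `avx512f`, `avx512_vnni`, `avx512_bf16`, and `amx_bf16` correspond to the optimizations listed, and `pick_package` is a hypothetical helper, not part of any of these packages.

```python
# Most specific package first; each entry lists the flags it is assumed to need.
PACKAGES = [
    ("vllm-cpu-amxbf16", {"avx512f", "avx512_vnni", "avx512_bf16", "amx_bf16"}),
    ("vllm-cpu-avx512bf16", {"avx512f", "avx512_vnni", "avx512_bf16"}),
    ("vllm-cpu-avx512vnni", {"avx512f", "avx512_vnni"}),
    ("vllm-cpu-avx512", {"avx512f"}),
    ("vllm-cpu", set()),
]


def pick_package(cpu_flags):
    """Return the first package whose required flags are all present."""
    flags = set(cpu_flags)
    for name, required in PACKAGES:
        if required <= flags:  # subset test
            return name
    return "vllm-cpu"  # baseline fallback
```

For example, `pick_package({"avx2", "avx512f", "avx512_vnni"})` selects `vllm-cpu-avx512vnni`, while a CPU without AVX512 falls back to `vllm-cpu`.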


## Check available CPU instruction sets
```bash
lscpu | grep -i flags
```
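
The same check can be scripted. The following sketch parses the `flags` line of `/proc/cpuinfo` on x86 Linux and reports a handful of flags; the flag names in `FLAGS_OF_INTEREST` and the helper names are assumptions chosen for illustration.

```python
from pathlib import Path

# Flag names as they appear in /proc/cpuinfo on x86 Linux (assumed).
FLAGS_OF_INTEREST = ("avx2", "avx512f", "avx512_vnni", "avx512_bf16", "amx_bf16")


def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags: set) -> dict:
    """Map each flag of interest to whether it is present."""
    return {name: name in flags for name in FLAGS_OF_INTEREST}


def read_cpu_flags(path: str = "/proc/cpuinfo") -> set:
    """Read flags from the given file; return an empty set if it is missing."""
    p = Path(path)
    return parse_cpu_flags(p.read_text()) if p.exists() else set()
```

Calling `report(read_cpu_flags())` returns a dictionary such as `{"avx2": True, ...}` for the current machine. On non-x86 systems the `flags` line is absent, so every entry is `False`.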
## Example list of CPUs with their supported instruction sets
| CPU Architecture (Intel/AMD) | AVX2 | AVX-512 F (Base) | VNNI (INT8) | BF16 (BFloat16) (via AVX-512) | AMX-BF16 (via Tile Unit) |
|---------|--------------|:-------------:|:-------------:|:-------------:|:-------------:|
| Intel 4th Gen / AMD Ryzen Zen2 & Newer | Yes | No | No | No | No |
| Intel Skylake-SP / Skylake-X / AMD Zen 4 & Newer | Yes | Yes | No | No | No |
| Intel Cooper Lake (3rd Gen Xeon) / AMD Zen 4 (EPYC) / Ryzen Zen5 & Newer | Yes | Yes | Yes | Yes | No |
| Intel Sapphire Rapids (4th Gen Xeon) & Newer | Yes | Yes | Yes | Yes | Yes |

<b>***Currently, no AMD CPU supports AMX-BF16. AMD is expected to add AMX-BF16 support starting with Zen 7 CPUs.</b>

---


- [vLLM documentation](https://docs.vllm.ai/en/latest/)
- [GitHub Repo for this package](https://github.com/MekayelAnik/vllm-cpu)
- [DockerHub](https://hub.docker.com/r/mekayelanik/vllm-cpu)
- [List of Supported Models](https://docs.vllm.ai/en/latest/models/supported_models.html)
