Metadata-Version: 2.4
Name: inferencebench-llm
Version: 0.0.2
Summary: LLM inference plugin for InferenceBench Suite (vLLM in Phase 1; SGLang/TRT-LLM/llama.cpp/MLX in Phase 2+)
Project-URL: Homepage, https://github.com/yobitelcomm/bench
Author-email: Yobitel Communications <bench@yobitel.com>
License: Apache-2.0
Keywords: ai,benchmark,inference,llm,ml,vllm
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Requires-Dist: inferencebench-envelope
Requires-Dist: inferencebench-harness
Requires-Dist: pydantic~=2.9
Requires-Dist: pyyaml~=6.0
Description-Content-Type: text/markdown

# inferencebench-llm

LLM inference plugin for InferenceBench Suite. It drives requests against an
inference engine's endpoint and produces a signed envelope with time-to-first-token
(TTFT), time-per-output-token (TPOT), throughput, and goodput-at-SLO metrics.
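The following awk sketch illustrates how these metrics relate to per-request timings. It uses one common set of definitions: TTFT is first-token time minus send time, TPOT is decode time divided by the output tokens after the first, throughput is output tokens per second of run time, and goodput is requests per second that meet both a TTFT SLO and a TPOT SLO. The input format (one request per line: send time, first-token time, last-token time, output-token count, in seconds) and the `compute_metrics` helper are hypothetical. The plugin's own definitions and SLO handling may differ.

```bash
#!/usr/bin/env bash
# compute_metrics is a hypothetical helper for illustration; it is not part of bench.
# Input (stdin), one request per line, times in seconds:
#   t_send t_first_token t_last_token output_tokens
# Args: TTFT SLO (s), TPOT SLO (s), run duration (s).
compute_metrics() {
  awk -v ttft_slo="$1" -v tpot_slo="$2" -v dur="$3" '
    BEGIN { ttft_slo += 0; tpot_slo += 0; dur += 0 }  # force numeric comparisons
    {
      ttft = $2 - $1                    # time to first token
      tpot = 0
      if ($4 > 1) {                     # TPOT is undefined for single-token replies
        tpot = ($3 - $2) / ($4 - 1)     # decode time per token after the first
        sum_tpot += tpot; n_tpot++
      }
      sum_ttft += ttft; tokens += $4; n++
      # a request counts toward goodput only if it meets both SLOs
      if (ttft <= ttft_slo && tpot <= tpot_slo) good++
    }
    END {
      if (n == 0) { print "no requests" > "/dev/stderr"; exit 1 }
      mean_tpot = (n_tpot > 0) ? sum_tpot / n_tpot : 0
      printf "mean_ttft=%.3f mean_tpot=%.3f throughput=%.2f goodput=%.2f\n",
        sum_ttft / n, mean_tpot, tokens / dur, good / dur
    }'
}
```

For example, `printf '0.0 0.2 1.2 11\n0.5 1.5 3.5 11\n' | compute_metrics 0.5 0.15 10` prints `mean_ttft=0.600 mean_tpot=0.150 throughput=2.20 goodput=0.10`. Both requests add to throughput, but only the first meets the SLOs, so goodput is lower than the request rate.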

## Status

Phase 1 ships **vLLM-only** support. SGLang, TensorRT-LLM, llama.cpp, and MLX are planned for Phase 2.

## Install

```bash
pip install inferencebench inferencebench-llm
```

## Quickstart

```bash
# Start a vLLM server (separately) on :8000, then:
bench run llm.inference \
    --model meta-llama/Llama-4-Maverick \
    --engine vllm \
    --endpoint http://localhost:8000/v1 \
    --concurrency 1,4,16,64 \
    --duration 300
```
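The quickstart assumes the vLLM server is already up. A readiness check before `bench run` avoids starting a run against a server that is still loading the model. The following is a minimal sketch. It assumes vLLM's OpenAI-compatible server, which answers `GET /v1/models`, and `curl` on the `PATH`. The `wait_for_endpoint` helper and its 600-second default are illustrative choices, not part of InferenceBench.

```bash
#!/usr/bin/env bash
# Start vLLM's OpenAI-compatible server in another terminal first, e.g.:
#   vllm serve <model> --port 8000
# wait_for_endpoint is a hypothetical helper, not part of bench: it polls the
# endpoint's /models route until the server answers or the timeout expires.
wait_for_endpoint() {
  local endpoint="$1"
  local timeout_s="${2:-600}"   # approximate: counts sleep intervals, not curl time
  local waited=0
  until curl -sf --max-time 5 "${endpoint}/models" > /dev/null; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "endpoint ${endpoint} not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Usage: `wait_for_endpoint http://localhost:8000/v1 && bench run llm.inference ...`, with the same flags as the quickstart.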

See `docs/plugins/llm-inference.md` for the full reference.
