Metadata-Version: 2.4
Name: xpyd-bench
Version: 0.1.0
Summary: Benchmarking & PD ratio planning tool for xPyD proxy
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.27.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: rich>=13.0.0
Requires-Dist: starlette>=0.37.0
Requires-Dist: uvicorn>=0.29.0
Provides-Extra: tokenizer
Requires-Dist: tiktoken>=0.5.0; extra == "tokenizer"
Provides-Extra: http2
Requires-Dist: h2>=4.0.0; extra == "http2"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: ruff>=0.3.0; extra == "dev"
Requires-Dist: isort>=5.13.0; extra == "dev"
Requires-Dist: pre-commit>=3.6.0; extra == "dev"
Requires-Dist: httpx>=0.27.0; extra == "dev"
Requires-Dist: tiktoken>=0.5.0; extra == "dev"
Requires-Dist: h2>=4.0.0; extra == "dev"

📖 **[完整使用指南 → docs/guide.md](docs/guide.md)**

# xPyD-bench

Benchmarking & PD ratio planning tool for [xPyD-proxy](https://github.com/xPyD-hub/xPyD-proxy).

## Features

- **`xpyd-bench`** — Benchmark xPyD proxy with configurable concurrency, request patterns, and both `/v1/completions` and `/v1/chat/completions` endpoints

For PD ratio planning, see [xPyD-plan](https://github.com/xPyD-hub/xPyD-plan).

## Install

```bash
pip install xpyd-bench
```

## Quick Start

### Benchmark

```bash
# Run benchmark against a running xPyD proxy
xpyd-bench --target http://localhost:8080 \
           --endpoint chat \
           --concurrency 16 \
           --num-requests 200 \
           --output results.json

# Use completion endpoint
xpyd-bench --target http://localhost:8080 \
           --endpoint completion \
           --concurrency 8 \
           --num-requests 100
```

## Configuration

See [examples/](examples/) for sample configs and scenarios.

## Output Metrics

- **TTFT** — Time to first token
- **TPS** — Tokens per second (per request & aggregate)
- **Latency** — P50 / P90 / P99 end-to-end latency
- **Throughput** — Total requests/sec and tokens/sec
- **Error rate** — Failed requests count and percentage

## License

TBD
