Metadata-Version: 2.4
Name: xpyd
Version: 1.0.0
Summary: Lightweight Prefill-Decode proxy for disaggregated LLM serving
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: fastapi>=0.110.0
Requires-Dist: uvicorn>=0.29.0
Requires-Dist: uvloop>=0.19.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: requests>=2.31.0
Requires-Dist: colorlog>=6.8.0
Requires-Dist: transformers>=4.38.0
Requires-Dist: prometheus-client>=0.20.0
Requires-Dist: PyYAML>=6.0

# MicroPDProxy

MicroPDProxyServer is a lightweight Prefill-Decode (PD) proxy for disaggregated LLM serving, installed as the `xpyd` CLI.

This project provides **dummy prefill and decode nodes** for local development
and debugging of a PD-separated proxy without any GPU or model dependencies.

The dummy nodes expose the minimum compatibility surface required by the
validated proxy implementation under `core/`, including:

- `/v1/models`
- `/v1/completions`
- `/v1/chat/completions`
- `/health`
- `/ping`
- `/metrics` (Prometheus format)

## Architecture

MicroPDProxy implements a **Prefill-Decode (PD) separated** serving
architecture. Incoming requests are routed through two phases:

1. **Prefill** — sent to a prefill node for KV cache preparation (`max_tokens=1`, `stream=False`)
2. **Decode** — forwarded to a decode node for autoregressive token generation
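
The two phases can be pictured as a pair of payload transformations. The following is a minimal sketch, assuming OpenAI-style JSON request bodies; the function names `build_prefill_request` and `build_decode_request` are hypothetical and are not taken from `core/`. It also assumes the decode node receives the client request unchanged, which the description above does not state explicitly.

```python
from copy import deepcopy


def build_prefill_request(body: dict) -> dict:
    """Return a copy of the client request adjusted for the prefill phase."""
    prefill = deepcopy(body)
    prefill["max_tokens"] = 1   # prefill only prepares the KV cache
    prefill["stream"] = False   # prefill responses are not streamed
    return prefill


def build_decode_request(body: dict) -> dict:
    """Return a copy of the client request for the decode phase (assumed unmodified)."""
    return deepcopy(body)


client_body = {
    "model": "/path/to/model",
    "prompt": "Hello",
    "max_tokens": 128,
    "stream": True,
}
prefill_body = build_prefill_request(client_body)
decode_body = build_decode_request(client_body)
```

Copying the body before changing it keeps the original client parameters available for the decode phase.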

The proxy handles scheduling (Round Robin or Load Balanced), health monitoring,
and dynamic instance management. See [`docs/architecture.md`](docs/architecture.md)
for the full architecture overview.
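
The two scheduling policies can be illustrated roughly as follows. This is a sketch, not the code under `core/`: the class names are hypothetical, and using the in-flight request count as the load signal is an assumption, since this README does not say what the load-balanced policy measures.

```python
from itertools import cycle


class RoundRobinScheduler:
    """Hand out instances in a fixed rotating order."""

    def __init__(self, instances):
        self._cycle = cycle(list(instances))

    def pick(self):
        return next(self._cycle)


class LeastLoadedScheduler:
    """Pick the instance with the fewest in-flight requests (assumed load metric)."""

    def __init__(self, instances):
        self.in_flight = {inst: 0 for inst in instances}

    def pick(self):
        # On ties, min() returns the first instance in insertion order.
        inst = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[inst] += 1
        return inst

    def release(self, inst):
        """Call when a request finishes so the instance's load drops again."""
        self.in_flight[inst] -= 1


decode_nodes = ["10.0.0.3:8200", "10.0.0.4:8200"]

rr = RoundRobinScheduler(decode_nodes)
rr_picks = [rr.pick() for _ in range(3)]

lb = LeastLoadedScheduler(decode_nodes)
first = lb.pick()
second = lb.pick()
lb.release(first)      # the first request completes
third = lb.pick()      # the now-idle first instance is chosen again
```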

## Quick Start

```bash
# Install as a CLI tool
pip install .

# Or install in dev mode
pip install -e .

# Start with a YAML config
xpyd --config examples/proxy.yaml

# Or run the server script directly
pip install -r requirements.txt
python core/MicroPDProxyServer.py --config examples/proxy.yaml
```

## Installation

```bash
# Install the xpyd CLI
pip install .

# Verify
xpyd --version
xpyd --help

# Validate a config without starting the server
xpyd --validate-config examples/proxy.yaml
```

## Usage

### Option 1: YAML Configuration (recommended)

Create a YAML config file (see [`examples/proxy.yaml`](examples/proxy.yaml)):

```yaml
model: /path/to/model
port: 8868

prefill:
  nodes:
    - "10.0.0.1:8100"
    - "10.0.0.2:8100"
  tp_size: 8
  dp_size: 2
  world_size_per_node: 8

decode:
  nodes:
    - "10.0.0.3:8200"
    - "10.0.0.4:8200"
  tp_size: 1
  dp_size: 16
  world_size_per_node: 8

scheduling: loadbalanced
```

Start the proxy:

```bash
xpyd --config proxy.yaml
# or
python core/MicroPDProxyServer.py --config proxy.yaml
```

The proxy looks for a config file in this order and uses the first one it finds:
1. `--config` / `-c` CLI argument
2. `XPYD_CONFIG` environment variable
3. `./xpyd.yaml` in the current directory
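
The lookup order above could be expressed as a small resolver. This is an illustrative sketch only: `resolve_config_path` is a hypothetical helper, and checking that `./xpyd.yaml` exists before using it is an assumption about how the fallback behaves.

```python
import os
from pathlib import Path


def resolve_config_path(cli_value=None, environ=None, cwd=None):
    """Return the first config path found, or None if no source provides one."""
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else Path(cwd)

    if cli_value:                        # 1. --config / -c
        return Path(cli_value)
    if environ.get("XPYD_CONFIG"):       # 2. XPYD_CONFIG environment variable
        return Path(environ["XPYD_CONFIG"])
    default = cwd / "xpyd.yaml"          # 3. ./xpyd.yaml in the current directory
    return default if default.is_file() else None
```

Passing `environ` and `cwd` as parameters keeps the function easy to test without touching the real environment.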

### Startup Node Discovery

The proxy starts listening immediately but returns **503** on business
endpoints (`/v1/completions`, `/v1/chat/completions`) until at least one
prefill node and one decode node report healthy. Health, status, and metrics
endpoints are always available.

Configure in YAML:

```yaml
startup:
  wait_timeout_seconds: 600   # exit if nodes not ready after 10 min
  probe_interval_seconds: 10  # probe /health every 10s
```
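
The readiness rule and the two settings above can be sketched as a polling loop. This is not the proxy's actual implementation: `wait_for_nodes` and the caller-supplied `probe` callable are hypothetical, and `sleep` and `clock` are injectable only so the loop can be exercised without real waiting.

```python
import time


def wait_for_nodes(probe, prefill, decode, wait_timeout_seconds=600,
                   probe_interval_seconds=10, sleep=time.sleep,
                   clock=time.monotonic):
    """Poll until one prefill and one decode node are healthy, or time out.

    `probe(address)` should return True when that node's /health endpoint
    answers successfully. Returns True when ready, False on timeout.
    """
    deadline = clock() + wait_timeout_seconds
    while True:
        prefill_ok = any(probe(addr) for addr in prefill)
        decode_ok = any(probe(addr) for addr in decode)
        if prefill_ok and decode_ok:
            return True
        if clock() >= deadline:
            return False
        sleep(probe_interval_seconds)
```

A caller would typically exit the process when this returns `False`, matching the "exit if nodes not ready" comment in the YAML.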

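In the topology form of the config (the Option 1 example), the parameters expand
into instance addresses automatically. Each node hosts
`world_size_per_node / tp_size` instances:
- **Prefill**: 2 nodes × (8 / 8) = 2 instances (1 instance per node)
- **Decode**: 2 nodes × (8 / 1) = 16 instances (8 instances per node)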
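
The arithmetic can be checked with a short sketch. The helper names are hypothetical, and the sketch only covers the case where `tp_size` evenly divides `world_size_per_node`; how ports are assigned to the expanded instances is not described here, so it is left out.

```python
def instances_per_node(world_size_per_node: int, tp_size: int) -> int:
    """Number of instances one node can host for a given tensor-parallel size."""
    if tp_size <= 0 or world_size_per_node % tp_size != 0:
        raise ValueError("tp_size must evenly divide world_size_per_node")
    return world_size_per_node // tp_size


def total_instances(num_nodes: int, world_size_per_node: int, tp_size: int) -> int:
    """Total instances across all nodes of one role (prefill or decode)."""
    return num_nodes * instances_per_node(world_size_per_node, tp_size)


prefill_total = total_instances(num_nodes=2, world_size_per_node=8, tp_size=8)
decode_total = total_instances(num_nodes=2, world_size_per_node=8, tp_size=1)
```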

A simple flat-list format is also supported (see [`examples/proxy-simple.yaml`](examples/proxy-simple.yaml)):

```yaml
model: /path/to/model
prefill:
  - "10.0.0.1:8100"
decode:
  - "10.0.0.2:8200"
  - "10.0.0.3:8200"
```

### Option 2: CLI Arguments

```bash
python core/MicroPDProxyServer.py \
  --model /path/to/model \
  --prefill 10.0.0.1:8100 10.0.0.2:8100 \
  --decode 10.0.0.3:8200 10.0.0.4:8200 \
  --port 8868 \
  --roundrobin
```

### Option 3: Parameterized Shell Script

For topology-driven deployments with TP/DP parameters:

```bash
bash core/xpyd_start_proxy.sh \
  --model /path/to/model \
  --prefill-nodes 2 --prefill-tp-size 8 --prefill-dp-size 2 --prefill-world-size-per-node 8 \
  --decode-nodes 2 --decode-tp-size 1 --decode-dp-size 16 --decode-world-size-per-node 8 \
  --prefill-base-port 8100 --decode-base-port 8200
```

### CLI Arguments Reference

| Argument | Short | Default | Description |
|---|---|---|---|
| `--config` | `-c` | — | Path to YAML configuration file |
| `--model` | `-m` | — | Model name / path (required unless in YAML) |
| `--prefill` | `-p` | — | Prefill node addresses (`host:port`) |
| `--decode` | `-d` | — | Decode node addresses (`host:port`) |
| `--port` | — | 8000 | Proxy listen port |
| `--roundrobin` | — | false | Use round-robin scheduling |
| `--generator_on_p_node` | — | false | Generate first token on prefill node |

When both `--config` and CLI arguments are provided, CLI arguments take precedence.
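
One way to picture this precedence rule is a merge in which only explicitly supplied CLI values override the YAML values. This is a sketch with a hypothetical `merge_settings` helper; it assumes that flags left off the command line are represented as `None`.

```python
def merge_settings(yaml_values: dict, cli_values: dict) -> dict:
    """Overlay CLI values on YAML values; None means the flag was not given."""
    merged = dict(yaml_values)
    for key, value in cli_values.items():
        if value is not None:
            merged[key] = value
    return merged


settings = merge_settings(
    {"model": "/path/to/model", "port": 8868},   # from the YAML file
    {"model": None, "port": 9000},               # from the command line
)
```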

### YAML Config Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `model` | string | — | Model name / path (required) |
| `port` | int | 8000 | Proxy listen port |
| `log_level` | string | warning | Log level: debug, info, warning, error |
| `prefill` | list or topology | [] | Prefill node config |
| `decode` | list or topology | — | Decode node config (required) |
| `scheduling` | string | loadbalanced | Scheduling policy: roundrobin, loadbalanced |
| `generator_on_p_node` | bool | false | Generate first token on prefill node |
| `admin_api_key` | string | — | Admin API key (env `ADMIN_API_KEY` overrides) |
| `openai_api_key` | string | — | OpenAI API key (env `OPENAI_API_KEY` overrides) |

## Docker Deployment

```bash
# Build and run the full local topology (2 prefill + 2 decode + proxy)
docker compose up --build

# Or run just the proxy against existing GPU nodes
docker build -t microxpyd .
docker run -p 8868:8868 microxpyd \
  python3 core/MicroPDProxyServer.py \
  --model tokenizers/DeepSeek-R1 \
  --prefill 10.0.0.1:8100 --decode 10.0.0.3:8200 \
  --port 8868
```

See [`docs/deployment.md`](docs/deployment.md) for production deployment details.

## Benchmark

Use vLLM's benchmark tool to test proxy throughput:

```bash
vllm bench serve \
  --base-url http://localhost:8868 \
  --model DeepSeek-R1 \
  --dataset-name sonnet \
  --sonnet-input-len 1024 \
  --sonnet-output-len 128 \
  --num-prompts 100 \
  --request-rate 10
```

## Configuration

| Environment Variable | Default | Description |
|---|---|---|
| `PREFILL_DELAY_PER_TOKEN` | `0.001` | Simulated per-prompt-token prefill latency (seconds) |
| `DECODE_DELAY_PER_TOKEN` | `0.01` | Simulated per-decode-token generation latency (seconds) |
| `ADMIN_API_KEY` | — | API key for admin endpoints (overrides YAML) |
| `OPENAI_API_KEY` | — | Bearer token for backend nodes (overrides YAML) |
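
As a rough illustration of the two delay variables, the sketch below reads them with the defaults from the table and computes the simulated latency for a request. The helper `simulated_delays` is hypothetical, and scaling each delay linearly with the token count is an assumption about how the dummy nodes apply them.

```python
import os


def simulated_delays(prompt_tokens: int, decode_tokens: int, environ=None):
    """Return (prefill_seconds, decode_seconds) for a simulated request."""
    environ = os.environ if environ is None else environ
    prefill_per_token = float(environ.get("PREFILL_DELAY_PER_TOKEN", "0.001"))
    decode_per_token = float(environ.get("DECODE_DELAY_PER_TOKEN", "0.01"))
    return prompt_tokens * prefill_per_token, decode_tokens * decode_per_token


# With the defaults, a 1024-token prompt and 128 generated tokens give
# roughly 1.024 s of prefill delay and 1.28 s of decode delay.
prefill_s, decode_s = simulated_delays(1024, 128, environ={})
```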

## Running Tests

```bash
pip install -r requirements.txt

# Run the full test suite
PYTHONPATH=core:tests python -m pytest tests/ -v

# Run specific test groups
PYTHONPATH=core:tests python -m pytest tests/test_prefill_node.py tests/test_decode_node.py -v  # Node tests
PYTHONPATH=core:tests python -m pytest tests/test_proxy_matrix.py -v                            # Topology matrix
PYTHONPATH=core:tests python -m pytest tests/test_yaml_integration.py -v                        # YAML config integration
PYTHONPATH=core:tests python -m pytest tests/test_config.py tests/test_yaml_config.py -v        # Config validation
PYTHONPATH=core:tests python -m pytest tests/test_topology.py -v                                # Topology expansion
PYTHONPATH=core:tests python -m pytest tests/test_scheduler.py -v                               # Scheduler unit tests
PYTHONPATH=core:tests python -m pytest tests/test_metrics.py -v                                 # Prometheus metrics
```

## Documentation

| Document | Description |
|---|---|
| [Architecture](docs/architecture.md) | System architecture overview |
| [API Reference](docs/api_reference.md) | HTTP API endpoints |
| [Configuration](docs/configuration.md) | YAML config file reference |
| [CLI](docs/cli.md) | xpyd command-line tool (planned) |
| [Scheduling](docs/scheduling.md) | Load balancing strategies |
| [Resilience](docs/resilience.md) | Health checks, circuit breakers, retry (planned) |
| [Metrics](docs/metrics.md) | Prometheus metrics endpoint |
| [Deployment](docs/deployment.md) | Deployment and Docker guide |
| [Quick Start](docs/terminal_by_terminal_quickstart.md) | Terminal-by-terminal setup |
| [One-Click Setup](docs/one_click_dummy_proxy_setup.md) | Quick dummy environment |
| [Proxy Script](docs/xpyd_start_proxy_usage.md) | xpyd_start_proxy.sh usage |
| [Contributing](CONTRIBUTING.md) | Contribution guidelines |
