Metadata-Version: 2.4
Name: gpuroutertest
Version: 0.1.0
Summary: Spin up OpenAI-compatible vLLM endpoints on your own AWS GPUs with one command.
Author: Susmit Kulkarni
License: MIT
Project-URL: Homepage, https://github.com/HolboxAI/gpuroutertest
Project-URL: Issues, https://github.com/HolboxAI/gpuroutertest/issues
Keywords: vllm,aws,ec2,llm,inference,gpu,openai
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: boto3>=1.26
Requires-Dist: typer>=0.9
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: moto[ec2]>=5; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: build>=1; extra == "dev"
Requires-Dist: twine>=5; extra == "dev"
Dynamic: license-file

# gpuroutertest

Spin up **OpenAI-compatible [vLLM](https://github.com/vllm-project/vllm) endpoints on your own AWS GPUs** with a single command. `gpuroutertest` picks a GPU instance that fits the model, boots it from a Deep Learning AMI, downloads the weights, and returns an endpoint URL and an API key. Tear it down just as fast when you're done.

Built for developers who want to rapidly stand up a private inference endpoint for a Hugging Face model without hand-rolling EC2, security groups, and vLLM flags.

## Install

```bash
pip install gpuroutertest
```

You need AWS credentials (`aws configure`, `AWS_PROFILE`, or SSO) with permission to manage EC2, plus GPU (`G`/`P`) instance quota in your region.

## CLI quickstart

```bash
# See what's available
br list-models

# Deploy GLM-4 9B on a single A10G, wait until it's serving
br deploy glm49b --size small --region us-east-1 --profile myprofile

# List running deployments (source of truth = AWS tags, so keys are never lost)
br ps --region us-east-1

# Inspect / fetch endpoint / read boot logs
br status  <api_key|instance-id> --region us-east-1
br endpoint <api_key|instance-id> --region us-east-1
br logs     <api_key|instance-id> --region us-east-1

# Tear everything down (stops billing)
br delete  <api_key|instance-id> --region us-east-1 -y
```

Every command takes `--profile/-p` and `--region/-r`. `deploy` also supports:

| Flag | Purpose |
|------|---------|
| `--size` | `small` / `medium` / `large` tier from the registry |
| `--hf-token` | token for gated HF models (or set `HF_TOKEN`) |
| `--cidr` | restrict who can reach port 8000 (default `0.0.0.0/0`) |
| `--ttl` | auto-terminate after N minutes — a cost guard for dev instances |
| `--timeout` | minutes to wait for the health check (default 30) |
| `--no-wait` | return as soon as the instance is running |

## Library usage

```python
import gpuroutertest as br

dep = br.deploy("glm49b", size="small", region="us-east-1", profile="myprofile")
print(dep.endpoint_url, dep.api_key)

# ... use the OpenAI-compatible API at dep.endpoint_url with dep.api_key ...

br.destroy(dep.api_key, region="us-east-1")
```
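Because a running GPU instance keeps billing until it is destroyed, it can help to pair the two calls so teardown also happens when the code in between raises. The following is a minimal sketch, not part of the package: `temporary_deployment` is a hypothetical helper, and it assumes only what the snippet above shows, namely that `deploy` accepts a model key plus keyword arguments including `region`, that the returned object exposes `api_key`, and that `destroy` accepts an API key and `region`.

```python
from contextlib import contextmanager


@contextmanager
def temporary_deployment(deploy_fn, destroy_fn, model, *, region, **deploy_kwargs):
    """Deploy on entry and always destroy on exit (hypothetical helper).

    deploy_fn / destroy_fn are passed in explicitly, e.g. br.deploy and
    br.destroy, so the helper makes no further assumptions about the library.
    """
    dep = deploy_fn(model, region=region, **deploy_kwargs)
    try:
        yield dep
    finally:
        # Runs on normal exit and on exceptions, so the instance is not left running.
        destroy_fn(dep.api_key, region=region)


# Intended usage (requires AWS credentials and launches a real instance):
#
#   import gpuroutertest as br
#   with temporary_deployment(br.deploy, br.destroy, "glm49b",
#                             size="small", region="us-east-1",
#                             profile="myprofile") as dep:
#       print(dep.endpoint_url, dep.api_key)
```

If `deploy` itself fails, nothing is destroyed by this helper, so a failed launch may still need a manual `br ps` check.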

Call the endpoint like any OpenAI-compatible server, with `ENDPOINT` and `API_KEY` set to the endpoint URL and API key returned by `deploy`:

```bash
curl "$ENDPOINT/chat/completions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"model":"zai-org/glm-4-9b-chat-hf","messages":[{"role":"user","content":"Hello"}]}'
```
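The same request can be made from Python without extra dependencies. This is a sketch using only the standard library; `build_chat_request` and `send_chat` are illustrative names, and the code assumes, like the curl command above, that `/chat/completions` is appended directly to the endpoint URL. The response parsing assumes the standard OpenAI chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(endpoint_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to <endpoint_url>/chat/completions."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=endpoint_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Intended usage with a live deployment:
#   req = build_chat_request(dep.endpoint_url, dep.api_key,
#                            "zai-org/glm-4-9b-chat-hf", "Hello")
#   print(send_chat(req))
```

Splitting request construction from sending keeps the first half easy to inspect or test without a running endpoint.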

## Models

Models live in [`sdk/gpuroutertest/registry.json`](sdk/gpuroutertest/registry.json). Each entry maps a key to a Hugging Face model id and per-size serving config (instance type, disk, vLLM flags). Add a model by adding an entry — no code changes required.
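To make the shape of an entry concrete, here is a purely illustrative sketch. The field names (`model_id`, `sizes`, `instance_type`, `disk_gb`, `vllm_args`) and the values are assumptions chosen to match the description above, not the actual schema; check `registry.json` itself before copying anything. The small validator shows one way such a file could be sanity-checked before a deploy.

```python
import json

# HYPOTHETICAL schema: every field name and value here is an assumption,
# not the real format of registry.json.
EXAMPLE_REGISTRY = json.loads("""
{
  "glm49b": {
    "model_id": "zai-org/glm-4-9b-chat-hf",
    "sizes": {
      "small": {
        "instance_type": "g5.xlarge",
        "disk_gb": 200,
        "vllm_args": ["--max-model-len", "8192"]
      }
    }
  }
}
""")

REQUIRED_SIZE_FIELDS = {"instance_type", "disk_gb", "vllm_args"}


def validate_registry(registry: dict) -> list:
    """Return human-readable problems; an empty list means the shape looks complete."""
    problems = []
    for key, entry in registry.items():
        if "model_id" not in entry:
            problems.append(f"{key}: missing model_id")
        for size, config in entry.get("sizes", {}).items():
            missing = REQUIRED_SIZE_FIELDS - set(config)
            for field in sorted(missing):
                problems.append(f"{key}.{size}: missing {field}")
    return problems


print(validate_registry(EXAMPLE_REGISTRY))
```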

## Notes & limitations

- The security group opens **port 8000**. By default it is reachable from the whole internet (`0.0.0.0/0`) and is protected only by the API key. Pass `--cidr` to restrict access for anything beyond a quick experiment.
- Traffic is plain **HTTP** (no TLS). Put the endpoint behind a TLS-terminating proxy or load balancer for anything beyond dev.
- Instance boot logs are available via `br logs` (EC2 console output). vLLM's own container logs require SSH/SSM, which are intentionally not provisioned.
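For the `--cidr` note above, a small helper can turn a single address or a loosely written network into a normalized CIDR string and refuse a range that covers everything. This is a sketch using Python's standard `ipaddress` module; `to_cidr_arg` is a hypothetical name, and how you obtain your own public IP is left out.

```python
import ipaddress


def to_cidr_arg(value: str) -> str:
    """Normalize an IP or network string into a CIDR suitable for --cidr."""
    # strict=False lets "10.0.0.5/24" normalize to "10.0.0.0/24"
    # and a bare address become a single-host network (/32 for IPv4).
    network = ipaddress.ip_network(value, strict=False)
    if network.prefixlen == 0:
        raise ValueError("refusing a CIDR that covers the whole internet")
    return str(network)


print(to_cidr_arg("203.0.113.7"))
print(to_cidr_arg("10.0.0.5/24"))
```

The result can then be passed as the `--cidr` value to `br deploy`.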

## License

MIT — see [LICENSE](LICENSE).
