Metadata-Version: 2.4
Name: thestage-elastic-models-cli
Version: 0.0.19
Summary: Elastic Models Client from TheStage AI
Author-email: TheStage AI team <hello@thestage.ai>
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: qlip_serve_client==0.1.17
Requires-Dist: locust
Requires-Dist: locust-plugins
Requires-Dist: Pillow
Requires-Dist: requests
Requires-Dist: wheel
Requires-Dist: numpy<2,>=1.23.5
Requires-Dist: aiohttp
Dynamic: license-file

# Elastic Models CLI

**CLI tool for benchmarking and testing elastic AI model inference.**

## Installation

```bash
pip install thestage-elastic-models-cli
```

## Commands

### Client Inference
```bash
# Test single inference requests
elastic-models-client client llm --prompt "Hello" --url <endpoint> --model <name>
elastic-models-client client diffusion --prompt "A cat" --url <endpoint> --model <name>
elastic-models-client client vlm --prompt "Describe image" --image <path> --url <endpoint> --model <name>
elastic-models-client client stt --audio <path> --url <endpoint> --model <name>
```


### Benchmarking
```bash
# Run load tests using Locust
elastic-models-client benchmark llm --url <endpoint> --model <name>
elastic-models-client benchmark diffusion --url <endpoint> --model <name>
elastic-models-client benchmark vlm --url <endpoint> --model <name>
elastic-models-client benchmark stt --url <endpoint> --model <name>

# Options: --concurrency, --num-requests, --output-dir, --authorization
```

## Requirements

- **Python**: >=3.10
- **Dependencies**: qlip_serve_client, locust, locust-plugins, Pillow, requests, wheel, aiohttp
- **NumPy**: <2.0 (Triton compatibility)

## ⚠️ Important Caveats

1. **Triton Dependency**: NumPy must be <2.0 for Triton server compatibility
2. **Authorization**: Pass `--authorization` when targeting authenticated endpoints
3. **Ready Endpoint**: Server readiness checks may fail behind Nginx or Salad proxy setups
4. **Metadata**: Requires a local model metadata JSON file, or auto-downloads it from the server
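For the readiness caveat above, you can probe the endpoint yourself before benchmarking. The `/v2/health/ready` path below follows the KServe v2 convention and is an assumption about this deployment; proxies such as Nginx may intercept or rewrite it:

```python
import urllib.error
import urllib.request

def check_ready(base_url: str, timeout: float = 5.0) -> bool:
    # Probe the (assumed) KServe v2 readiness endpoint; returns False
    # on any connection error or non-200 status instead of raising.
    try:
        with urllib.request.urlopen(f"{base_url}/v2/health/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```
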

## Quick Example

```bash
# Test single inference
elastic-models-client client llm \
  --prompt "Write a haiku about AI" \
  --url https://api.example.com/v2/models \
  --model meta-llama/Llama-3.1-8B

# Benchmark LLM with 4 concurrent users, 100 requests
elastic-models-client benchmark llm \
  --url https://api.example.com/v2/models \
  --model meta-llama/Llama-3.1-8B \
  --concurrency 4 \
  --num-requests 100 \
  --output-dir ./results
```

## Output

- **Benchmarks**: CSV stats, HTML reports, JSONL logs (optional)
- **Client**: JSON response with inference results
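If you enable the JSONL logs, they can be post-processed with a few lines of Python. This is a minimal sketch: the per-record field name `response_time` is an assumption — inspect a log line to find the actual latency key before relying on it:

```python
import json
import statistics
from pathlib import Path

def summarize_latencies(jsonl_path: str, field: str = "response_time") -> dict:
    # Collect one latency value per JSONL record; the "response_time"
    # key is an assumed field name, not documented by this package.
    latencies = []
    for line in Path(jsonl_path).read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if field in record:
            latencies.append(float(record[field]))
    if not latencies:
        return {}
    latencies.sort()
    return {
        "count": len(latencies),
        "mean": statistics.mean(latencies),
        # Nearest-rank p95 over the sorted sample
        "p95": latencies[int(0.95 * (len(latencies) - 1))],
    }
```
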
