Metadata-Version: 2.1
Name: isagellm-control-plane
Version: 0.1.0.1
Summary: sageLLM Control Plane - Intelligent request routing, scheduling, and engine lifecycle management
Author: IntelliStream Team
License: Proprietary - IntelliStream
Project-URL: Homepage, https://github.com/IntelliStream/sagellm-control-plane
Project-URL: Repository, https://github.com/IntelliStream/sagellm-control-plane
Keywords: llm,inference,control-plane,scheduling,routing,autoscaling
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: ==3.11.*
Description-Content-Type: text/markdown
Requires-Dist: isagellm-protocol<0.2.0,>=0.1.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: httpx>=0.24.0
Provides-Extra: all
Requires-Dist: isagellm-control-plane[gpu,metrics]; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.8.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: gpu
Requires-Dist: pynvml>=11.0.0; extra == "gpu"
Provides-Extra: metrics
Requires-Dist: prometheus-client>=0.17.0; extra == "metrics"

# sageLLM Control Plane

**Intelligent request routing, scheduling, and engine lifecycle management for sageLLM.**

## Features

- 🎯 **Scheduling Policies** - FIFO, Priority, SLO-aware, Cost-optimized, Adaptive
- ⚖️ **Load Balancing** - Intelligent request routing across multiple engine instances
- 📈 **Autoscaling** - SLA-based autoscaling for Prefill/Decode instances
- 🔄 **Engine Lifecycle** - Spawn, stop, health check, auto-restart
- 📊 **Observability** - Metrics collection, performance monitoring
- 🧩 **Parallelism** - TP, PP, DP, EP strategy optimization
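
To make the difference between the FIFO and Priority policies concrete, here is a minimal, self-contained sketch. It is not the package's implementation: `QueuedRequest` and `PriorityScheduler` are hypothetical names invented for illustration, and the sketch assumes that a lower priority value means "more urgent".

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class QueuedRequest:
    # Hypothetical type for illustration; not part of sagellm_control.
    priority: int  # assumption: lower value is served first
    seq: int       # arrival order, so equal priorities fall back to FIFO
    request_id: str = field(compare=False)


class PriorityScheduler:
    """Toy priority queue: pops the most urgent request, FIFO among equals."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()

    def submit(self, request_id: str, priority: int = 0) -> None:
        item = QueuedRequest(priority, next(self._counter), request_id)
        heapq.heappush(self._heap, item)

    def next_request(self) -> Optional[str]:
        # Returns None when the queue is empty.
        return heapq.heappop(self._heap).request_id if self._heap else None


scheduler = PriorityScheduler()
scheduler.submit("req-batch", priority=5)
scheduler.submit("req-interactive", priority=1)
scheduler.submit("req-batch-2", priority=5)

order = [scheduler.next_request() for _ in range(3)]
print(order)  # ['req-interactive', 'req-batch', 'req-batch-2']
```

With every request submitted at the same priority, the arrival counter alone decides the order, which reduces the sketch to plain FIFO.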

## Installation

```bash
# Install from PyPI
pip install isagellm-control-plane

# With GPU monitoring
pip install "isagellm-control-plane[gpu]"

# With Prometheus metrics
pip install "isagellm-control-plane[metrics]"
```
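
The optional extras pull in `pynvml` (GPU monitoring) and `prometheus-client` (metrics). If you want to confirm which of them are importable in the current environment, a small check like the following may help. The helper name `available_extras` is hypothetical and not part of this package; the sketch relies only on the standard library.

```python
from importlib.util import find_spec

# Import names of the packages installed by each optional extra.
EXTRA_MODULES = {
    "gpu": "pynvml",
    "metrics": "prometheus_client",
}


def available_extras() -> dict:
    """Map each extra to True if its backing module can be imported."""
    return {
        extra: find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


status = available_extras()
for extra, installed in status.items():
    print(f"[{extra}] {'available' if installed else 'missing'}")
```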

## 🚀 Developer Quick Start

```bash
git clone git@github.com:intellistream/sagellm-control-plane.git
cd sagellm-control-plane
./quickstart.sh   # One-step dev environment setup (including dependencies)

# Or install manually
pip install -e ".[dev]"
```

Run the tests:
```bash
pytest tests/ -v
```

## Quick Start

```python
import asyncio

from sagellm_control import ControlPlaneManager


async def main() -> None:
    # Create a manager in local mode (no GPU required)
    manager = ControlPlaneManager(
        scheduling_policy="adaptive",
        routing_strategy="load_balanced",
        mode="local",  # Use local async executor
    )

    # Register a mock engine
    manager.register_engine(
        engine_id="engine-001",
        model_id="mock-model",
        host="localhost",
        port=8000,
    )

    # Schedule a request (awaited, so it must run inside a coroutine)
    decision = await manager.schedule_request(
        request_id="req-001",
        prompt="Hello, world!",
        max_tokens=128,
    )

    print(f"Scheduled to: {decision.instance_id}")


asyncio.run(main())
```

### Execution Layer API (Task0.8)

The Control Plane provides a complete inference execution interface:

```python
import asyncio

from sagellm_control import MockControlPlane
from sagellm_protocol import Request


async def main() -> None:
    # Use mock mode (no GPU dependency)
    cp = MockControlPlane()
    cp.register_engine("engine-001", model_id="test-model", host="localhost", port=8000)

    # 1. Non-streaming inference
    request = Request(
        request_id="req-001",
        trace_id="trace-001",
        model="test-model",
        prompt="Hello, how are you?",
        max_tokens=100,
        stream=False,
    )
    response = await cp.execute_request(request)
    print(f"Output: {response.output_text}")
    print(f"TTFT: {response.metrics.ttft_ms:.2f} ms")

    # 2. Streaming inference
    async for event in cp.stream_request(request):
        if event.event == "delta":
            print(event.chunk, end="", flush=True)

    # 3. Text embeddings
    embeddings = await cp.get_embeddings(
        texts=["Text 1", "Text 2", "Text 3"],
        model_id="embedding-model",
    )
    print(f"Generated {len(embeddings)} embeddings of dimension {len(embeddings[0])}")


asyncio.run(main())
```

For more examples, see `examples/execution_layer_demo.py`.

## Architecture

```
sagellm_control/
├── types.py           # Core data types (RequestMetadata, EngineInfo, etc.)
├── strategies/        # Scheduling policies (FIFO, Priority, SLO, etc.)
├── executors/         # Execution coordinators (HTTP, LocalAsync, Mock)
├── router.py          # Request routing and load balancing
├── autoscaler.py      # SLA-based autoscaling
├── parallelism.py     # Parallelism strategy optimization
├── manager.py         # Main ControlPlaneManager
└── engine_lifecycle.py # Engine lifecycle management
```

## Mock-First Development

All modules support a mock mode, so they can be tested without a GPU:

```python
from sagellm_control.executors import MockExecutionCoordinator

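# Use the mock executor in CI/CD, where no GPU is available.
# `request` is assumed to be a `Request` built as in the `Request(...)` example above;
# run this inside an async function, since execute() is awaited.
executor = MockExecutionCoordinator()
result = await executor.execute(request)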
```
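
The general idea behind mock-first development can be sketched without the package installed at all. In the example below, `Executor`, `EchoExecutor`, `FakeResult`, and `handle` are hypothetical stand-ins, not the real `sagellm_control` types: calling code depends only on an `execute` coroutine, so a GPU-free fake can replace the real executor in tests.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass
class FakeResult:
    # Hypothetical result type; the real executor's return type may differ.
    request_id: str
    output_text: str


class Executor(Protocol):
    # Hypothetical interface: anything with a matching async execute() fits.
    async def execute(self, request_id: str, prompt: str) -> FakeResult: ...


class EchoExecutor:
    """Stand-in executor that echoes the prompt instead of running a model."""

    async def execute(self, request_id: str, prompt: str) -> FakeResult:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return FakeResult(request_id=request_id, output_text=f"echo: {prompt}")


async def handle(executor: Executor, request_id: str, prompt: str) -> str:
    # The code under test depends only on the Executor interface.
    result = await executor.execute(request_id, prompt)
    return result.output_text


output = asyncio.run(handle(EchoExecutor(), "req-001", "Hello"))
print(output)  # echo: Hello
```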

## License

Proprietary - IntelliStream
