Metadata-Version: 2.1
Name: isagellm
Version: 0.1.0.4
Summary: sageLLM: Modular LLM inference engine for domestic computing power (Huawei Ascend, NVIDIA)
Author: IntelliStream Team
License: Proprietary - IntelliStream
Project-URL: Homepage, https://github.com/IntelliStream/sagellm
Project-URL: Documentation, https://github.com/IntelliStream/sagellm#readme
Project-URL: Repository, https://github.com/IntelliStream/sagellm
Keywords: llm,inference,ascend,huawei,npu,cuda,domestic
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: ==3.11.*
Description-Content-Type: text/markdown
Requires-Dist: isagellm-protocol<0.2.0,>=0.1.0
Requires-Dist: isagellm-backend<0.2.0,>=0.1.1.2
Requires-Dist: isagellm-core<0.2.0,>=0.1.0
Requires-Dist: isagellm-control-plane<0.2.0,>=0.1.0.2
Requires-Dist: isagellm-gateway<0.2.0,>=0.1.0.2
Requires-Dist: isagellm-kv-cache<0.2.0,>=0.1.0
Requires-Dist: isagellm-comm<0.2.0,>=0.1.0
Requires-Dist: isagellm-compression<0.2.0,>=0.1.0
Requires-Dist: click>=8.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pyyaml>=6.0.0
Provides-Extra: all
Requires-Dist: isagellm[benchmark,cuda]; extra == "all"
Provides-Extra: ascend
Requires-Dist: torch>=2.0.0; extra == "ascend"
Provides-Extra: benchmark
Requires-Dist: isagellm-benchmark>=0.1.0; extra == "benchmark"
Provides-Extra: cuda
Requires-Dist: torch>=2.0.0; extra == "cuda"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.8.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"

# sageLLM

<p align="center">
  <strong>🚀 Modular LLM Inference Engine for Domestic Computing Power</strong>
</p>

<p align="center">
  Ollama-like experience for Chinese hardware ecosystems (Huawei Ascend, NVIDIA)
</p>

---

## ✨ Features

- 🎯 **One-Click Install** - `pip install isagellm` gets you started immediately
- 🔌 **Mock-First** - Test without GPU, perfect for CI/CD
- 🇨🇳 **Domestic Hardware** - First-class support for Huawei Ascend NPU
- 📊 **Observable** - Built-in metrics (TTFT, TBT, throughput, KV usage)
- 🧩 **Plugin System** - Extend with custom backends and engines

## 📦 Quick Install

```bash
# Install sageLLM (includes mock backend, no GPU required)
pip install isagellm

# With Control Plane (request routing & scheduling)
pip install 'isagellm[control-plane]'

# With API Gateway (OpenAI-compatible REST API)
pip install 'isagellm[gateway]'

# Full server (Control Plane + Gateway)
pip install 'isagellm[server]'

# With CUDA support
pip install 'isagellm[cuda]'

# All features
pip install 'isagellm[all]'
```

## 🚀 Quick Start

### CLI (like ollama)

```bash
# Show system info
sage-llm info

# Start mock server (no GPU required)
sage-llm serve --mock

# Single inference
sage-llm run -p "What is LLM inference?" --mock

# Run Year1 demo validation
sage-llm demo --workload year1 --mock

# Start OpenAI-compatible API gateway
sage-llm gateway --mock --port 8080
```

### Python API

```python
from sagellm import Request, MockEngine

# Create mock engine (no GPU needed)
engine = MockEngine()

# Run inference
request = Request(
    request_id="demo-001",
    prompt="Hello, world!",
    max_tokens=128,
)
response = engine.generate(request)

print(f"Response: {response.text}")
print(f"TTFT: {response.metrics.ttft_ms:.2f} ms")
print(f"Throughput: {response.metrics.throughput_tps:.2f} tokens/s")
```

### Configuration

```yaml
# ~/.sage-llm/config.yaml
backend:
  kind: mock  # or: cuda, ascend

engine:
  kind: mock
  model: Qwen/Qwen2-7B

workload:
  segments:
    - short   # 128 in → 128 out
    - long    # 2048 in → 512 out
    - stress  # concurrent requests
```

## 📊 Year 1 Demo Contract

sageLLM must produce these metrics for validation:

```json
{
  "ttft_ms": 45.2,
  "tbt_ms": 12.5,
  "throughput_tps": 80.0,
  "peak_mem_mb": 24576,
  "kv_used_tokens": 4096,
  "prefix_hit_rate": 0.85,
  "evict_count": 3
}
```

Run validation:
```bash
sage-llm demo --workload year1 --output metrics.json
```

## 🏗️ Architecture

```
isagellm (umbrella package)
├── isagellm-protocol       # Protocol v0.1 types
│   └── Request, Response, Metrics, Error, StreamEvent
├── isagellm-core           # Runtime & Demo Runner
│   └── Config, Engine, Factory, DemoRunner
├── isagellm-backend        # Hardware abstraction
│   └── BackendProvider, MockBackend, (CUDABackend, AscendBackend)
├── isagellm-control-plane  # Request routing & scheduling (optional)
│   └── ControlPlaneManager, Router, Policies, Lifecycle
└── isagellm-gateway        # OpenAI-compatible REST API (optional)
    └── FastAPI server, /v1/chat/completions, Session management
```

## 🔧 Development

### Quick Setup (Development Mode)

```bash
# Clone all repositories
./scripts/clone-all-repos.sh

# Install all packages in editable mode
./quickstart.sh

# Open all repos in VS Code Multi-root Workspace
code sagellm.code-workspace
```

**📖 See [WORKSPACE_GUIDE.md](WORKSPACE_GUIDE.md) for Multi-root Workspace usage.**

### Testing

```bash
# Clone and setup
git clone https://github.com/IntelliStream/sagellm.git
cd sagellm
pip install -e ".[dev]"

# Run tests
pytest -v

# Format & lint
ruff format .
ruff check . --fix

# Type check
mypy src/sagellm/

# Verify dependency hierarchy
python scripts/verify_dependencies.py
```

### 📖 Development Resources

- [**DEVELOPER_GUIDE.md**](DEVELOPER_GUIDE.md) - 架构规范与开发指南
- [**PR_CHECKLIST.md**](PR_CHECKLIST.md) - Pull Request 审查清单
- [**scripts/verify_dependencies.py**](scripts/verify_dependencies.py) - 依赖层次验证

## 📚 Package Details

| Package | PyPI Name | Import Name | Description |
|---------|-----------|-------------|-------------|
| sagellm | `isagellm` | `sagellm` | Umbrella package (install this) |
| sagellm-protocol | `isagellm-protocol` | `sagellm_protocol` | Protocol v0.1 types |
| sagellm-core | `isagellm-core` | `sagellm_core` | Runtime & config |
| sagellm-backend | `isagellm-backend` | `sagellm_backend` | Hardware abstraction |

## 🎯 Roadmap

- **Year 1**: Core inference with KV cache, prefix sharing, basic eviction
- **Year 2**: Multi-node inference, advanced scheduling
- **Year 3**: Full production-ready deployment

## 📄 License

Proprietary - IntelliStream. Internal use only.

---

<p align="center">
  <sub>Built with ❤️ by IntelliStream Team for domestic AI infrastructure</sub>
</p>
