Metadata-Version: 2.1
Name: isagellm-backend
Version: 0.1.1.4
Summary: sageLLM backend provider abstraction and mock implementation
Author: IntelliStream Team
License: Proprietary - IntelliStream
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: ==3.11.*
Description-Content-Type: text/markdown
Requires-Dist: isagellm-protocol<0.2.0,>=0.1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: ruff>=0.8.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: bandit[toml]>=1.7.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: isagellm-core<0.2.0,>=0.1.0; extra == "dev"
Provides-Extra: embedding
Requires-Dist: sentence-transformers<4.0.0,>=2.2.0; extra == "embedding"
Requires-Dist: torch>=2.0.0; extra == "embedding"

# sagellm-backend

[![CI](https://github.com/intellistream/sagellm-backend/actions/workflows/ci.yml/badge.svg)](https://github.com/intellistream/sagellm-backend/actions/workflows/ci.yml)
[![PyPI version](https://badge.fury.io/py/isagellm-backend.svg)](https://badge.fury.io/py/isagellm-backend)
[![Python Version](https://img.shields.io/pypi/pyversions/isagellm-backend.svg)](https://pypi.org/project/isagellm-backend/)
[![License](https://img.shields.io/badge/License-Proprietary-red.svg)](LICENSE)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)

Hardware abstraction layer for the sageLLM inference engine, providing unified backend interfaces for CUDA, Ascend, and other accelerators.

## Features

- **Unified Hardware Abstraction**: Single API for multiple hardware backends
- **Mock Backend**: Test without real hardware
- **CUDA Support**: Native CUDA backend implementation
- **HuggingFace Integration**: Pre-configured engine for HF Transformers
- **Capability Matrix**: Hardware capability discovery and validation

## Installation

```bash
pip install isagellm-backend
```

## Quick Start

```bash
git clone git@github.com:intellistream/sagellm-backend.git
cd sagellm-backend
./quickstart.sh

# Run tests
pytest tests/ -v
```

## Usage Examples

### Basic Backend Usage

```python
from sagellm_backend import MockBackendProvider, DType

# Create backend
backend = MockBackendProvider(
    supported_dtypes=[DType.FP16, DType.BF16],
    has_collective=True,
)

# Query capabilities
cap = backend.capability()
print(cap.supported_dtypes)

# Allocate KV block
block = backend.kv_block_alloc(128, DType.FP16)
```
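
Before allocating, a caller may want to check a requested dtype against the capabilities the backend advertises. The following is a minimal, self-contained sketch of that check. The `DType` enum and `Capability` dataclass here are local stand-ins so the example runs without the package installed, and `require_dtype` is a hypothetical helper, not part of the `sagellm_backend` API.

```python
from dataclasses import dataclass
from enum import Enum


class DType(Enum):
    """Local stand-in for sagellm_backend.DType (illustration only)."""

    FP16 = "fp16"
    BF16 = "bf16"
    INT8 = "int8"


@dataclass(frozen=True)
class Capability:
    """Hypothetical stand-in for the object returned by backend.capability()."""

    supported_dtypes: tuple[DType, ...]
    has_collective: bool = False


def require_dtype(cap: Capability, dtype: DType) -> None:
    """Raise ValueError if the backend does not advertise support for dtype."""
    if dtype not in cap.supported_dtypes:
        supported = ", ".join(d.name for d in cap.supported_dtypes)
        raise ValueError(f"{dtype.name} not supported; backend offers: {supported}")


cap = Capability(supported_dtypes=(DType.FP16, DType.BF16), has_collective=True)
require_dtype(cap, DType.FP16)  # supported, so no exception is raised

try:
    require_dtype(cap, DType.INT8)  # not advertised by this capability
except ValueError as exc:
    message = str(exc)
    print(message)
```

Failing early with a descriptive error keeps unsupported configurations from reaching the allocation path.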

### HuggingFace CUDA Engine

```python
import asyncio

from sagellm_backend.engine.hf_cuda import HFCudaEngine, HFCudaEngineConfig


async def run_inference(request):
    """Start the engine and execute a single request."""
    # Standard configuration
    config = HFCudaEngineConfig(
        engine_id="hf-001",
        model_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        device="cuda:0",
        dtype="float16",
        device_map="auto",
        load_in_8bit=False,
        load_in_4bit=False,
        trust_remote_code=False,
        max_new_tokens=256,
        max_batch_size=8,
    )
    engine = HFCudaEngine(config)
    await engine.start()

    # Run inference
    return await engine.execute(request)


# `request` is a request object constructed by the caller (not shown here):
# response = asyncio.run(run_inference(request))
```

For testing without a GPU, enable mock mode:

```python
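import asyncio

from sagellm_backend.engine.hf_cuda import HFCudaEngine, HFCudaEngineConfig


async def main() -> None:
    # Mock mode: start the engine without a GPU
    mock_config = HFCudaEngineConfig(
        engine_id="hf-mock",
        model_path="mock-model",
        device="cuda:0",
        dtype="float16",
        device_map="auto",
        load_in_8bit=False,
        load_in_4bit=False,
        trust_remote_code=False,
        max_new_tokens=64,
        max_batch_size=2,
        mock_mode=True,
    )
    mock_engine = HFCudaEngine(mock_config)
    await mock_engine.start()


asyncio.run(main())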
```

## Extending with New Backends

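```python
# Create the provider in the providers/ directory, e.g. providers/ascend.py.
# The import location of CapabilityDescriptor is an assumption.
from sagellm_backend import CapabilityDescriptor, DType


class AscendBackendProvider:
    def capability(self) -> CapabilityDescriptor:
        return CapabilityDescriptor(
            supported_dtypes=[DType.FP16, DType.BF16, DType.INT8],
            # ...
        )

    # Implement other interface methods...


def create_ascend_backend() -> AscendBackendProvider:
    # Factory referenced by the entry point; its exact signature is an assumption
    return AscendBackendProvider()
```

Register the provider via an entry point in `pyproject.toml`:

```toml
[project.entry-points."sagellm.backends"]
ascend_cann = "sagellm_backend.providers.ascend:create_ascend_backend"
```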

## Documentation

- [Architecture](docs/ARCHITECTURE.md)
- [Contributing](CONTRIBUTING.md)
- [Team](docs/TEAM.md)

## License

Proprietary. See [LICENSE](LICENSE) for terms.
