Metadata-Version: 2.4
Name: open-vllm-sdk
Version: 1.0.0
Summary: Enterprise-grade resilient vLLM client network and preflight validation engine.
Author-email: Aravindh Annadurai <aravindhvignesh58@gmail.com>
License: MIT
Keywords: vllm,resilience,connection-pooling,platform-engineering,llmops
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi>=0.136.3
Requires-Dist: httpx>=0.28.1
Requires-Dist: pytest>=9.0.3
Requires-Dist: python-dotenv>=1.2.2
Dynamic: license-file

# vllm-sdk

A high-performance, asynchronous resilience gateway client built to connect distributed application services to remote GPU infrastructure safely and efficiently.

---

## Table of Contents
1. [Overview](#overview)
2. [Key Architecture Benefits](#key-architecture-benefits)
3. [Installation](#installation)
4. [Environment Configuration](#environment-configuration)
5. [Quick Start Usage](#quick-start-usage)
6. [Console Logging Aesthetics](#console-logging-aesthetics)
7. [License](#license)

---

## Overview

Managing raw HTTP streaming connections directly to high-throughput LLM clusters can cause serious stability problems, such as socket exhaustion, out-of-memory crashes, or dropped responses.

**vllm-sdk** wraps this networking behind a single client interface. It manages connections in the background, parses raw Server-Sent Event (SSE) blocks, and yields ready-to-use text tokens to your application in real time.
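
To make the SSE cleanup step concrete, here is a minimal sketch of how raw SSE lines could be reduced to plain text tokens. It assumes the server streams OpenAI-style chat-completion chunks (`data: {...}` lines terminated by `data: [DONE]`). The helper name `extract_tokens` is hypothetical and is not part of the SDK's API; the SDK's internal parsing may differ.

```python
import json
from typing import Iterable, Iterator


def extract_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield text tokens from raw SSE lines (hypothetical helper, not the SDK's API)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, comment/keep-alive lines, and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks instead of crashing the stream
        if not isinstance(chunk, dict):
            continue
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the assumed chunk format.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    ": keep-alive",
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
print("".join(extract_tokens(sample)))
```

Writing the parser as a generator lets the caller print or forward each token as soon as it is decoded, without buffering the whole response.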


## Key Architecture Benefits
* **Asynchronous Concurrency:** Built on Python's `asyncio` event loop. Designed to serve 100+ concurrent application instances (such as chatbots, BOM parsers, and tender text extractors) without one request blocking another.
* **Keep-Alive Connection Pooling:** Reuses open TCP connections through `httpx.AsyncClient` instead of opening a new socket for every request, which reduces **Time-To-First-Token (TTFT)**.
* **Pre-flight Integrity Checking:** Validates system paths and environment settings before startup, so configuration errors surface early instead of crashing downstream.
* **Localized Brand Logging:** Emits easy-to-scan terminal log output modeled on the console style of modern web tooling such as FastAPI and Uvicorn.
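
As an illustration of the pre-flight idea, the sketch below checks a list of environment variables and a log directory and reports every problem it finds. It is a simplified stand-in, not the SDK's `SystemInitializationEngine`; the function name `preflight_check` and the variable name `VLLM_BASE_URL` are assumptions made for this example.

```python
import os
import tempfile
from pathlib import Path


def preflight_check(required_env: list[str], log_dir: str) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    for name in required_env:
        # Treat unset and empty variables the same way.
        if not os.environ.get(name):
            problems.append(f"missing environment variable: {name}")
    path = Path(log_dir)
    try:
        path.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        problems.append(f"cannot create log directory {path}: {exc}")
    else:
        if not os.access(path, os.W_OK):
            problems.append(f"log directory is not writable: {path}")
    return problems


# VLLM_BASE_URL is a hypothetical variable name used only for this demo.
os.environ["VLLM_BASE_URL"] = "http://localhost:8000/v1"
with tempfile.TemporaryDirectory() as tmp:
    print(preflight_check(["VLLM_BASE_URL"], os.path.join(tmp, "logs")))
```

Collecting all problems into one list, rather than raising on the first failure, lets an operator fix every configuration issue in a single pass.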


## Installation

This project is managed with the `uv` Python package manager.

```bash
# Clone the repository
git clone [inprogress](inprogress)
cd vllm_sdk

# Sync dependencies and create the virtual environment automatically
uv sync
```

## Quick Start Usage
```python
import asyncio

from src.vllm_resilience_sdk import SystemInitializationEngine
from src.vllm_resilience_sdk.clients import ProductionVLLMClient
from src.vllm_resilience_sdk.logging_config import setup_sdk_logging


async def main():
    # 1. Initialize high-visibility console logging
    setup_sdk_logging()

    # 2. Run pre-boot system verification checks
    verifier = SystemInitializationEngine(target_log_dir="./logs")
    verifier.run_pre_boot_pipeline()

    # 3. Instantiate the connection-pooling client
    client = ProductionVLLMClient()
    await client.initialize_vllm_connection()

    try:
        # 4. Construct the workload payload
        sample_payload = {
            "model": "LocalModel",
            "messages": [{"role": "user", "content": "Analyze PCB raw schematics metadata."}],
        }

        print("\n--- AI Engine Stream Output Response ---")

        # 5. Stream processed text tokens as they arrive
        async for token in client.send_inference_request(sample_payload):
            print(token, end="", flush=True)

        print("\n----------------------------------------\n")
    finally:
        # 6. Close pooled connections on teardown, even if streaming fails
        await client.close_vllm_connection()


if __name__ == "__main__":
    asyncio.run(main())
```
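
The same streaming pattern extends to several requests in flight at once. The sketch below fans out multiple prompts over one shared client and caps concurrency with a semaphore. `FakeStreamingClient` is a stand-in written for this example so the code runs without a server; it only mimics the `send_inference_request` method name from the example above and is not the SDK's `ProductionVLLMClient`. The names `collect`, `run_batch`, and `max_in_flight` are likewise hypothetical.

```python
import asyncio
from typing import AsyncIterator


class FakeStreamingClient:
    """Stand-in for a streaming client; NOT the SDK's ProductionVLLMClient."""

    async def send_inference_request(self, payload: dict) -> AsyncIterator[str]:
        prompt = payload["messages"][0]["content"]
        for word in prompt.split():
            await asyncio.sleep(0)  # yield control, as a network read would
            yield word + " "


async def collect(client, payload: dict, limit: asyncio.Semaphore) -> str:
    """Drain one token stream into a string while holding a concurrency slot."""
    async with limit:
        parts = []
        async for token in client.send_inference_request(payload):
            parts.append(token)
        return "".join(parts).strip()


async def run_batch(prompts: list[str], max_in_flight: int = 2) -> list[str]:
    client = FakeStreamingClient()  # one client shared by every request
    limit = asyncio.Semaphore(max_in_flight)
    payloads = [
        {"model": "LocalModel", "messages": [{"role": "user", "content": p}]}
        for p in prompts
    ]
    # gather() returns results in the same order as its inputs.
    return await asyncio.gather(*(collect(client, p, limit) for p in payloads))


results = asyncio.run(run_batch(["first prompt", "second prompt", "third prompt"]))
print(results)
```

Sharing one client across tasks is what allows a connection pool to be reused, and the semaphore keeps a burst of callers from opening more simultaneous streams than the backend can handle.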
