Metadata-Version: 2.4
Name: mlx-ash-kv
Version: 8.2.5
Summary: Asynchronous Self-Healing KV Cache for Silicon-Native LLMs by GDI Nexus
Author-email: GDI Nexus <contactus@gdinexus.com>
License: Apache-2.0
Project-URL: Homepage, https://gdinexus.com
Project-URL: Repository, https://github.com/iamrealvinnu/mlx-ash-kv
Project-URL: Bug Tracker, https://github.com/iamrealvinnu/mlx-ash-kv/issues
Keywords: mlx,llm,kv-cache,self-healing,apple-silicon,neural-engine,cuda,pytorch,varentropy,attention-masking
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: mlx>=0.18.0
Requires-Dist: numpy
Requires-Dist: rich
Requires-Dist: coremltools
Requires-Dist: gradio
Requires-Dist: mlx-lm
Requires-Dist: torch
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"

# ASH-KV: Hardware-Native Neural Integrity Middleware

[![Hardware](https://img.shields.io/badge/Hardware-Apple%20Silicon%20%26%20NVIDIA-blue)](#)
[![License](https://img.shields.io/badge/License-Apache%202.0-green)](#)
[![Version](https://img.shields.io/badge/Version-8.2.5--beta-emerald)](#)
[![Company](https://img.shields.io/badge/Developed%20by-GDI%20Nexus-black)](https://gdinexus.com)

**ASH-KV** (Asynchronous Self-Healing KV Cache) is a high-performance middleware layer designed for **Runtime Neural Integrity Enforcement**. It leverages silicon-native kernels to monitor the mathematical uncertainty of the Attention Manifold and surgically prunes logical drift at the hardware level.

---

## 🔬 Technical Core

### ⚡ Deterministic Manifold Monitoring
Instead of heuristic text scanning, ASH-KV monitors **Attention Varentropy**. By computing the variance of attention uncertainty across the KV-Cache in real time, the system identifies the exact moment a model's transition probability distribution collapses: the mathematical precursor to hallucination.
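The README does not define the metric formally; a minimal NumPy sketch of one common definition, varentropy as the variance of surprisal under the distribution itself (the helper name `varentropy` and the thresholds are illustrative, not the library's API):

```python
import numpy as np

def varentropy(p: np.ndarray) -> np.ndarray:
    """Varentropy of a probability distribution: Var[-log p] under p.

    A peaked (collapsing) distribution has widely varying surprisal and
    therefore high varentropy; a uniform one has constant surprisal and
    varentropy near zero.
    """
    eps = 1e-12
    logp = np.log(p + eps)
    # Shannon entropy H = E[-log p], kept broadcastable over the last axis.
    entropy = -np.sum(p * logp, axis=-1, keepdims=True)
    # Var[-log p] = E[(-log p - H)^2] = sum_i p_i * (log p_i + H)^2
    return np.sum(p * (logp + entropy) ** 2, axis=-1)

uniform = np.full(16, 1.0 / 16)   # constant surprisal -> varentropy ~ 0
peaked = np.array([0.97, 0.01, 0.01, 0.01])  # collapsing -> varentropy > 0
assert varentropy(uniform) < 1e-6
assert varentropy(peaked) > varentropy(uniform)
```

A monitor would compare this value per decoding step against the drift threshold rather than scanning generated text.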

### 🛡️ Fused Kernel Mutation
When drift is detected, ASH-KV executes a **Gaussian Penalty Mask** directly within the model's compute graph.
*   **Apple Silicon**: Uses fused `@mx.compile` Metal kernels for near-zero-latency mutation.
*   **NVIDIA**: Uses PyTorch/CUDA-synchronized tensor operations.
*   **Latency**: Measured at **< 0.9ms** on Apple M4 hardware.
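The mask itself is not shown in this README; as a rough plain-NumPy illustration (not the fused Metal kernel, and `gaussian_penalty_mask` with its `strength`/`sigma` parameters is a hypothetical helper), a Gaussian penalty mask could subtract a bell-shaped penalty from attention scores centered on the drifting position:

```python
import numpy as np

def gaussian_penalty_mask(seq_len: int, center: int,
                          strength: float = 4.0, sigma: float = 2.0) -> np.ndarray:
    """Additive mask that suppresses attention around a drifting position.

    Added to raw attention scores before softmax, it smoothly down-weights
    tokens near `center` instead of hard-zeroing them.
    """
    positions = np.arange(seq_len)
    return -strength * np.exp(-((positions - center) ** 2) / (2.0 * sigma ** 2))

mask = gaussian_penalty_mask(seq_len=16, center=8)
# The penalty is deepest at the drift center and decays smoothly outward.
assert mask[8] == mask.min()
assert mask[0] > mask[8]
```

Because the mask is a pure element-wise expression, it is the kind of operation a compiler such as `mx.compile` can fuse into the surrounding attention computation.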

---

## 📖 API Reference

### `protect(model, sensitivity=0.85, critic_model_path=None)`
Wraps an existing model with the ASH-KV Hypervisor.

| Parameter | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `model` | `nn.Module` | Required | An MLX or PyTorch model instance. |
| `sensitivity` | `float` | `0.85` | The drift threshold (0.0 to 1.0). Lower is stricter. |
| `critic_model_path` | `str` | `None` | Optional path to a CoreML `.mlpackage` for ANE offloading. |

**Returns**: `(protected_model, cache, adapter, proxies)`
*   `cache`: The `ASHCache` instance managing the manifold.
*   `adapter`: The `AdaptiveSensitivity` agent for dynamic scaling.
*   `proxies`: A list of KV-cache proxies to be passed to the model's forward pass.

---

## 🚀 Usage with `mlx-lm`

ASH-KV is designed to be a drop-in upgrade for the `mlx-lm` ecosystem.

```python
from mlx_lm import load
from mlx_ash_kv.api import protect, generate_stream

# 1. Load your model natively
model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")

# 2. Apply the ASH-KV Shield
model, cache, adapter, proxies = protect(model, sensitivity=0.85)

# 3. Stream with Real-Time Healing
gen = generate_stream(model, tokenizer, cache, proxies, prompt="Explain quantum gravity.")

for token, health_score in gen:
    print(token, end="", flush=True)
    # health_score < 0.1 indicates an ASH-KV intervention occurred
```

---

## 📊 Benchmarks & Reproducibility
Our performance claims are verifiable using the included benchmarking suite.
```bash
ash-kv install    # Hardware Stress Test
ash-kv benchmark  # Run 100-case Latency/Integrity suite
```
The benchmark script is located at `scripts/publish_benchmarks.py`. The methodology uses `time.perf_counter_ns()` to measure the fused Metal kernel overhead.
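The timing approach can be reproduced outside the suite; a minimal sketch of per-call overhead measurement with `time.perf_counter_ns()` (the `measure_overhead_ms` helper and the stand-in workload are illustrative, not part of the package):

```python
import time

def measure_overhead_ms(fn, warmup: int = 10, iters: int = 100) -> float:
    """Median wall-clock latency of `fn` in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches and any lazy compilation before timing
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - t0)
    samples.sort()
    return samples[len(samples) // 2] / 1e6  # ns -> ms, median resists outliers

# Stand-in workload; the real suite times the fused healing kernel call.
latency_ms = measure_overhead_ms(lambda: sum(range(1000)))
assert latency_ms > 0.0
```

Taking the median over many iterations, rather than a single run, is what makes a sub-millisecond claim like the one above meaningful.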

---

## 🏗️ Architecture (HAL)
The **Hardware Abstraction Layer** ensures the same code runs across disparate architectures:
*   **`MLXHealer`**: Fused Metal operations for Apple Silicon.
*   **`CudaHealer`**: Synchronized PyTorch operations for NVIDIA.
*   **`UniversalTensorCritic`**: Zero-shot mathematical manifold evaluation.
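The HAL pattern above can be sketched as a small dispatch layer. The class names mirror the list, but the bodies here are illustrative placeholders (the real backends call into MLX and PyTorch, which are not imported in this sketch):

```python
from abc import ABC, abstractmethod

class Healer(ABC):
    """Backend-agnostic interface the hypervisor calls into."""
    @abstractmethod
    def apply_penalty(self, scores, center: int):
        ...

class MLXHealer(Healer):
    def apply_penalty(self, scores, center):
        # Real version: fused @mx.compile Metal kernel on Apple Silicon.
        raise NotImplementedError("requires mlx")

class CudaHealer(Healer):
    def apply_penalty(self, scores, center):
        # Real version: CUDA-synchronized PyTorch ops on NVIDIA.
        raise NotImplementedError("requires torch")

def select_healer(backend: str) -> Healer:
    """Pick a backend by name: 'mlx' for Apple Silicon, 'cuda' for NVIDIA."""
    return {"mlx": MLXHealer, "cuda": CudaHealer}[backend]()

assert isinstance(select_healer("mlx"), MLXHealer)
```

Because callers only ever hold a `Healer`, the same monitoring loop runs unchanged on either architecture; only the selected backend differs.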

---

### ⚠️ DISCLAIMER
ASH-KV is a probabilistic reliability layer for assisting professionals. It is NOT a substitute for professional clinical or legal judgment. All AI outputs must be verified by qualified humans.

---
© 2026 GDI Nexus Software Solutions LLP. All rights reserved.
