Metadata-Version: 2.4
Name: xorfice
Version: 0.1.11
Summary: SOTA Multimodal Inference Engine (S2S, I2I, V2V) for Xoron-Dev.
Author-email: Xoron-Dev <contact@xoron.dev>
Project-URL: Homepage, https://github.com/xoron-dev/xorfice
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0.0
Requires-Dist: triton
Requires-Dist: transformers
Requires-Dist: fastapi
Requires-Dist: uvicorn
Requires-Dist: pydantic
Requires-Dist: safetensors
Requires-Dist: hf_transfer

# 🚀 Xoron-Dev: Unified Multimodal AI Model

<div align="center">

![Xoron-Dev Logo](https://img.shields.io/badge/Xoron--Dev-MultiMoE-blue?style=for-the-badge&logo=pytorch)
![Version](https://img.shields.io/badge/Version-2.2-purple?style=for-the-badge)
![License](https://img.shields.io/badge/License-MIT-green?style=for-the-badge)
![Python](https://img.shields.io/badge/Python-3.10+-yellow?style=for-the-badge&logo=python)
![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red?style=for-the-badge&logo=pytorch)

**A state-of-the-art multimodal MoE model that unifies text, image, video, and audio understanding and generation.**

[Architecture](#-architecture-overview) | [Features](#-features) | [Installation](#-installation) | [Usage](#-usage) | [Training](#-training) | [Documentation](./docs/README.md)

</div>

---

## 🏗️ Architecture Overview

Xoron-Dev is built on a modular, mixture-of-experts architecture designed for maximum flexibility and performance.

### 🧠 LLM Backbone (Mixture of Experts)
- **12 Layers, 1024d, 16 Heads** - Optimized for efficient inference and training.
- **Aux-Loss-Free MoE** - 8 experts with top-2 routing and configurable shared-expert isolation.
- **Ring Attention** - Memory-efficient processing for up to **128K context**.
- **Qwen2.5 Tokenizer** - High-density 151K vocabulary for multilingual and code support.
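
The top-2 routing mentioned above can be sketched in a few lines. This is an illustrative toy, not Xoron-Dev's actual router: the function name, weight shapes, and softmax-then-renormalize gating are assumptions, and real MoE layers add capacity limits and load balancing on top.

```python
import torch
import torch.nn.functional as F

def top2_route(hidden, router_weight):
    """Toy top-2 router: each token picks 2 of the experts (illustrative only)."""
    logits = hidden @ router_weight                   # (tokens, num_experts)
    probs = F.softmax(logits, dim=-1)
    gate, expert_idx = probs.topk(2, dim=-1)          # top-2 routing
    gate = gate / gate.sum(dim=-1, keepdim=True)      # renormalize the two gate weights
    return gate, expert_idx

tokens, d_model, num_experts = 4, 1024, 8
hidden = torch.randn(tokens, d_model)
router_weight = torch.randn(d_model, num_experts)
gates, experts = top2_route(hidden, router_weight)
print(experts.shape)  # torch.Size([4, 2]): two experts chosen per token
```

Each token's output is then the gate-weighted sum of its two chosen experts, so only 2 of the 8 expert MLPs run per token.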

### 👁️ Vision & Video
- **SigLIP-2 Encoder** - 384px native resolution with multi-scale support (128-512px).
- **TiTok 1D Tokenization** - Compressed visual representation (256 tokens) for faster processing.
- **VidTok 3D VAE** - Efficient spatiotemporal video encoding with 4x8x8 compression.
- **3D-RoPE & Temporal MoE** - Sophisticated motion pattern recognition and spatial awareness.

### 🎤 Audio System
- **Raw Waveform Processing** - Direct 16kHz audio input/output (no Mel spectrograms required).
- **Conformer + RMLA** - Advanced speech-to-text with KV compression.
- **BigVGAN Waveform Decoder** - High-fidelity direct waveform generation with Snake activation.
- **Zero-Shot Voice Cloning** - Clone voices from short reference clips using speaker embeddings.

---

## 🌟 Features

### **Multimodal Capabilities**
| Modality | Input | Output | Strategy |
|----------|-------|--------|----------|
| **Text** | 128K Context | Reasoning, Code, Agentic | MoE LLM |
| **Image**| 128-512px | Understanding & SFT | SigLIP + TiTok |
| **Video**| 8-24 Frames | Understanding | VidTok + 3D-RoPE |
| **Audio**| 16kHz Waveform | ASR & TTS | Conformer + BigVGAN |

### **Agentic & Tool Calling**
- **250+ Special Tokens** for structured agent behaviors.
- **Native Tool Use**: Execute shell commands, Python scripts, and Jupyter notebooks.
- **Reasoning**: Advanced Chain-of-Thought (`<|think|>`, `<|plan|>`) for complex tasks.
- **Safety**: Anti-hallucination tokens (`<|uncertain|>`, `<|cite|>`) and confidence scores.
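
To make the special-token idea concrete, here is a hypothetical prompt assembled with the `<|think|>` and `<|plan|>` tokens named above. The surrounding `<|user|>`/`<|end|>` markers and the overall layout are invented for illustration; the authoritative chat template lives in the model's tokenizer, not here.

```python
# Illustrative only: how reasoning special tokens might frame an agent turn.
# Only <|think|> and <|plan|> are documented; the other markers are assumed.
def build_agent_prompt(user_msg, plan, thought):
    return (
        f"<|user|>{user_msg}<|end|>"
        f"<|plan|>{plan}<|end|>"
        f"<|think|>{thought}<|end|>"
    )

prompt = build_agent_prompt(
    "Sum the sizes of all .log files.",
    "List files, filter by extension, sum the sizes.",
    "A single shell command with find and du fits this task.",
)
print(prompt)
```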

### **Optimization**
- **LoRA Variants**: LoRA+, rsLoRA, and DoRA (r=32, α=64).
- **Lookahead Optimizer**: Enhanced stability and faster convergence.
- **8-bit Optimization**: Save up to 75% optimizer memory with bitsandbytes.
- **Continuous-Scale Training**: Adaptive resolution sampling for optimal VRAM usage.
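
A minimal sketch of the LoRA idea with the r=32, α=64 configuration listed above. The class and its API are illustrative, not the project's trainer code; the only non-standard detail shown is rsLoRA's α/√r scaling in place of vanilla LoRA's α/r.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter sketch (r=32, alpha=64 as in the config above).
    rs_lora=True switches to rsLoRA's alpha/sqrt(r) scaling."""
    def __init__(self, base: nn.Linear, r=32, alpha=64, rs_lora=False):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)             # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: adapter starts as a no-op
        self.scale = alpha / (r ** 0.5) if rs_lora else alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(1024, 1024))
x = torch.randn(2, 1024)
y = layer(x)
```

Because `B` is zero-initialized, the adapted layer reproduces the frozen base layer exactly at step 0; only the small `A`/`B` matrices receive gradients during fine-tuning.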

---

## 🚀 Installation

```bash
# Clone the repository
git clone https://github.com/nigfuapp-web/Xoron-Dev.git
cd Xoron-Dev

# Install dependencies
pip install -r requirements.txt
```

---

## 💻 Usage

### Quick Start (Inference)
```python
from load import load_xoron_model

# Load model and tokenizer
model, tokenizer, device, config = load_xoron_model("Backup-bdg/Xoron-Dev-MultiMoe")

# Generate response
output = model.generate_text("Explain quantum entanglement.", tokenizer)
print(output)
```

### CLI Training
The `build.py` script is the command-line entry point for building and fine-tuning models.

```bash
# Build a new model from scratch
python build.py --build

# Targeted Fine-tuning
python build.py --hf --text --math        # Fine-tune on Math
python build.py --hf --text --agent       # Fine-tune on Agentic tasks
python build.py --hf --video              # Fine-tune on Video datasets
python build.py --hf --voice              # Fine-tune on Audio/Voice
```

### Granular Text Training Flags
| Flag | Description |
|------|-------------|
| `--math` | Focus on mathematical reasoning and steps. |
| `--agent` | Tool use, code execution, and system operations. |
| `--software` | High-quality software engineering and coding. |
| `--cot` | Chain-of-Thought and logical reasoning. |
| `--medical` | Medical knowledge and clinical reasoning. |
| `--hallucination` | Anti-hallucination and truthfulness. |

---

## 🏋️ Training

### Weighted Loss Strategy
The trainer up-weights the loss on critical token spans so the model prioritizes them:
- **Reasoning (CoT)**: 1.5x
- **Tool Calling**: 1.3x
- **Anti-Hallucination**: 1.2x
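
The weighting scheme above amounts to a per-token scale on the cross-entropy loss. The sketch below is an assumption about the mechanism, not the trainer's actual code; only the weight values (1.5x, 1.3x, 1.2x) come from this README, and how spans are mapped to weights is illustrative.

```python
import torch
import torch.nn.functional as F

def weighted_ce(logits, targets, weights):
    """Cross-entropy with a per-token weight (illustrative sketch)."""
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1), reduction="none"
    )
    return (per_token * weights.view(-1)).mean()

vocab, seq = 100, 6
logits = torch.randn(1, seq, vocab)
targets = torch.randint(0, vocab, (1, seq))
# Hypothetical span weights: CoT tokens 1.5x, tool-call 1.3x, anti-hallucination 1.2x
weights = torch.tensor([[1.0, 1.5, 1.5, 1.3, 1.2, 1.0]])
loss = weighted_ce(logits, targets, weights)
```

With all weights set to 1.0 this reduces to the ordinary mean cross-entropy, so the scheme only changes the relative emphasis between token types, not the loss's overall form.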

### Continuous-Scale Strategy
Xoron-Dev dynamically samples resolutions during training:
- **Image**: 128px to 384px (step=32)
- **Video**: 8 to 24 frames, 128px to 320px
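
The sampling ranges above could be implemented along these lines. This is a sketch under assumptions: the README gives the ranges and the 32px image step, while the uniform distribution and the 8-frame video step are guesses.

```python
import random

def sample_image_resolution(lo=128, hi=384, step=32):
    """Pick a training resolution from the continuous-scale range (uniform assumed)."""
    return random.choice(range(lo, hi + 1, step))

def sample_video_config():
    frames = random.choice(range(8, 25, 8))    # 8, 16, or 24 frames (step assumed)
    res = random.choice(range(128, 321, 32))   # 128-320px, 32px step assumed
    return frames, res

res = sample_image_resolution()
frames, vres = sample_video_config()
```

Resampling per batch lets large-resolution batches trade spatial detail against VRAM headroom instead of fixing one resolution for the whole run.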

---

## 📦 Export & Quantization
Export your models for efficient deployment:
```bash
# Export to GGUF (for llama.cpp)
python build.py --hf --gguf --gguf-quant q4_k_m

# Export to ONNX
python build.py --hf --onnx --quant-bits 4
```

---

## 🤝 Contributing
Contributions are welcome! If you have ideas for new modalities or optimizations, please open an issue or PR.

---

## 📄 License
This project is licensed under the MIT License.

---
<div align="center">
Built with ❤️ by the Xoron-Dev Team
</div>
