Metadata-Version: 2.4
Name: compressgpt-core
Version: 0.1.0
Summary: LLM Compression and Optimization Library - Build the smallest runnable models that preserve target accuracy
Author: chandan678
License: MIT
Project-URL: Homepage, https://github.com/chandan678/compressgpt
Project-URL: Repository, https://github.com/chandan678/compressgpt
Project-URL: Issues, https://github.com/chandan678/compressgpt/issues
Keywords: llm,compression,optimization,fine-tuning,machine-learning,deep-learning,transformers,sft,supervised-fine-tuning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5.0
Requires-Dist: datasets>=2.0.0
Requires-Dist: transformers>=4.40.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: torch>=2.0.0
Requires-Dist: tqdm>=4.60.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: peft>=0.10.0
Requires-Dist: trl==0.19.0
Requires-Dist: accelerate>=0.20.0
Requires-Dist: bitsandbytes>=0.41.0
Requires-Dist: llama-cpp-python>=0.2.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Dynamic: license-file

# compressGPT

**compressGPT** is a flexible, modular training pipeline designed to bridge the gap between large foundation models and efficient edge-ready deployment.

It orchestrates the full lifecycle of Large Language Model (LLM) optimization — from supervised fine-tuning, through post-quantization recovery, to production-ready artifact generation — with a single, composable API.

Unlike rigid training scripts, compressGPT allows developers to define **custom compression workflows** by composing high-level stages such as `ft`, `compress_4bit`, and `deploy`. Whether you need a high-accuracy FP16 model for server inference or a highly compressed GGUF model for CPU-only deployment, compressGPT automates tokenization, adapter training, memory-efficient evaluation, and artifact generation to deliver the **smallest runnable model that preserves task-level accuracy**.

---

## 🚀 Quick Start

The following complete example fine-tunes `meta-llama/Llama-3.2-1B` on a CSV dataset and exports a compressed, deployment-ready 4-bit model.

```python
from compressgpt import (
    CompressTrainer,
    DatasetBuilder,
    TrainingConfig,
    DeploymentConfig,
)

prompt_template = (
    'Classify this notification as "Important" or "Ignore".\n'
    'Important: Security alerts, direct messages, payment confirmations.\n'
    'Ignore: Marketing promos, news digests, social media likes.\n\n'
    'Notification: {text}\n'
    'Answer:'
)

MODEL_ID = "meta-llama/Llama-3.2-1B"

# Build dataset
builder = DatasetBuilder(
    data_path="notifications.csv",
    model_id=MODEL_ID,
    prompt_template=prompt_template,
    input_column_map={"text": "message_body"},
    label_column="label",
).build()

# Run compression pipeline
trainer = CompressTrainer(
    model_id=MODEL_ID,
    dataset_builder=builder,
    stages=["ft", "compress_4bit", "deploy"],
    training_config=TrainingConfig(
        num_train_epochs=1,
        eval_strategy="epoch",
        save_strategy="epoch",
    ),
    deployment_config=DeploymentConfig(
        save_merged_fp16=True,     # Canonical dense model
        save_quantized_4bit=True,  # BitsAndBytes 4-bit
        save_gguf_q4_0=True,       # GGUF for llama.cpp
    ),
)

results = trainer.run()

print("Training complete!")
print(results)
```
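
The Quick Start expects a `notifications.csv` file whose columns match `input_column_map` and `label_column`. As a rough sketch, a file of that shape could be produced with the standard library as shown below. The rows, the helper name `write_example_csv`, and the exact label strings are illustrative assumptions, not data or API shipped with compressGPT.

```python
import csv

# Illustrative rows only. Column names follow the Quick Start:
# `message_body` feeds the {text} placeholder, `label` is the target column.
ROWS = [
    {"message_body": "Your verification code is 482913.", "label": "Important"},
    {"message_body": "50% off all sneakers this weekend!", "label": "Ignore"},
    {"message_body": "Your payment to Acme was confirmed.", "label": "Important"},
    {"message_body": "Someone liked your photo.", "label": "Ignore"},
]


def write_example_csv(path="notifications.csv"):
    """Write the illustrative rows to `path` and return the number of rows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["message_body", "label"])
        writer.writeheader()
        writer.writerows(ROWS)
    return len(ROWS)
```

Calling `write_example_csv()` before running the Quick Start gives `DatasetBuilder` a file to read; a real dataset would need far more rows than this toy sample.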

## 📦 Deployment & Artifacts

### Deployment Methods
The final stage of the pipeline, **`deploy`**, automatically converts your optimized model into production-ready formats. Controlled by `DeploymentConfig`, it supports:

*   **GGUF (`save_gguf_q4_0`, etc.)**: A widely used format for **CPU inference**. These files can be loaded directly into [llama.cpp](https://github.com/ggerganov/llama.cpp) or [Ollama](https://ollama.com).
*   **Quantized 4-bit (`save_quantized_4bit`)**: Pre-quantized BitsAndBytes models. Ideal for low-VRAM **GPU inference** using Python/Transformers.
*   **Merged FP16 (`save_merged_fp16`)**: The canonical high-precision model. Use this for **vLLM / TGI servers** or further research.
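
As one way to consume the GGUF output from Python, the sketch below loads a file with `llama-cpp-python` (already a dependency of this package) and maps the raw completion to a label. The helper names `parse_label` and `classify_with_gguf` are hypothetical, and the label strings are taken from the Quick Start prompt; this is not a compressGPT API.

```python
LABELS = ("Important", "Ignore")


def parse_label(completion: str) -> str:
    """Map raw generated text to one of the known labels, or 'Unknown'."""
    cleaned = completion.strip().lower()
    for label in LABELS:
        if cleaned.startswith(label.lower()):
            return label
    return "Unknown"


def classify_with_gguf(gguf_path: str, prompt: str) -> str:
    """Run one prompt through a GGUF model and return the parsed label."""
    # Imported lazily so parse_label works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(model_path=gguf_path, verbose=False)
    # A few tokens are enough for a single-word classification answer.
    out = llm(prompt, max_tokens=4, temperature=0.0)
    return parse_label(out["choices"][0]["text"])
```

For example, `classify_with_gguf("runs/default/deploy/gguf/model-q4_0.gguf", prompt)` would classify a single formatted prompt, assuming the GGUF export was enabled in `DeploymentConfig`.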

### Saving Models & Trade-offs
In compressGPT, **every stage saves its own model and metrics**. This allows you to deploy different versions of the *same model* to different devices based on their constraints.

**1. Default Outputs (`runs/default/`)**
Every stage you run automatically saves its result:
*   `ft_adapter/`: High-accuracy LoRA adapter (best for Cloud/GPU).
*   `compress_4bit_merged/`: Quantized & recovered model (best for accuracy/size balance).
*   `metrics.json`: Compare `ft` vs `compress_4bit` accuracy to make data-driven deployment decisions.
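
A minimal sketch of such a data-driven decision follows. The layout of `metrics.json` is not documented here, so the code assumes a hypothetical shape: a mapping from stage name to a dict with an `accuracy` value. The helper names and the tolerance parameter are likewise illustrative; adjust the keys to match the file your run actually produces.

```python
import json
from pathlib import Path


def load_metrics(path="runs/default/metrics.json"):
    """Read the metrics file written by the pipeline."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def pick_stage(metrics, max_accuracy_drop=0.01):
    """Prefer the compressed model when its accuracy loss is within tolerance.

    Assumes (hypothetically) metrics[stage]["accuracy"] holds a float.
    """
    ft_acc = metrics["ft"]["accuracy"]
    q_acc = metrics["compress_4bit"]["accuracy"]
    if ft_acc - q_acc <= max_accuracy_drop:
        return "compress_4bit"
    return "ft"
```

With this rule, `pick_stage(load_metrics())` returns `"compress_4bit"` whenever quantization costs no more than one accuracy point, and falls back to `"ft"` otherwise.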

**2. Deploy Outputs (`runs/default/deploy/`)**
Production-ready artifacts are generated here **only if enabled** in `DeploymentConfig`:

```text
runs/default/deploy/
├── merged_fp16/        # Universal format (vLLM, TGI)
├── quantized_4bit/     # Python-native compressed (Transformers)
└── gguf/
    ├── model-f16.gguf  # High precision GGUF
    └── model-q4_0.gguf # Optimized Edge/CPU GGUF
```
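
Because each artifact is written only when its flag is enabled, a deployment script may want to check what a run actually produced before using it. The sketch below does that with `pathlib`, using the paths from the tree above; the `find_artifacts` helper and its key names are hypothetical and not part of compressGPT.

```python
from pathlib import Path

# Relative paths taken from the deploy directory layout shown above.
EXPECTED_ARTIFACTS = {
    "merged_fp16": "merged_fp16",
    "quantized_4bit": "quantized_4bit",
    "gguf_f16": "gguf/model-f16.gguf",
    "gguf_q4_0": "gguf/model-q4_0.gguf",
}


def find_artifacts(deploy_dir="runs/default/deploy"):
    """Return a dict mapping each artifact name to whether it exists on disk."""
    root = Path(deploy_dir)
    return {name: (root / rel).exists() for name, rel in EXPECTED_ARTIFACTS.items()}
```

A caller can then branch on the result, for instance using the GGUF file only when `find_artifacts()["gguf_q4_0"]` is true.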

---

## ⚠️ Current Support
compressGPT is currently optimized for **classification tasks** (e.g., sentiment analysis, intent detection, spam filtering). Support for generation tasks (RAG, chat) is coming soon.
