Metadata-Version: 2.4
Name: quarterbit
Version: 1.0.0
Summary: AXIOM - High-performance quantized optimizer with 73% memory savings
Home-page: https://quarterbit.dev
Author: Kyle Clouthier
Author-email: Kyle Clouthier <info@quarterbit.dev>
License: Proprietary - Free tier available, commercial use requires license
Project-URL: Homepage, https://quarterbit.dev
Project-URL: Repository, https://github.com/DigitalMax321/quarterbit
Project-URL: Documentation, https://quarterbit.dev/docs
Keywords: pytorch,optimizer,axiom,training,gpu,cuda,adam,memory-efficient
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20
Requires-Dist: requests>=2.20
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: cython; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# QuarterBit AXIOM

**High-Performance Quantized Optimizer**

73% memory savings. Drift-free precision. Better convergence than AdamW.

## The Problem

Training large AI models requires massive GPU memory for optimizer states. AdamW stores 8 bytes per parameter just for momentum and variance (two FP32 values), which comes to 8 GB for a 1B-parameter model.
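
The 8 GB figure follows from simple arithmetic. A minimal sketch, assuming both AdamW states are stored as 4-byte FP32 values; the function name `adamw_state_bytes` is illustrative and not part of quarterbit:

```python
def adamw_state_bytes(num_params, bytes_per_value=4, num_states=2):
    """Optimizer-state size for AdamW: momentum + variance per parameter."""
    return num_params * bytes_per_value * num_states

one_billion = 1_000_000_000
print(adamw_state_bytes(one_billion) / 1e9, "GB")  # 8.0 GB
```

This counts optimizer state only; weights, gradients, and activations come on top of it.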

In addition, long training runs suffer from **floating-point drift**. Tiny gradient updates accumulate rounding errors over millions of steps, causing:
- Stalled convergence in late training
- Numerical instability
- Suboptimal final models
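
The effect is easy to reproduce in isolation. The sketch below is a generic illustration (not quarterbit code) that emulates FP32 storage with the standard `struct` module and adds a small update to a value of 1.0 many times; the helper names are hypothetical:

```python
import struct

def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate_float32(start, update, steps):
    """Add `update` to `start` `steps` times, rounding to float32 each step."""
    value = to_float32(start)
    for _ in range(steps):
        value = to_float32(value + update)
    return value

weight = accumulate_float32(1.0, 1e-8, 1000)
exact = 1.0 + 1000 * 1e-8  # what the value should be, about 1.00001

print(weight)  # 1.0: every update was rounded away
print(exact)
```

Each update is smaller than half the spacing between adjacent FP32 values near 1.0, so the stored value never moves, no matter how many steps are taken.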

## The Solution

QuarterBit **AXIOM** solves both problems:

| Metric | AdamW | AXIOM | Improvement |
|--------|-------|-------|-------------|
| Memory per param | 8.0 bytes | 2.14 bytes | **73% savings** |
| Precision drift | Accumulates errors | **Drift-free** | Eliminated |
| Validation loss | Baseline | **3-4% lower** | Better convergence |

## Installation

```bash
pip install quarterbit
```

**Supported GPUs:**
- NVIDIA T4, V100, A100, L4, L40
- NVIDIA RTX 30 series (3060-3090)
- NVIDIA RTX 40 series (4060-4090)
- NVIDIA H100, H200

## Quick Start

```python
from quarterbit import Axiom

# Drop-in replacement for AdamW
optimizer = Axiom(
    model.parameters(),
    lr=5e-4,
    weight_decay=0.1,
    total_steps=10000
)

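# Train as usual (`model` and `dataloader` are your existing PyTorch objects;
# this loop assumes the model's forward pass returns a scalar loss)
for batch in dataloader:
    loss = model(batch)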
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

## Why AXIOM?

### 1. 73% Memory Savings
Train larger models on the same GPU. AXIOM uses FP4 quantization with per-parameter Log8 variance to reduce optimizer state from 8 bytes to 2.14 bytes per parameter.
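
To see what the per-parameter figures mean for a whole model, here is a small calculation using only the 8.0 and 2.14 bytes-per-parameter numbers quoted in this README. The function names and the example model sizes are illustrative assumptions, and the result covers optimizer state only (not weights, gradients, or activations):

```python
ADAMW_BYTES_PER_PARAM = 8.0   # figure quoted in this README
AXIOM_BYTES_PER_PARAM = 2.14  # figure quoted in this README

def optimizer_state_gb(num_params, bytes_per_param):
    """Optimizer-state size in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

def savings_fraction(baseline, reduced):
    """Fraction of the baseline that is saved."""
    return 1.0 - reduced / baseline

saved = savings_fraction(ADAMW_BYTES_PER_PARAM, AXIOM_BYTES_PER_PARAM)
print(f"savings: {saved:.1%}")

# Example model sizes chosen for illustration only
for label, n in [("124M", 124_000_000), ("1B", 1_000_000_000), ("7B", 7_000_000_000)]:
    adamw = optimizer_state_gb(n, ADAMW_BYTES_PER_PARAM)
    axiom = optimizer_state_gb(n, AXIOM_BYTES_PER_PARAM)
    print(f"{label}: AdamW {adamw:.2f} GB, AXIOM {axiom:.2f} GB")
```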

### 2. Drift-Free Training
Error-Free Transformations eliminate floating-point accumulation errors. Your model trains with perfect numerical stability from step 1 to step 1,000,000+.
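
Error-free transformations are a general floating-point technique: they compute a sum together with the exact rounding error of that sum. The sketch below uses Knuth's TwoSum in plain Python (float64) to show the idea. It is a generic illustration, not AXIOM's internal implementation, and the names `two_sum`, `naive_accumulate`, and `compensated_accumulate` are hypothetical:

```python
def two_sum(a, b):
    """Return (s, err) where s is the rounded sum and s + err equals a + b exactly."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    err = (a - a_virtual) + (b - b_virtual)
    return s, err

def naive_accumulate(start, update, steps):
    """Plain accumulation: rounding error is silently discarded each step."""
    value = start
    for _ in range(steps):
        value += update
    return value

def compensated_accumulate(start, update, steps):
    """Accumulate updates while carrying the rounding error forward."""
    value, carry = start, 0.0
    for _ in range(steps):
        value, carry = two_sum(value, update + carry)
    return value, carry

naive = naive_accumulate(1.0, 1e-17, 1000)
value, carry = compensated_accumulate(1.0, 1e-17, 1000)

print(naive)                  # 1.0: each 1e-17 update is rounded away
print((value - 1.0) + carry)  # close to 1e-14, the true total of the updates
```

The compensated version keeps the lost low-order bits in `carry` and feeds them back into the next step, so the accumulated total stays accurate even when each individual update is too small to register.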

### 3. Better Convergence
Two-Step Nesterov momentum, temporal coherence, and soft cautious masking deliver 3-4% better validation loss than AdamW.

### 4. Drop-In Replacement
Same API as PyTorch optimizers. Change one line of code.

## Benchmarks

Validated on GPT-2 Small (124M parameters), WikiText-2, 500 steps:

| Optimizer | Val Loss | Memory/Param | Savings |
|-----------|----------|--------------|---------|
| AdamW | 4.89 | 8.0 B | baseline |
| **AXIOM** | **4.72** | **2.14 B** | **73%** |

AXIOM consistently beats AdamW by 3-4% on validation loss across architectures.
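
The relative improvement for this run can be checked directly from the table. A small sketch (the function name is illustrative):

```python
def relative_improvement(baseline_loss, new_loss):
    """Fractional reduction in loss relative to the baseline."""
    return (baseline_loss - new_loss) / baseline_loss

adamw_val_loss = 4.89  # from the benchmark table
axiom_val_loss = 4.72  # from the benchmark table

improvement = relative_improvement(adamw_val_loss, axiom_val_loss)
print(f"validation loss improvement: {improvement:.2%}")
```

For the GPT-2 Small run above this comes to roughly 3.5%, inside the quoted 3-4% range.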

See the [benchmark results](benchmark/) for the full set, including multi-model comparisons.

## Requirements

- Python 3.8+
- PyTorch 1.8+
- NVIDIA GPU with CUDA support
- Linux or Windows

## Pricing

| Tier | Price | Use Case |
|------|-------|----------|
| **Free** | $0 | Personal, research, evaluation |
| **Pro** | $299/mo | Commercial use, up to 10 GPUs |
| **Team** | $2,499/mo | Up to 100 GPUs, priority support |
| **Enterprise** | Custom | Unlimited, custom SLA |

See [quarterbit.dev/pricing](https://quarterbit.dev/pricing) for details.

## License

Proprietary - see [LICENSE](LICENSE) for details. Free tier available.

## Links

- Website: [quarterbit.dev](https://quarterbit.dev)
- Documentation: [quarterbit.dev/docs](https://quarterbit.dev/docs)
- Email: info@quarterbit.dev

---

Copyright (c) 2026 Clouthier Simulation Labs. All rights reserved.
