Metadata-Version: 2.1
Name: kolmg-lora
Version: 1.0.0
Summary: KOLMG-LoRA: Kolmogorov-Arnold Low-Rank Adaptation — a non-linear LoRA variant for kolmgformers and any PyTorch model
Author: KOLMGformers Contributors
License: Apache-2.0
Project-URL: Homepage, https://github.com/kolmgformers/kolmg-lora
Project-URL: Repository, https://github.com/kolmgformers/kolmg-lora
Keywords: deep-learning,lora,peft,fine-tuning,kolmogorov-arnold,kan,pytorch,nlp,kolmgformers,low-rank-adaptation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch >=2.0
Provides-Extra: all
Requires-Dist: kolmgformers >=0.0.3 ; extra == 'all'
Requires-Dist: safetensors >=0.4 ; extra == 'all'
Provides-Extra: kolmgformers
Requires-Dist: kolmgformers >=0.0.3 ; extra == 'kolmgformers'
Provides-Extra: safetensors
Requires-Dist: safetensors >=0.4 ; extra == 'safetensors'

# kolmg-lora

**KOLMG-LoRA** (Kolmogorov-Arnold Low-Rank Adaptation) is a LoRA variant that replaces the standard linear bottleneck (B×A) with a two-layer PL-KAN (Piecewise-Linear Kolmogorov-Arnold Network). This gives the adapter **non-linear adaptation capacity** at a parameter cost similar to standard LoRA.

It is designed as the native fine-tuning method for the [kolmgformers](https://github.com/kolmgformers/kolmgformers) package, but works with **any PyTorch `nn.Module`**, including HuggingFace transformers.

---

## Install

```bash
pip install kolmg-lora                       # PyTorch only
pip install "kolmg-lora[kolmgformers]"       # + kolmgformers integration
pip install "kolmg-lora[safetensors]"        # + fast weight I/O
pip install "kolmg-lora[all]"                # everything (quotes keep zsh from globbing the brackets)
```
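
To see which optional extras are present in the current environment, a small standard-library check along these lines may help. It is only a sketch: the import names `kolmg_lora`, `kolmgformers`, and `safetensors` are assumed to match the distribution names above, and `installed_extras` is a hypothetical helper, not part of the package.

```python
import importlib.util


def installed_extras(modules=("kolmg_lora", "kolmgformers", "safetensors")):
    """Map each (assumed) top-level import name to True if it can be located.

    find_spec only looks the module up; it does not import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, present in installed_extras().items():
        print(f"{name:14s} {'found' if present else 'missing'}")
```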

---

## What makes KOLMG-LoRA different?

| Variant | Adapter path | Non-linear | Scaling |
|---|---|---|---|
| LoRA | B×A (linear) | No | α/r |
| rsLoRA | B×A (linear) | No | α/√r |
| LoRA+ | B×A (split LR) | No | α/r |
| DoRA | B×A + magnitude | No | α/r |
| QLoRA | B×A on 4-bit base | No | α/r |
| **KOLMG-LoRA** | **KAN bottleneck** | **Yes ✓** | α/r (or α/√r) |
| **KOLMG-DoRA** | **KAN + magnitude** | **Yes ✓** | α/r (or α/√r) |

Standard LoRA learns: `ΔW·x = B × A × x` — always linear.

KOLMG-LoRA learns: `ΔW·x = φ_out(φ_in(x))` where each φ is a mini KAN:

```
φ(x) = SiLU(x)·W_base  +  Σ_k  c_k · B_k(x)
```

`B_k` are B-spline basis functions on a uniform grid. Each rank dimension gets its own learned activation shape, which a linear bottleneck cannot express at any rank.
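
To make the formula concrete, here is a minimal scalar sketch of one such activation, assuming `spline_order=1` (so each `B_k` is a "hat" function), `grid_size=4`, and `grid_range=(-1, 1)`. The function names are hypothetical and this is not the library's implementation, which presumably works on tensors; it only illustrates the shape of φ.

```python
import math


def hat_basis(x, grid_size=4, grid_range=(-1.0, 1.0)):
    """Order-1 B-spline (hat) basis values at x on a uniform grid.

    grid_size intervals give grid_size + 1 knots, one hat function per knot.
    """
    lo, hi = grid_range
    h = (hi - lo) / grid_size                      # knot spacing
    knots = [lo + k * h for k in range(grid_size + 1)]
    return [max(0.0, 1.0 - abs(x - t) / h) for t in knots]


def silu(x):
    """SiLU(x) = x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def phi(x, coeffs, w_base=1.0, grid_size=4, grid_range=(-1.0, 1.0)):
    """phi(x) = SiLU(x) * w_base + sum_k c_k * B_k(x), for a single scalar input."""
    basis = hat_basis(x, grid_size, grid_range)
    return silu(x) * w_base + sum(c * b for c, b in zip(coeffs, basis))


if __name__ == "__main__":
    coeffs = [0.0, 0.0, 2.0, 0.0, 0.0]             # one coefficient per knot
    for x in (-0.5, 0.0, 0.25, 0.5):
        print(x, phi(x, coeffs))
```

Inside the grid range the hat functions sum to 1, so the spline term linearly interpolates the coefficients `c_k` between neighbouring knots. Changing a single `c_k` bends the activation only near its knot.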

---

## Quick start

```python
import torch

from kolmg_lora import KOLMGLoRAConfig, add_kolmg_lora, merge_lora, save_lora, load_lora

# 1. Configure
cfg = KOLMGLoRAConfig(
    rank       = 16,
    alpha      = 32.0,
    grid_size  = 4,      # KAN expressiveness knob (3–8 recommended)
    dropout    = 0.05,
)

# 2. Apply to any model (`model` is an nn.Module you have already built or loaded)
model = add_kolmg_lora(model, cfg)
# [kolmg-lora KOLMG-LoRA] 4 layers wrapped | rank=16 | grid=4 | order=1 | trainable: ...

# 3. Train normally — only KAN parameters update
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4
)

# 4. Save adapter (~few MB)
save_lora(model, "./my_adapter")

# 5. Merge for deployment (zero inference overhead)
model = merge_lora(model)
```
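
After wrapping, it can be useful to confirm that only the adapter parameters are trainable. The helper below is a hypothetical sketch, not part of `kolmg_lora`; it relies only on the standard `parameters()`, `numel()`, and `requires_grad` interface that every PyTorch module and parameter exposes.

```python
def count_parameters(model):
    """Return (trainable, total) element counts for any object exposing .parameters()."""
    trainable = total = 0
    for p in model.parameters():
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total
```

Typical use would be `trainable, total = count_parameters(model)` right after `add_kolmg_lora`, followed by printing the ratio `trainable / total`.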

### With kolmgformers

```python
from kolmgformers import KOLMOGformerForCausalLM, KOLMOGformerConfig
from kolmg_lora import KOLMGLoRAConfig, add_kolmg_lora

model = KOLMOGformerForCausalLM(KOLMOGformerConfig(
    vocab_size=32000, hidden_size=512, num_channels=8, num_layers=6
))

cfg   = KOLMGLoRAConfig(rank=16, alpha=32.0, train_ffn=True)
model = add_kolmg_lora(model, cfg)
```

### With HuggingFace transformers

```python
from transformers import AutoModelForCausalLM
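from kolmg_lora import KOLMGLoRAConfig, add_kolmg_lora

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Mistral names its attention output projection "o_proj", so list the targets explicitly
cfg   = KOLMGLoRAConfig(rank=16, alpha=32.0,
                        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = add_kolmg_lora(model, cfg)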
```

---

## Combining with other techniques

Each technique below is a separate config option, and the options can be combined in a single `KOLMGLoRAConfig`:

```python
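import torch
# Assumes KOLMGLoRALinear is exported from the package's top level
from kolmg_lora import KOLMGLoRAConfig, KOLMGLoRALinear

# KOLMG-LoRA + rsLoRA (rank-stabilised scaling α/√r)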
cfg = KOLMGLoRAConfig(rank=16, rs_lora=True)

# KOLMG-LoRA + DoRA (magnitude decomposition)
cfg = KOLMGLoRAConfig(rank=16, use_dora=True)

# KOLMG-LoRA + LoRA+ (higher LR for φ_out)
cfg = KOLMGLoRAConfig(rank=16, lora_plus_ratio=16.0)

# LoRA+ with per-layer param groups (`model` is already wrapped by add_kolmg_lora)
groups = []
for m in model.modules():
    if isinstance(m, KOLMGLoRALinear):
        groups += m.get_lora_plus_param_groups(base_lr=1e-4)
optimizer = torch.optim.AdamW(groups)
```
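
For intuition about what LoRA+ grouping amounts to, the sketch below builds two optimizer groups by hand: one at the base learning rate and one at `base_lr * ratio`. It is not how `get_lora_plus_param_groups` is implemented. The helper name and the `"phi_out"` substring used to pick the fast group are assumptions for illustration; the real parameter names may differ.

```python
def lora_plus_groups(named_params, base_lr=1e-4, ratio=16.0, fast_key="phi_out"):
    """Split trainable parameters into a base-LR group and a higher-LR group.

    named_params: iterable of (name, parameter) pairs, e.g. model.named_parameters().
    fast_key:     hypothetical substring marking the output-side adapter parameters.
    """
    slow, fast = [], []
    for name, p in named_params:
        if not getattr(p, "requires_grad", True):
            continue                                # skip frozen base weights
        (fast if fast_key in name else slow).append(p)

    groups = []
    if slow:
        groups.append({"params": slow, "lr": base_lr})
    if fast:
        groups.append({"params": fast, "lr": base_lr * ratio})
    return groups
```

The returned list uses the per-parameter-group format that PyTorch optimizers accept, so it could be passed as `torch.optim.AdamW(lora_plus_groups(model.named_parameters()))`.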

---

## Config reference

```python
KOLMGLoRAConfig(
    rank             = 16,       # KAN bottleneck width
    alpha            = 32.0,     # scaling = alpha / rank  (or / √rank with rs_lora)
    dropout          = 0.05,     # dropout on adapter input
    target_modules   = None,     # None → ["q_proj","k_proj","v_proj","out"]
    train_ffn        = False,    # also wrap gate/up/down FFN projections
    rs_lora          = False,    # rank-stabilised scaling
    use_dora         = False,    # DoRA magnitude decomposition
    lora_plus_ratio  = 1.0,      # LoRA+ LR multiplier for φ_out
    grid_size        = 4,        # KAN grid intervals (3=fast, 8=expressive)
    spline_order     = 1,        # 1=piecewise-linear (fast), 3=cubic (smooth)
    grid_range       = (-1., 1.),# KAN input domain
    kan_scale_noise  = 0.1,      # spline weight init noise
    kan_scale_base   = 1.0,      # SiLU base path init scale
)
```
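
As a quick check of the `alpha` comment above, the two scaling rules can be compared directly. `adapter_scale` is a hypothetical helper written for this illustration; it only restates the formulas `alpha / rank` and `alpha / √rank`.

```python
import math


def adapter_scale(alpha, rank, rs_lora=False):
    """Scaling applied to the adapter output: alpha/rank, or alpha/sqrt(rank) with rs_lora."""
    return alpha / math.sqrt(rank) if rs_lora else alpha / rank


if __name__ == "__main__":
    for rank in (4, 16, 64):
        print(rank, adapter_scale(32.0, rank), adapter_scale(32.0, rank, rs_lora=True))
```

With `alpha=32.0` and `rank=16` this gives 2.0 for the default rule and 8.0 with `rs_lora=True`. The rank-stabilised scale shrinks more slowly as the rank grows.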

---

## License

Apache-2.0
