Metadata-Version: 2.4
Name: gekolib
Version: 0.1.0
Summary: Gradient-Efficient Knowledge Optimization - Smart training for any LLM
Home-page: https://github.com/yourusername/geko
Author: Syed Abdur Rehman
Author-email: your.email@example.com
Keywords: machine learning,deep learning,nlp,llm,training,optimization,curriculum learning,sample selection
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.9.0
Requires-Dist: tqdm>=4.60.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Provides-Extra: transformers
Requires-Dist: transformers>=4.20.0; extra == "transformers"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# GEKO: Gradient-Efficient Knowledge Optimization

<p align="center">
  <img src="https://em-content.zobj.net/source/apple/391/lizard_1f98e.png" width="150" alt="GEKO">
</p>

<p align="center">
  <img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python">
  <img src="https://img.shields.io/badge/pytorch-1.9+-red.svg" alt="PyTorch">
  <img src="https://img.shields.io/badge/license-Apache%202.0-green.svg" alt="License">
</p>

<p align="center">
  <b>A plug-and-play training framework that makes LLM training more efficient.</b>
</p>

> *Like LoRA revolutionized fine-tuning, GEKO revolutionizes training.*

---

## Key Insight

<p align="center">
  <img src="assets/geko_insight.png" alt="GEKO Insight" width="700">
</p>

Traditional training treats all samples equally:

$$\mathcal{L}_{standard} = \frac{1}{N} \sum_{i=1}^{N} \ell(x_i, y_i)$$

**GEKO** weights samples by their learning value:

$$\mathcal{L}_{GEKO} = \frac{1}{N} \sum_{i=1}^{N} w_i \cdot \ell(x_i, y_i) \quad \text{where} \quad w_i = f(bucket_i)$$

---

## Installation

```bash
pip install geko
```

---

## Quick Start

```python
from geko import GEKOTrainer, GEKOConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

trainer = GEKOTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=your_dataset,
)

trainer.train()
print(trainer.get_efficiency_report())
```

---

## The GEKO Algorithm

### Sample Partitioning

<p align="center">
  <img src="assets/bucket_classification.png" alt="Bucket Classification" width="800">
</p>

```mermaid
flowchart TD
    A[Sample] --> B{Correct?}
    B -->|Yes| C{Confident & High Quality?}
    B -->|No| D{High Confidence?}
    C -->|Yes| E[🔵 FREEZE<br/>w = 0<br/>Never train]
    C -->|No| F[🟢 LIGHT<br/>w = 0<br/>Low priority]
    D -->|Yes| G[🔴 HARD<br/>w = 3<br/>Highest priority]
    D -->|No| H[🟠 FOCUS<br/>w = 1<br/>Medium priority]

    style E fill:#3498db,color:#fff
    style F fill:#2ecc71,color:#fff
    style G fill:#e74c3c,color:#fff
    style H fill:#f39c12,color:#fff
```

### Bucket Definitions

| Bucket | Condition | Weight | Description |
|:------:|:----------|:------:|:------------|
| 🔵 **FREEZE** | $correct \land c > 0.85 \land q > 0.80$ | $w = 0$ | Mastered |
| 🟢 **LIGHT** | $correct \land (c \leq 0.85 \lor q \leq 0.80)$ | $w = 0$ | Uncertain |
| 🟠 **FOCUS** | $\neg correct \land c \leq 0.60$ | $w = 1$ | Wrong |
| 🔴 **HARD** | $\neg correct \land c > 0.60$ | $w = 3$ | **Confident-wrong** |

---

## Mountain Curriculum

<p align="center">
  <img src="assets/mountain_curriculum.png" alt="Mountain Curriculum" width="700">
</p>

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#3498db'}}}%%
xychart-beta
    title "Mountain Curriculum - Difficulty vs Progress"
    x-axis "Training Progress" [0, 0.15, 0.35, 0.65, 0.85, 1.0]
    y-axis "Difficulty" 0 --> 1
    line [0.2, 0.5, 1.0, 1.0, 0.5, 0.2]
```

### Five Phases

```mermaid
gantt
    title Mountain Curriculum Phases
    dateFormat X
    axisFormat %s

    section Difficulty
    WARMUP (Easy)       :a1, 0, 15
    ASCENT (Medium)     :a2, 15, 35
    PEAK (Hard)         :a3, 35, 65
    DESCENT (Medium)    :a4, 65, 85
    CONSOLIDATE (Easy)  :a5, 85, 100
```

| Phase | Progress | HARD | FOCUS | LIGHT | Strategy |
|:------|:--------:|:----:|:-----:|:-----:|:---------|
| WARMUP | 0-15% | 1 | 2 | 3 | Build foundation |
| ASCENT | 15-35% | 2 | 3 | 1 | Increase difficulty |
| PEAK | 35-65% | 5 | 2 | 0 | Maximum learning |
| DESCENT | 65-85% | 2 | 3 | 1 | Reduce difficulty |
| CONSOLIDATE | 85-100% | 1 | 2 | 3 | Reinforce |

---

## Q-Value Learning

<p align="center">
  <img src="assets/q_value_learning.png" alt="Q-Value Learning" width="600">
</p>

Each sample maintains a Q-value representing "learnability":

$$Q_{t+1}(s) = (1 - \alpha) \cdot Q_t(s) + \alpha \cdot \left(1 - \frac{\ell_t(s)}{\ell_{max}}\right)$$

```mermaid
graph LR
    A[Sample Loss ↓] --> B[Q-Value ↑]
    B --> C{Q > threshold?}
    C -->|Yes| D[Move to FREEZE]
    C -->|No| E[Stay trainable]

    style D fill:#3498db,color:#fff
    style E fill:#f39c12,color:#fff
```

---

## Efficiency Analysis

<p align="center">
  <img src="assets/efficiency_curve.png" alt="Efficiency Curve" width="600">
</p>

### Compute Savings Over Time

```mermaid
%%{init: {'theme': 'base'}}%%
pie showData
    title "Bucket Distribution (Epoch 10)"
    "FREEZE (Saved)" : 80
    "LIGHT" : 15
    "FOCUS" : 4
    "HARD" : 1
```

### Training Progression

| Epoch | FREEZE | LIGHT | FOCUS | HARD | **Compute Saved** |
|:-----:|:------:|:-----:|:-----:|:----:|:-----------------:|
| 1 | 0% | 20% | 60% | 20% | 0% |
| 2 | 15% | 25% | 45% | 15% | **15%** |
| 3 | 35% | 30% | 25% | 10% | **35%** |
| 5 | 55% | 25% | 15% | 5% | **55%** |
| 10 | 80% | 15% | 4% | 1% | **80%** |

---

## Architecture

<p align="center">
  <img src="assets/architecture.png" alt="GEKO Architecture" width="800">
</p>

```mermaid
flowchart TB
    subgraph Input
        A[Any LLM Model]
        B[Training Dataset]
    end

    subgraph GEKO["GEKO Framework"]
        C[GEKOTrainer]
        D[Sample Partitioner]
        E[Mountain Curriculum]
        F[Sample States]

        C --> D
        C --> E
        D --> F
        E --> F
    end

    subgraph Output
        G[Efficient Training]
        H[Compute Savings]
    end

    A --> C
    B --> C
    C --> G
    C --> H

    style GEKO fill:#f5f5f5,stroke:#333
```

---

## Theoretical Guarantees

### Convergence

Under standard assumptions, GEKO converges:

$$\sum_{t=1}^{\infty} w_t^{(s)} = \infty \quad \forall s \notin \text{FREEZE}$$

### Efficiency Bound

$$T_{GEKO} \leq T_{standard} \cdot (1 - \mathbb{E}[F])$$

Where $\mathbb{E}[F]$ = expected freeze fraction.

---

## Results

<p align="center">
  <img src="assets/results_comparison.png" alt="Results" width="600">
</p>

| Metric | Standard | GEKO | Improvement |
|:-------|:--------:|:----:|:-----------:|
| Training Time | 100% | 50-70% | **30-50% faster** |
| Compute Cost | 100% | 50-70% | **30-50% cheaper** |
| Final Loss | $\ell^*$ | $\leq \ell^*$ | Equal or better |

---

## Citation

```bibtex
@software{geko2026,
  author = {Syed Abdur Rehman},
  title = {GEKO: Gradient-Efficient Knowledge Optimization},
  year = {2026},
  url = {https://github.com/ra2157218-boop/GEKO}
}
```

---

## License

Apache 2.0

---

<p align="center">
  <b>GEKO</b> - Train smarter, not harder.
</p>
