Metadata-Version: 2.4
Name: pruneforge
Version: 0.1.0
Summary: A Hierarchical Pruning Architecture for On-Device Language Intelligence
Author: astromini
Project-URL: Homepage, https://github.com/astromini/pruneforge
Project-URL: Repository, https://github.com/astromini/pruneforge
Project-URL: Issues, https://github.com/astromini/pruneforge/issues
Keywords: nlp,language-model,pruning,on-device,inference,hierarchical,multilingual
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0.0
Requires-Dist: numpy>=1.24.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Provides-Extra: hnsw
Requires-Dist: faiss-cpu>=1.7.4; extra == "hnsw"

# PruneForge

**A Hierarchical Pruning Architecture for On-Device Language Intelligence**

PruneForge is a research prototype (v0.1, 2026) that replaces standard
transformer decode-from-scratch generation with a retrieve → prune → assemble
pipeline designed to run efficiently on CPU-class hardware.

---

## Architecture

```
Query
  │
  ▼
Knowledge Pool          ← compressed semantic store (contrastive pre-training)
  │  TopK retrieval
  ▼
Stage 1: Morphological Decomposition   ← BiLSTM, character-level, language-agnostic
  │
Stage 2: Topical Relevance Filter      ← bilinear scorer, prunes ~70% of candidates
  │
Stage 3: Composability Assessment      ← pairwise MLP, O(k²)
  │
Stage 4: Coherence Validation          ← small Transformer encoder
  │
Stage 5: Handoff Formatting            ← sort by relevance, no parameters
  │
  ▼
Composition Motor                      ← Transformer decoder + self-correction loop
  │
  ▼
Output text
```
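
The Stage 2 idea can be sketched in a few lines. This is a minimal illustration, assuming the bilinear scorer computes `q^T W c` for a query vector `q` and a candidate vector `c`, and that pruning keeps the top-scoring 30% to match the roughly 70% reduction in the diagram. The function names, the `keep_ratio` parameter, and the plain-list vectors are hypothetical and are not PruneForge's API. `pair_count` shows why the pairwise Stage 3 grows as O(k²) in the number of survivors.

```python
def bilinear_score(query, weight, candidate):
    """Return query^T @ weight @ candidate for plain Python lists."""
    projected = [sum(w * c for w, c in zip(row, candidate)) for row in weight]
    return sum(q * p for q, p in zip(query, projected))


def prune_candidates(query, weight, candidates, keep_ratio=0.3):
    """Return indices of the highest-scoring fraction of candidates (at least one)."""
    scored = [(bilinear_score(query, weight, c), i) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    keep = max(1, round(keep_ratio * len(candidates)))  # assumed pruning rule
    return [index for _, index in scored[:keep]]


def pair_count(k):
    """Number of unordered candidate pairs a pairwise stage would have to score."""
    return k * (k - 1) // 2


# Toy data: ten 2-d candidates, identity weight matrix, query along the first axis.
identity = [[1.0, 0.0], [0.0, 1.0]]
toy_candidates = [[i / 10, 0.0] for i in range(10)]
survivors = prune_candidates([1.0, 0.0], identity, toy_candidates)
print(survivors, pair_count(len(survivors)))
```

Because the pairwise cost is quadratic, pruning before Stage 3 pays off quickly: going from 10 candidates to 3 cuts the number of pairs from 45 to 3.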

**Symbolic Content Flag (SCF):** when the candidate set is predominantly
symbolic (code, math, formulas), stages 2–4 switch to operator-aware modes
rather than semantic modes.
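
One way such a flag could be computed is sketched below. The README does not say how "predominantly symbolic" is decided, so everything here is an assumption: the character-density heuristic, the `min_density` and `majority` thresholds, and the function names are hypothetical stand-ins for whatever PruneForge actually does.

```python
import re

# Hypothetical heuristic: operators, brackets, and digits count as symbolic characters.
SYMBOLIC_CHARS = re.compile(r"[=+\-*/^<>{}()\[\];\d]")


def is_symbolic(text, min_density=0.2):
    """Treat a candidate as symbolic if enough of its non-space characters are symbolic."""
    stripped = text.replace(" ", "")
    if not stripped:
        return False
    hits = len(SYMBOLIC_CHARS.findall(stripped))
    return hits / len(stripped) >= min_density


def symbolic_content_flag(candidates, majority=0.5):
    """Return True when more than `majority` of the candidates look symbolic."""
    if not candidates:
        return False
    symbolic = sum(is_symbolic(c) for c in candidates)
    return symbolic / len(candidates) > majority


mostly_math = ["x = a + b", "f(x) = x^2 + 1", "the weather is nice"]
mostly_prose = ["hello world", "good morning", "a+b"]
print(symbolic_content_flag(mostly_math), symbolic_content_flag(mostly_prose))
```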

---

## Installation

```bash
pip install pruneforge
```

With optional HNSW retrieval support (installs `faiss-cpu`):

```bash
pip install "pruneforge[hnsw]"
```

From source (editable):

```bash
git clone https://github.com/astromini/pruneforge
cd pruneforge
pip install -e .
```

---

## Quick Start

### Inference

```python
from pruneforge import PruneForgeModel

model = PruneForgeModel.from_pretrained("output/")
response = model.generate("Yarın hava nasıl olacak?")
print(response)
```

### Training

```bash
# Plain-text corpus only (self-supervised)
python train.py --data corpus.txt --lang tr --output output/

# With morphological annotations (Stage 1 supervised)
python train.py --data corpus.txt --morph morph.tsv --lang tr --output output/

# GPU
python train.py --data corpus.txt --lang multilingual --device cuda --output output/

# Quick smoke test (one epoch each for pool, stages, and motor)
python train.py --data corpus.txt --pool_epochs 1 --stage_epochs 1 --motor_epochs 1
```

Training runs 7 sequential phases automatically:

| Phase | Component | Notes |
|---|---|---|
| 1 | Knowledge Pool | Contrastive pre-training (NT-Xent) |
| 2 | Stage 1 | Morphological decomposition (supervised or self-supervised) |
| 3 | Stage 2 | Topical relevance filter |
| 4 | Stage 3 | Composability assessment |
| 5 | Stage 4 | Coherence validation |
| 6 | Motor | Supervised CE + REINFORCE |
| 7 | Joint | End-to-end fine-tuning |
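
For readers unfamiliar with the NT-Xent objective named in phase 1, here is a small reference sketch of the standard formulation, written in plain Python so it runs without PyTorch. It is not taken from PruneForge's training code. The two-view batch layout, the `temperature` default, and the function names are assumptions, and the vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nt_xent(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss over all 2N anchors; view_a[i] and view_b[i] are positives."""
    z = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        # Scaled similarities to every other embedding; the anchor itself is excluded.
        logits = {k: cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i}
        log_denominator = math.log(sum(math.exp(value) for value in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = nt_xent(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = nt_xent(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < swapped_loss)
```

The loss is lower when each embedding is closest to its own positive, which is the property contrastive pre-training relies on.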

---

## Morphological Corpus Format

For Stage 1 supervised training (`--morph`), provide a TSV file:

```
yemekler	yemek	ler
koşuyorum	koş	uyor,um
running	run	ing
```

Columns: `token \t root \t affix_1,affix_2,...`

If no morphological corpus is available, omit `--morph`; Stage 1 then falls
back to self-supervised character-level reconstruction.
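
To check a morphology file before training, the format above can be parsed with the standard library. This is a sketch of a standalone validator, not the loader `train.py` uses; `MorphEntry` and `parse_morph_tsv` are hypothetical names, and treating the affix column as optional is an assumption.

```python
import csv
import io
from typing import List, NamedTuple


class MorphEntry(NamedTuple):
    token: str
    root: str
    affixes: List[str]


def parse_morph_tsv(lines):
    """Parse `token<TAB>root<TAB>affix_1,affix_2,...` rows into MorphEntry records."""
    entries = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected at least token and root, got {row!r}")
        # Assumption: a missing or empty third column means "no affixes".
        affixes = [a for a in row[2].split(",") if a] if len(row) > 2 else []
        entries.append(MorphEntry(row[0], row[1], affixes))
    return entries


sample = "yemekler\tyemek\tler\nkoşuyorum\tkoş\tuyor,um\nrunning\trun\ting\n"
entries = parse_morph_tsv(io.StringIO(sample))
for entry in entries:
    print(entry.token, entry.root, entry.affixes)
```

For a file on disk, pass an open handle instead of the `StringIO`, for example `open("morph.tsv", encoding="utf-8", newline="")`.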

---

## Configuration

All hyperparameters live in `pruneforge/config.py`.
Create a `PruneForgeConfig`, override the fields you need, and pass it to any component:

```python
from pruneforge.config import PruneForgeConfig

cfg = PruneForgeConfig()
cfg.knowledge_pool.num_concepts = 500_000   # reduce for low-RAM devices
cfg.knowledge_pool.embedding_dim = 128
cfg.motor.num_layers = 4
```
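
To judge whether a pool size fits a low-RAM device, a back-of-the-envelope estimate helps. The sketch below assumes the pool stores one dense float32 vector per concept. That is an upper-bound assumption, since the README describes the pool as a compressed store, and `embedding_table_bytes` is a hypothetical helper rather than part of the package.

```python
def embedding_table_bytes(num_concepts, embedding_dim, bytes_per_value=4):
    """Size of a dense embedding table; 4 bytes per value corresponds to float32."""
    return num_concepts * embedding_dim * bytes_per_value


size = embedding_table_bytes(500_000, 128)
print(f"{size / 2**20:.1f} MiB")  # prints "244.1 MiB"
```

Halving either `num_concepts` or `embedding_dim` halves this estimate, so both fields are reasonable first knobs on memory-constrained hardware.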

---

## Package Structure

```
pruneforge/
├── pruneforge/
│   ├── __init__.py
│   ├── config.py               ← all hyperparameters
│   ├── inference.py            ← PruneForgeModel (end-to-end)
│   ├── knowledge_pool/
│   │   ├── __init__.py
│   │   └── pool.py             ← KnowledgePool + QueryEncoder
│   ├── pipeline/
│   │   ├── __init__.py
│   │   ├── pipeline.py         ← HierarchicalPruningPipeline
│   │   ├── stage1_morphology.py
│   │   ├── stage2_relevance.py
│   │   ├── stage3_composability.py
│   │   ├── stage4_coherence.py
│   │   └── stage5_handoff.py
│   ├── motor/
│   │   ├── __init__.py
│   │   └── motor.py            ← CompositionMotor
│   └── utils/
│       ├── __init__.py
│       └── data.py             ← tokenizer + datasets
├── train.py                    ← single-command training script
├── tests/
│   └── test_e2e.py
├── pyproject.toml
├── setup.py
└── requirements.txt
```

---

## Running Tests

```bash
pip install -e .
python tests/test_e2e.py
```

Expected output:
```
PruneForge End-to-End Smoke Test
========================================
[ 5/5 ] Tokenizer ... OK
[ 1/5 ] Knowledge Pool ... OK
[ 2/5 ] Full Pipeline (Stages 1-5) ... OK  (10→N→N→N tokens)
[ 3/5 ] Composition Motor ... OK  (iterations used: [1, 1])
[ 4/5 ] Training losses ... OK  (CE=...  RL=...)

✓ All tests passed.
```

---

## Requirements

- Python ≥ 3.9
- PyTorch ≥ 2.0.0
- NumPy ≥ 1.24.0
- (optional) faiss-cpu ≥ 1.7.4 for HNSW retrieval

---

## License

Apache License 2.0. See [LICENSE](LICENSE).

---

## Citation

```bibtex
@software{pruneforge2026,
  author  = {astromini},
  title   = {PruneForge: A Hierarchical Pruning Architecture for On-Device Language Intelligence},
  year    = {2026},
  version = {0.1.0},
  url     = {https://github.com/astromini/pruneforge},
}
```
