Metadata-Version: 2.4
Name: morphoformer
Version: 3.0.0
Summary: MorphFormer: multilingual morphological reinflection via character-level Transformer
Author: F000NK, Voluntas Progressus
License-Expression: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.14
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.14
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0.0
Requires-Dist: chartoken>=1.0.0
Requires-Dist: torchblocks>=1.0.0
Requires-Dist: sigmorphon>=1.0.0
Requires-Dist: trainkit>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Dynamic: license-file

# MorphFormer v3

Character-level Transformer for multilingual morphological reinflection.

## Installation

```bash
pip install morphoformer
```

Requires Python >= 3.14 and PyTorch >= 2.0.

Dependencies (`chartoken`, `torchblocks`, `sigmorphon`, `trainkit`) are installed automatically.

## Quick Start

```bash
# Download data
morphoformer download --lang rus,deu,fra --merge

# Train
morphoformer train --preset medium --data "data/collections/*_train.tsv" --device cuda

# Infer
morphoformer infer --checkpoint checkpoints/morphformer_epoch50.pt --word "laufen" --morph "V;IND;PST;3;SG" --lang deu

# Interactive REPL
morphoformer serve --checkpoint checkpoints/morphformer_epoch50.pt
```

## Presets

| Preset | d_model | Encoder layers | Decoder layers | Params (approx.) | VRAM    |
|--------|---------|----------------|----------------|------------------|---------|
| small  | 384     | 4              | 3              | ~7M              | < 4 GB  |
| medium | 512     | 8              | 6              | ~45M             | 4-8 GB  |
| large  | 768     | 10             | 8              | ~120M            | >= 8 GB |
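If you are unsure which preset fits your GPU, the VRAM column can be turned into a simple lookup. `suggest_preset` below is a hypothetical helper and is not part of the `morphoformer` package. Where the table's ranges touch at exactly 8 GB, it resolves to `large`.

```python
def suggest_preset(vram_gb: float) -> str:
    """Map available GPU memory (in GB) to a preset name, following the VRAM column.

    Hypothetical helper: not part of the morphoformer package.
    """
    if vram_gb < 0:
        raise ValueError("vram_gb must be non-negative")
    if vram_gb < 4:
        return "small"   # < 4 GB
    if vram_gb < 8:
        return "medium"  # 4-8 GB
    return "large"       # >= 8 GB
```

On a CUDA machine, you can get the input from PyTorch with `torch.cuda.get_device_properties(0).total_memory / 1024**3` and pass the result to `--preset`.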

## CLI Commands

| Command       | Description                       |
|---------------|-----------------------------------|
| `train`       | Train model from TSV data         |
| `infer`       | Single-word inference             |
| `serve`       | Interactive REPL                  |
| `download`    | Download SigMorphon datasets      |
| `modules`     | List registered NN modules        |
| `init-config` | Generate TOML config template     |

## Data Format

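Tab-separated values, one example per line, with four columns in this order: `lemma`, `features`, `surface_form`, `language`. Features are semicolon-separated tags, and the language column holds a three-letter code such as `deu`: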

```
laufen	V;IND;PST;3;SG	lief	deu
```
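The format is simple to read with the standard `csv` module. `Example` and `read_examples` below are hypothetical names rather than the package's own loader. The sketch assumes features are always split on `;`, as in the row above, and that quote characters in word forms are literal text.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    lemma: str
    features: tuple[str, ...]  # e.g. ("V", "IND", "PST", "3", "SG")
    surface_form: str
    language: str


def read_examples(lines) -> list[Example]:
    """Parse rows of: lemma <TAB> features <TAB> surface_form <TAB> language."""
    examples = []
    # QUOTE_NONE: treat quote characters inside word forms as ordinary text
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(row)}")
        lemma, feats, form, lang = row
        examples.append(Example(lemma, tuple(feats.split(";")), form, lang))
    return examples


sample = "laufen\tV;IND;PST;3;SG\tlief\tdeu\n"
print(read_examples(io.StringIO(sample)))
```

To read a file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `read_examples`.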

## Python API

```python
import torch
from chartoken import CharVocab, FeatureVocab
from morphoformer.model import MorphFormer
from morphoformer.inference import greedy_decode

# weights_only=False unpickles arbitrary Python objects: only load checkpoints you trust
checkpoint = torch.load("checkpoints/morphformer_epoch50.pt", map_location="cpu", weights_only=False)
char_vocab = CharVocab.from_dict(checkpoint["char_vocab"])
feature_vocab = FeatureVocab.from_dict(checkpoint["feature_vocab"])
lang_to_id = checkpoint["lang_to_id"]

# ... build model, load state_dict, call greedy_decode()
```
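The snippet above stops before decoding, and this README does not document the signature of `greedy_decode`. The sketch below shows the general greedy strategy in a library-independent form: at each step, append the highest-scoring token and stop at end-of-sequence or after `max_len` steps. It is not the package's function. `greedy_decode_sketch`, `next_token_scores`, and the toy step function are hypothetical stand-ins for one decoder forward pass.

```python
from typing import Callable, Sequence


def greedy_decode_sketch(
    next_token_scores: Callable[[Sequence[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 32,
) -> list[int]:
    """Generic greedy decoding: repeatedly append the argmax token.

    `next_token_scores` stands in for one decoder step: given the prefix
    decoded so far, it returns one score per vocabulary entry.
    """
    prefix = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(prefix)
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if best == eos_id:
            break
        prefix.append(best)
    return prefix[1:]  # drop BOS


# Toy "model" over a 5-symbol vocabulary (0 = BOS, 1 = EOS) that spells 3, 4, 2.
target = [3, 4, 2]


def toy_step(prefix: Sequence[int]) -> list[float]:
    pos = len(prefix) - 1
    want = target[pos] if pos < len(target) else 1  # emit EOS when done
    return [1.0 if i == want else 0.0 for i in range(5)]


print(greedy_decode_sketch(toy_step, bos_id=0, eos_id=1))
```

In a real decoder, the KV cache mentioned under Architecture lets each step process only the newest token instead of recomputing attention over the whole prefix.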

## Architecture

- Encoder-Decoder Transformer at character level
- Grouped Query Attention (GQA) with KV cache
- RoPE positional embeddings
- SwiGLU feed-forward networks
- Language-conditioned adapters
- Conformer-style local convolution in encoder
- Structured morphological feature encoding
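Two of these components, RoPE and SwiGLU, can be illustrated in pure Python. This is a sketch of the standard techniques, not MorphFormer's implementation. The rotation base of 10000, the adjacent-pair layout, and the helper names are assumptions. The final lines check the property that makes RoPE relative: the query/key score depends only on the offset between positions.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rope(vec: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotary position embedding: rotate pair k = (x[2k], x[2k+1])
    by the angle pos * base**(-2k/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out: list[float] = []
    for i in range(0, d, 2):  # i = 2k
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def silu(x: float) -> float:
    # x * sigmoid(x), with sigmoid written via tanh to avoid exp overflow
    return x * 0.5 * (1.0 + math.tanh(0.5 * x))


def matvec(w: list[list[float]], v: list[float]) -> list[float]:
    return [dot(row, v) for row in w]


def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU FFN: W_down @ (silu(W_gate @ x) * (W_up @ x))."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)


# Relative-position property: the score depends only on the offset m - n.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.9]
same = math.isclose(dot(rope(q, 5), rope(k, 2)),
                    dot(rope(q, 13), rope(k, 10)), abs_tol=1e-9)
print(same)  # True: both query/key pairs are 3 positions apart
```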

## Supported Devices

| Device       | Flag             |
|--------------|------------------|
| Auto-detect  | `--device auto`  |
| NVIDIA GPU   | `--device cuda`  |
| AMD GPU      | `--device rocm`  |
| Intel Arc    | `--device xpu`   |
| Apple Silicon| `--device mps`   |
| CPU          | `--device cpu`   |
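The sketch below shows how a `--device` value might be resolved to a PyTorch device type. `resolve_device`, `probe_available`, and the auto-detect priority order are hypothetical, since this README does not document MorphFormer's actual logic. One detail is grounded in PyTorch itself: ROCm builds expose AMD GPUs through the `cuda` device type and set `torch.version.hip`, so a `rocm` flag would map to `"cuda"`.

```python
# Assumed priority order for "auto"; MorphFormer's real order is not documented here.
AUTO_ORDER = ("cuda", "rocm", "xpu", "mps", "cpu")


def resolve_device(requested: str, available: set[str]) -> str:
    """Return the torch device type string for a --device flag value."""
    if requested == "auto":
        # "cpu" always qualifies, so the generator never runs dry
        requested = next(d for d in AUTO_ORDER if d in available or d == "cpu")
    if requested not in available and requested != "cpu":
        raise RuntimeError(f"device {requested!r} requested but not available")
    # PyTorch's ROCm builds expose AMD GPUs under the "cuda" device type.
    return "cuda" if requested == "rocm" else requested


def probe_available() -> set[str]:
    """Query the installed PyTorch build for usable backends (requires torch)."""
    import torch

    found: set[str] = set()
    if torch.cuda.is_available():
        # torch.version.hip is a version string on ROCm builds, None otherwise
        found.add("rocm" if torch.version.hip else "cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        found.add("xpu")
    if torch.backends.mps.is_available():
        found.add("mps")
    return found
```

With PyTorch installed, `torch.device(resolve_device("auto", probe_available()))` gives a device you can pass to `model.to(...)`. Keeping the resolution logic separate from the hardware probe makes it testable on machines without a GPU.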

## License

MIT. See the `LICENSE` file in the repository root.
