Metadata-Version: 2.4
Name: transformer-cloner
Version: 0.1.0
Summary: Clone and prune transformer models with new tokenizers
Project-URL: Homepage, https://github.com/AliTava662/transformer-cloner
Project-URL: Documentation, https://github.com/AliTava662/transformer-cloner#readme
Project-URL: Repository, https://github.com/AliTavakolian/transformer-cloner
Project-URL: Issues, https://github.com/AliTavakolian/transformer-cloner/issues
Author-email: Ali Bayram <alibayram@example.com>
License-Expression: MIT
License-File: LICENSE
Keywords: deep-learning,huggingface,llm,model-cloning,model-pruning,tokenizer,transformers
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.40.0
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: twine>=4.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# Transformer Cloner

Clone and prune transformer models with new tokenizers. Create smaller, more efficient models by mapping vocabularies, reducing dimensions, and pruning layers.

## Features

- 🔄 **Vocabulary Mapping**: Map tokens from a new tokenizer to the original model's embeddings
- 📉 **Model Pruning**: Reduce hidden size, layers, attention heads, and more
- 🎯 **Multiple Strategies**: Combine source embeddings with mean, sum, first, last, weighted, max, or min
- ✅ **Validation**: Automatic config validation to prevent incompatible architectures
- 🚀 **Fast**: Batch processing for efficient token ID mapping

## Installation

```bash
pip install transformer-cloner
```

## Quick Start

### Clone with New Tokenizer

```python
from transformer_cloner import TransformerCloner, EmbeddingStrategy

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="your-username/custom-tokenizer",
)

# Clone with mean embedding strategy
model = cloner.clone(strategy=EmbeddingStrategy.MEAN)
model.save_pretrained("cloned-model")
```

### Prune Model Architecture

```python
from transformer_cloner import TransformerCloner, PruningConfig, EmbeddingStrategy

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="your-username/custom-tokenizer",
)

# Create a smaller model
pruning_config = PruningConfig(
    hidden_size=320,           # Reduce embedding dimension
    num_hidden_layers=9,       # Fewer layers
    intermediate_size=1024,    # Smaller FFN
    num_attention_heads=2,     # Fewer attention heads
)

model = cloner.clone_pruned(
    pruning_config=pruning_config,
    strategy=EmbeddingStrategy.MEAN,
)
model.save_pretrained("pruned-model")
```

### Vocabulary Pruning (Direct 1:1 Mapping)

```python
from transformer_cloner import TransformerCloner

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="google/gemma-3-270m-it",  # Same tokenizer
)

# Keep only the first 16,000 token IDs
model, tokenizer = cloner.clone_with_vocab_pruning(vocab_size=16000)

model.save_pretrained("vocab-pruned-model")
tokenizer.save_pretrained("vocab-pruned-model")
```
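
To illustrate what "direct 1:1 mapping" means conceptually, here is a minimal sketch in plain Python. It assumes that vocabulary pruning keeps every token whose ID is below `vocab_size` and reuses the matching embedding rows unchanged. The `prune_vocab` helper and the toy data are hypothetical and are not part of the `transformer_cloner` API; a real tokenizer also carries merges and special tokens that this sketch ignores.

```python
def prune_vocab(vocab, embeddings, vocab_size):
    """Keep tokens with ID < vocab_size and the matching embedding rows.

    Hypothetical helper for illustration only, not the library's code.
    """
    if not 0 < vocab_size <= len(embeddings):
        raise ValueError("vocab_size must be between 1 and the original vocab size")
    # Token IDs are unchanged, so row i of the pruned matrix is row i of the original.
    kept_vocab = {token: idx for token, idx in vocab.items() if idx < vocab_size}
    kept_rows = embeddings[:vocab_size]
    return kept_vocab, kept_rows


# Toy vocabulary with 4 tokens and 2-dimensional embeddings
toy_vocab = {"<pad>": 0, "hello": 1, "world": 2, "rare": 3}
toy_embeddings = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

pruned_vocab, pruned_embeddings = prune_vocab(toy_vocab, toy_embeddings, vocab_size=3)
```

Because the kept IDs do not change, no embedding combination strategy is needed in this mode.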

## Embedding Strategies

When a target token maps to multiple source tokens, choose how their embeddings are combined into a single vector:

| Strategy   | Description                                    |
| ---------- | ---------------------------------------------- |
| `MEAN`     | Average of all source embeddings (default)     |
| `SUM`      | Sum of all source embeddings                   |
| `FIRST`    | Use only the first token's embedding           |
| `LAST`     | Use only the last token's embedding            |
| `WEIGHTED` | Weighted average (more weight to first tokens) |
| `MAX`      | Element-wise maximum                           |
| `MIN`      | Element-wise minimum                           |
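
The strategies in the table can be sketched in plain Python as follows. This is an illustration under assumptions, not the library's implementation: the `Strategy` enum and `combine` function are hypothetical stand-ins for `EmbeddingStrategy`, and the `WEIGHTED` case assumes linearly decreasing weights, since the exact weighting scheme is not documented here.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical stand-in for EmbeddingStrategy."""
    MEAN = "mean"
    SUM = "sum"
    FIRST = "first"
    LAST = "last"
    WEIGHTED = "weighted"
    MAX = "max"
    MIN = "min"


def combine(vectors, strategy):
    """Combine equal-length source embedding vectors into one vector."""
    if not vectors:
        raise ValueError("need at least one source embedding")
    n = len(vectors)
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    if strategy is Strategy.MEAN:
        return [sum(col) / n for col in columns]
    if strategy is Strategy.SUM:
        return [sum(col) for col in columns]
    if strategy is Strategy.FIRST:
        return list(vectors[0])
    if strategy is Strategy.LAST:
        return list(vectors[-1])
    if strategy is Strategy.WEIGHTED:
        # Assumption: weights n, n-1, ..., 1, normalized to sum to 1
        weights = [n - i for i in range(n)]
        total = sum(weights)
        return [sum(w * x for w, x in zip(weights, col)) / total for col in columns]
    if strategy is Strategy.MAX:
        return [max(col) for col in columns]
    if strategy is Strategy.MIN:
        return [min(col) for col in columns]
    raise ValueError(f"unknown strategy: {strategy}")


# A target token that splits into two source tokens with 2-dimensional embeddings
source_embeddings = [[1.0, 4.0], [3.0, 0.0]]
mean_vector = combine(source_embeddings, Strategy.MEAN)
max_vector = combine(source_embeddings, Strategy.MAX)
```

For these two toy source vectors, `MEAN` averages each dimension while `MAX` picks the larger value per dimension, so the results differ whenever the source embeddings disagree.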

## Pruning Options

| Parameter             | Description                  |
| --------------------- | ---------------------------- |
| `hidden_size`         | Embedding dimension          |
| `num_hidden_layers`   | Number of transformer layers |
| `intermediate_size`   | FFN intermediate dimension   |
| `num_attention_heads` | Number of attention heads    |
| `num_key_value_heads` | Number of KV heads (for GQA) |
| `head_dim`            | Dimension per attention head |
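
These parameters constrain each other. The sketch below shows the kind of consistency checks a pruned configuration typically has to pass. It is an assumption about what such validation might look like, not the library's actual rules: `PruningSketch` and `validate` are hypothetical names that only mirror the fields in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PruningSketch:
    """Hypothetical mirror of the PruningConfig fields listed above."""
    hidden_size: int
    num_hidden_layers: int
    intermediate_size: int
    num_attention_heads: int
    num_key_value_heads: Optional[int] = None
    head_dim: Optional[int] = None


def validate(cfg: PruningSketch) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for name in ("hidden_size", "num_hidden_layers",
                 "intermediate_size", "num_attention_heads"):
        if getattr(cfg, name) <= 0:
            errors.append(f"{name} must be positive")
    if cfg.num_attention_heads > 0:
        kv = cfg.num_key_value_heads
        # GQA shares each KV head across an equal-sized group of query heads
        if kv is not None and (kv <= 0 or cfg.num_attention_heads % kv != 0):
            errors.append("num_attention_heads must be a multiple of num_key_value_heads")
        # Without an explicit head_dim, heads are assumed to split hidden_size evenly
        if cfg.head_dim is None and cfg.hidden_size % cfg.num_attention_heads != 0:
            errors.append("hidden_size must be divisible by num_attention_heads")
    return errors


ok = validate(PruningSketch(hidden_size=320, num_hidden_layers=9,
                            intermediate_size=1024, num_attention_heads=2))
bad = validate(PruningSketch(hidden_size=320, num_hidden_layers=9,
                             intermediate_size=1024, num_attention_heads=3))
```

Some architectures set `head_dim` independently of `hidden_size`, which is why the divisibility check in this sketch is skipped when `head_dim` is given explicitly.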

## License

MIT License - see [LICENSE](LICENSE) for details.
