Metadata-Version: 2.4
Name: trustformers
Version: 0.1.1
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Operating System :: OS Independent
Requires-Dist: numpy>=1.20.0
Requires-Dist: trustformers[dev,jupyter,torch,transformers] ; extra == 'all'
Requires-Dist: pytest>=7.0.0 ; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0 ; extra == 'dev'
Requires-Dist: mypy>=1.0.0 ; extra == 'dev'
Requires-Dist: black>=23.0.0 ; extra == 'dev'
Requires-Dist: ruff>=0.1.0 ; extra == 'dev'
Requires-Dist: jupyter>=1.0.0 ; extra == 'jupyter'
Requires-Dist: matplotlib>=3.5.0 ; extra == 'jupyter'
Requires-Dist: torch>=2.0.0 ; extra == 'torch'
Requires-Dist: transformers>=4.30.0 ; extra == 'transformers'
Requires-Dist: tokenizers>=0.13.0 ; extra == 'transformers'
Provides-Extra: all
Provides-Extra: dev
Provides-Extra: jupyter
Provides-Extra: torch
Provides-Extra: transformers
License-File: LICENSE
Summary: Python bindings for TrustformeRS - High-performance Rust Transformers library
Keywords: transformers,nlp,machine-learning,deep-learning,bert,gpt,llama,rust,pyo3
Home-Page: https://github.com/cool-japan/trustformers
Author-email: "COOLJAPAN OU (Team KitaSan)" <contact@cooljapan.tech>
Maintainer-email: "COOLJAPAN OU (Team KitaSan)" <contact@cooljapan.tech>
License: Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.com/cool-japan/trustformers/blob/main/README.md
Project-URL: Homepage, https://github.com/cool-japan/trustformers
Project-URL: Issues, https://github.com/cool-japan/trustformers/issues
Project-URL: Repository, https://github.com/cool-japan/trustformers

# TrustformeRS Python

High-performance transformer library for Python, written in Rust. Drop-in replacement for Hugging Face Transformers with significant performance improvements.

## Features

- 🚀 **10-100x faster** than pure Python implementations
- 🔄 **Drop-in replacement** for Hugging Face Transformers
- 🦀 **Written in Rust** for memory safety and performance
- 🔧 **Zero-copy tensor operations** with NumPy
- 🤝 **PyTorch interoperability** (optional)
- 📦 **No external dependencies** for core functionality

## Installation

```bash
pip install trustformers
```

### From source

```bash
# Install maturin (build tool for Rust Python extensions)
pip install maturin

# Clone the repository
git clone https://github.com/cool-japan/trustformers
cd trustformers/trustformers-py

# Build and install
maturin develop --release
```

## Quick Start

### Basic Usage

```python
from trustformers import AutoModel, AutoTokenizer, pipeline

# Load model and tokenizer
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Create a pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Run inference
results = classifier("This is a great library!")
print(results)
```

### Direct Model Usage

```python
import numpy as np
from trustformers import BertModel, Tensor

# Create model
model = BertModel.from_pretrained("bert-base-uncased")

# Create input tensors
input_ids = Tensor(np.array([[101, 2023, 2003, 1037, 2742, 102]]))
attention_mask = Tensor(np.ones((1, 6)))

# Forward pass
outputs = model(input_ids, attention_mask)
print(outputs["last_hidden_state"].shape)
```

### NumPy Integration

```python
import numpy as np
from trustformers import Tensor

# Create tensor from NumPy array
np_array = np.random.randn(2, 3, 4).astype(np.float32)
tensor = Tensor(np_array)

# Convert back to NumPy
np_array_back = tensor.numpy()

# Tensor operations
result = tensor.matmul(tensor.transpose())
```

### PyTorch Interoperability

```python
import torch
from trustformers import Tensor

# Convert from PyTorch
torch_tensor = torch.randn(2, 3, 4)
trust_tensor = Tensor.from_torch(torch_tensor)

# Convert to PyTorch
torch_tensor_back = trust_tensor.to_torch()
```

## Supported Models

- **BERT** and variants (RoBERTa, ALBERT, DistilBERT, ELECTRA, DeBERTa)
- **GPT-2** and variants (GPT-Neo, GPT-J)
- **T5** (encoder-decoder)
- **LLaMA** and **Mistral**
- **Vision Transformer (ViT)**
- **CLIP** (multimodal)

## API Compatibility

TrustformeRS provides a compatible API with Hugging Face Transformers:

```python
# Hugging Face Transformers
from transformers import AutoModel, AutoTokenizer

# TrustformeRS (drop-in replacement)
from trustformers import AutoModel, AutoTokenizer
```

Most code written for Hugging Face Transformers will work with minimal changes.

## Performance

Benchmarks on common tasks:

| Task | Model | HF Transformers | TrustformeRS | Speedup |
|------|-------|-----------------|--------------|---------|
| Text Classification | BERT-base | 52 ms | 3.2 ms | 16.3x |
| Text Generation | GPT-2 | 124 ms | 8.7 ms | 14.3x |
| Question Answering | BERT-large | 89 ms | 5.4 ms | 16.5x |

*Benchmarks run on Apple M1 Pro, batch size 1, sequence length 512*

## Advanced Features

### Custom Models

```python
from trustformers import PreTrainedModel, Tensor
import numpy as np

class CustomModel(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        # Define your model architecture
    
    def forward(self, input_ids, attention_mask=None):
        # Implement forward pass
        pass
```

### Training (Coming Soon)

```python
from trustformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()
```

## Development

### Building from source

```bash
# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .
isort .

# Lint
ruff check .
```

### Architecture

The library is organized into several components:

- `tensor.rs` - Tensor operations and NumPy integration
- `models.rs` - Model implementations (BERT, GPT-2, etc.)
- `tokenizers.rs` - Tokenizer implementations
- `pipelines.rs` - High-level pipeline API
- `auto.rs` - Auto classes for model/tokenizer loading
- `training.rs` - Training utilities (WIP)

## License

Apache License 2.0

## Contributing

Contributions are welcome! Please read our [Contributing Guide](../CONTRIBUTING.md) for details.

## Citation

If you use TrustformeRS in your research, please cite:

```bibtex
@software{trustformers2024,
  title = {TrustformeRS: High-Performance Transformers in Rust},
  year = {2024},
  url = {https://github.com/cool-japan/trustformers}
}
```
