Metadata-Version: 2.4
Name: plimai
Version: 0.1.9
Summary: Vision LLMs with LoRA fine-tuning.
Project-URL: Documentation, https://github.com/plimai/plim/tree/main/docs
Project-URL: Download, https://pypi.org/project/plimai/#files
Project-URL: Homepage, https://github.com/plimai/plim
Project-URL: Release Notes, https://github.com/plimai/plim/releases
Project-URL: Source, https://github.com/plimai/plim
Project-URL: Tracker, https://github.com/plimai/plim/issues
Project-URL: Funding, https://github.com/sponsors/plimai
Author-email: Plim <priteshraj41@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Requires-Dist: toml
Requires-Dist: torch
Requires-Dist: torchvision
Description-Content-Type: text/markdown

# plimai: Vision LLMs (Large Language Models for Vision) with Efficient LoRA Fine-Tuning

<hr/>
<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="static/plimai-use-dark.png">
    <img src="https://raw.githubusercontent.com/plimai/plim/main/static/plimai-white.png" alt="Plimai Logo" width="full"/>
  </picture>
</p>

<!-- Badges -->
<p align="center">
  <a href="https://pypi.org/project/plimai/"><img src="https://img.shields.io/pypi/v/plimai.svg" alt="PyPI version"></a>
  <a href="https://pepy.tech/project/plimai"><img src="https://pepy.tech/badge/plimai" alt="Downloads"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License"></a>
  <a href="https://img.shields.io/badge/coverage-90%25-brightgreen" alt="Coverage"> <img src="https://img.shields.io/badge/coverage-90%25-brightgreen"/></a>
  <a href="https://img.shields.io/badge/python-3.8%2B-blue" alt="Python Version"> <img src="https://img.shields.io/badge/python-3.8%2B-blue"/></a>
  <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"></a>
</p>

<p align="center">
  <b>Modular Vision LLMs (Large Language Models for Vision) with Efficient LoRA Fine-Tuning</b><br/>
  <span style="font-size:1.1em"><i>Build, adapt, and fine-tune vision models with ease and efficiency.</i></span>
</p>
<hr/>

## 🚀 Quick Links
- [Documentation](docs/index.md)
- [Tutorials](docs/tutorials/index.md)
- [Changelog](CHANGELOG.md)
- [Contributing Guide](CONTRIBUTING.md)
- [Roadmap](ROADMAP.md)

---

## 📚 Table of Contents
- [Features](#-features)
- [Showcase](#-showcase)
- [Getting Started](#-getting-started)
- [Supported Python Versions](#-supported-python-versions)
- [Why Plimai?](#-why-plimai)
- [Architecture Overview](#-architecture-overview)
- [Core Modules](#-core-modules)
- [Performance & Efficiency](#-performance--efficiency)
- [Advanced Configuration](#-advanced-configuration)
- [Documentation & Resources](#-documentation--resources)
- [Testing & Quality](#-testing--quality)
- [Examples & Use Cases](#-examples--use-cases)
- [Extending the Framework](#-extending-the-framework)
- [Contributing](#-contributing)
- [FAQ](#-faq)
- [Citation](#-citation)
- [Acknowledgements](#-acknowledgements)
- [License](#-license)

---

## ✨ Features
- 🔧 **Plug-and-play LoRA adapters** for parameter-efficient fine-tuning
- 🏗️ **Modular Vision Transformer (ViT) backbone** with customizable components
- 🎯 **Unified model zoo** for open-source visual models
- ⚙️ **Easy configuration** and extensible codebase
- 🚀 **Production ready** with comprehensive testing and documentation
- 💾 **Memory efficient** training with gradient checkpointing support
- 📊 **Built-in metrics** and visualization tools
- 🧩 **Modular training loop** with LoRA support
- 🎯 **Unified CLI** for fine-tuning and evaluation
- 🔌 **Extensible callbacks** (early stopping, logging, etc.)
- 📦 **Checkpointing and resume**
- 🚀 **Mixed precision training**
- 🔧 **Easy dataset and model extension**
- ⚡ **Ready for distributed/multi-GPU training**

---

## 🚀 Showcase

**plimai** is a modular, research-friendly framework for building and fine-tuning Vision Large Language Models (LLMs) with efficient Low-Rank Adaptation (LoRA) support. Whether you're working on image classification, visual question answering, or custom vision tasks, plimai provides the tools you need for parameter-efficient model adaptation.

---

## 🏁 Getting Started

Here's a minimal example to get you up and running:

```bash
pip install plimai
```

```python
import torch
from plimai.models.vision_transformer import VisionTransformer
from plimai.utils.config import default_config

# Create model
x = torch.randn(2, 3, 224, 224)
model = VisionTransformer(
    img_size=default_config['img_size'],
    patch_size=default_config['patch_size'],
    in_chans=default_config['in_chans'],
    num_classes=default_config['num_classes'],
    embed_dim=default_config['embed_dim'],
    depth=default_config['depth'],
    num_heads=default_config['num_heads'],
    mlp_ratio=default_config['mlp_ratio'],
    lora_config=default_config['lora'],
)

# Forward pass
with torch.no_grad():
    out = model(x)
    print('Output shape:', out.shape)
```

For advanced usage, CLI details, and more, see the [Documentation](docs/index.md) and [src/plimai/cli/finetune.py](src/plimai/cli/finetune.py).

---

## 🐍 Supported Python Versions
- Python 3.8+

---

## 🧩 Why Plimai?

- **Parameter-efficient fine-tuning**: Plug-and-play LoRA adapters for fast, memory-efficient adaptation with minimal computational overhead
- **Modular ViT backbone**: Swap or extend components like patch embedding, attention, or MLP heads with ease
- **Unified model zoo**: Access and experiment with open-source visual models through a consistent interface
- **Research & production ready**: Clean, extensible codebase with comprehensive configuration options and robust utilities
- **Memory efficient**: Fine-tune large models on consumer hardware by updating only a small fraction of parameters

---

## 🏗️ Architecture Overview

plimai is built around a modular Vision Transformer (ViT) backbone, with LoRA adapters strategically injected into attention and MLP layers for efficient fine-tuning. This approach allows you to adapt large pre-trained models using only a fraction of the original parameters.

### Model Data Flow

```mermaid
---
config:
  layout: dagre
---
flowchart TD
 subgraph LoRA_Adapters["LoRA Adapters in Attention and MLP"]
        LA1(["LoRA Adapter 1"])
        LA2(["LoRA Adapter 2"])
        LA3(["LoRA Adapter N"])
  end
    A(["Input Image"]) --> B(["Patch Embedding"])
    B --> C(["CLS Token & Positional Encoding"])
    C --> D1(["Encoder Layer 1"])
    D1 --> D2(["Encoder Layer 2"])
    D2 --> D3(["Encoder Layer N"])
    D3 --> E(["LayerNorm"])
    E --> F(["MLP Head"])
    F --> G(["Output Class Logits"])
    LA1 -.-> D1
    LA2 -.-> D2
    LA3 -.-> D3
     LA1:::loraStyle
     LA2:::loraStyle
     LA3:::loraStyle
    classDef loraStyle fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
```

### Architecture Components

**Legend:**
- **Solid arrows**: Main data flow through the Vision Transformer
- **Dashed arrows**: LoRA adapter injection points in encoder layers
- **Blue boxes**: LoRA adapters for parameter-efficient fine-tuning

**Data Flow Steps:**
1. **Input Image** (224×224×3): Raw image data ready for processing
2. **Patch Embedding**: Image split into 16×16 patches and projected to embedding dimension
3. **CLS Token & Positional Encoding**: Classification token prepended with learnable position embeddings
4. **Transformer Encoder Stack**: Multi-layer transformer with self-attention and MLP blocks
   - **LoRA Integration**: Low-rank adapters injected into attention and MLP layers
   - **Efficient Updates**: Only LoRA parameters updated during fine-tuning
5. **LayerNorm**: Final normalization of encoder outputs
6. **MLP Head**: Task-specific classification or regression head
7. **Output**: Final predictions (class probabilities, regression values, etc.)

---

## 🧩 Core Modules

| Module | Description | Key Features |
|--------|-------------|--------------|
| **PatchEmbedding** | Image-to-patch conversion and embedding | • Configurable patch sizes<br>• Learnable position embeddings<br>• Support for different input resolutions |
| **TransformerEncoder** | Multi-layer transformer backbone | • Self-attention mechanisms<br>• LoRA adapter integration<br>• Gradient checkpointing support |
| **LoRALinear** | Low-rank adaptation layers | • Configurable rank and scaling<br>• Memory-efficient updates<br>• Easy enable/disable functionality |
| **MLPHead** | Output projection layer | • Multi-class classification<br>• Regression support<br>• Dropout regularization |
| **Config System** | Centralized configuration management | • YAML/JSON config files<br>• Command-line overrides<br>• Validation and defaults |
| **Data Utils** | Preprocessing and augmentation | • Built-in transforms<br>• Custom dataset loaders<br>• Efficient data pipelines |

---

## 📊 Performance & Efficiency

### LoRA Benefits

| Metric | Full Fine-tuning | LoRA Fine-tuning | Improvement |
|--------|------------------|------------------|-------------|
| **Trainable Parameters** | 86M | 2.4M | **97% reduction** |
| **Memory Usage** | 12GB | 4GB | **67% reduction** |
| **Training Time** | 4 hours | 1.5 hours | **62% faster** |
| **Storage per Task** | 344MB | 9.6MB | **97% smaller** |

*Benchmarks on ViT-Base with CIFAR-100, RTX 3090*

### Supported Model Sizes

- **ViT-Tiny**: 5.7M parameters, perfect for experimentation
- **ViT-Small**: 22M parameters, good balance of performance and efficiency  
- **ViT-Base**: 86M parameters, strong performance across tasks
- **ViT-Large**: 307M parameters, state-of-the-art results

---

## 🔧 Advanced Configuration

### LoRA Configuration

```python
lora_config = {
    "rank": 16,                    # Low-rank dimension
    "alpha": 32,                   # Scaling factor
    "dropout": 0.1,                # Dropout rate
    "target_modules": [            # Modules to adapt
        "attention.qkv",
        "attention.proj", 
        "mlp.fc1",
        "mlp.fc2"
    ],
    "merge_weights": False         # Whether to merge during inference
}
```

### Training Configuration

```yaml
# config.yaml
model:
  name: "vit_base"
  img_size: 224
  patch_size: 16
  num_classes: 1000

training:
  epochs: 10
  batch_size: 32
  learning_rate: 1e-4
  weight_decay: 0.01
  warmup_steps: 1000

lora:
  rank: 16
  alpha: 32
  dropout: 0.1
```

---

## 📚 Documentation & Resources

- 📖 [Complete API Reference](docs/api/index.md)
- 🎓 [Tutorials and Examples](docs/tutorials/index.md)
- 🔬 [Research Papers](#research-papers)
- 💡 [Best Practices Guide](docs/best_practices.md)
- 🐛 [Troubleshooting](docs/troubleshooting.md)

### Research Papers
- [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
- [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)
- [Vision Transformer for Fine-Grained Image Classification](https://arxiv.org/abs/2103.07579)

---

## 🧪 Testing & Quality

Run the comprehensive test suite:

```bash
# Unit tests
pytest tests/unit/

# Integration tests  
pytest tests/integration/

# Performance benchmarks
pytest tests/benchmarks/

# All tests with coverage
pytest tests/ --cov=plimai --cov-report=html
```

### Code Quality Tools

```bash
# Linting
flake8 src/
black src/ --check

# Type checking
mypy src/

# Security scanning
bandit -r src/
```

---

## 🚀 Examples & Use Cases

### Image Classification
```python
from plimai import VisionTransformer
from plimai.datasets import CIFAR10Dataset

# Load pre-trained model
model = VisionTransformer.from_pretrained("vit_base_patch16_224")

# Fine-tune on CIFAR-10
dataset = CIFAR10Dataset(train=True, transform=model.default_transform)
model.finetune(dataset, epochs=10, lora_rank=16)
```

### Custom Dataset
```python
from plimai.datasets import ImageFolderDataset

# Your custom dataset
dataset = ImageFolderDataset(
    root="/path/to/dataset",
    split="train",
    transform=model.default_transform
)

# Fine-tune with custom configuration
model.finetune(
    dataset, 
    config_path="configs/custom_config.yaml"
)
```

---

## 🧩 Extending the Framework
- Add new datasets in `src/plimai/data/datasets.py`
- Add new callbacks in `src/plimai/callbacks/`
- Add new models in `src/plimai/models/`
- Add new CLI tools in `src/plimai/cli/`

## 📖 Documentation
- See code comments and docstrings for details on each module.
- For advanced usage, see the `src/plimai/cli/finetune.py` script.

## 🤝 Contributing
We welcome contributions from the community! Here's how you can get involved:

### Ways to Contribute
- 🐛 **Report bugs** by opening issues with detailed reproduction steps
- 💡 **Suggest features** through feature requests and discussions
- 📝 **Improve documentation** with examples, tutorials, and API docs
- 🔧 **Submit pull requests** for bug fixes and new features
- 🧪 **Add tests** to improve code coverage and reliability

### Development Setup
```bash
# Clone and setup development environment
git clone https://github.com/plim-ai/plim.git
cd plim
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest tests/
```

### Community Resources
- 💬 [GitHub Discussions](https://github.com/plim-ai/plim/discussions) - Ask questions and share ideas
- 🐛 [Issue Tracker](https://github.com/plim-ai/plim/issues) - Report bugs and request features
- 📖 [Contributing Guide](CONTRIBUTING.md) - Detailed contribution guidelines
- 🎯 [Roadmap](ROADMAP.md) - See what's planned for future releases

## 📄 License & Citation

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

### Citation

If you use plimai in your research, please cite:

```bibtex
@software{plimai2025,
  author = {Pritesh Raj},
  title = {plimai: Vision LLMs with Efficient LoRA Fine-Tuning},
  url = {https://github.com/plim-ai/plim},
  year = {2025},
  version = {1.0.0}
}
```

## 🌟 Acknowledgements

We thank the following projects and communities:

- [PyTorch](https://pytorch.org/) - Deep learning framework
- [HuggingFace](https://huggingface.co/) - Transformers and model hub
- [timm](https://github.com/rwightman/pytorch-image-models) - Vision model implementations
- [PEFT](https://github.com/huggingface/peft) - Parameter-efficient fine-tuning methods

## ❓ FAQ

<details>
<summary><b>Q: I get a CUDA out of memory error during training!</b></summary>

**A:** Try these solutions in order:
- Reduce batch size: `--batch_size 16` or `--batch_size 8`
- Enable gradient checkpointing: `--gradient_checkpointing`
- Use a smaller model: `--model vit_small` instead of `vit_base`
- Reduce LoRA rank: `--lora_rank 8` instead of `16`
</details>

<details>
<summary><b>Q: How do I add my own dataset?</b></summary>

**A:** Create a custom dataset class:
```python
from plimai.datasets import BaseDataset

class MyDataset(BaseDataset):
    def __init__(self, root, transform=None):
        super().__init__(root, transform)
        # Your dataset initialization
    
    def __getitem__(self, idx):
        # Return (image, label) tuple
        pass
```
</details>

<details>
<summary><b>Q: Can I use plimai with other vision architectures?</b></summary>

**A:** Currently plimai focuses on Vision Transformers, but we're working on support for:
- ConvNeXT
- Swin Transformer  
- EfficientNet
- ResNet with LoRA

Check our [roadmap](ROADMAP.md) for updates!
</details>

<details>
<summary><b>Q: How do I merge LoRA weights for inference?</b></summary>

**A:** Use the merge functionality:
```python
# Merge LoRA weights into base model
model.merge_lora_weights()

# Save merged model
model.save_pretrained("path/to/merged/model")
```
</details>

## 📄 License & Citation

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

### Citation

If you use plimai in your research, please cite:

```bibtex
@software{plimai2025,
  author = {Pritesh Raj},
  title = {plimai: Vision LLMs with Efficient LoRA Fine-Tuning},
  url = {https://github.com/plim-ai/plim},
  year = {2025},
  version = {1.0.0}
}
```

## 🌟 Acknowledgements

We thank the following projects and communities:

- [PyTorch](https://pytorch.org/) - Deep learning framework
- [HuggingFace](https://huggingface.co/) - Transformers and model hub
- [timm](https://github.com/rwightman/pytorch-image-models) - Vision model implementations
- [PEFT](https://github.com/huggingface/peft) - Parameter-efficient fine-tuning methods

<p align="center">
  <b>Made in India 🇮🇳 with ❤️ by the plimai</b><br/>
  <i>Star ⭐ this repo if you find it useful!</i>
</p>