Metadata-Version: 2.4
Name: langvision
Version: 0.1.37
Summary: Efficient LoRA Fine-Tuning for Vision LLMs with advanced CLI and model zoo
Author-email: Pritesh Raj <priteshraj10@gmail.com>
Maintainer-email: Pritesh Raj <priteshraj10@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/langtrain-ai/langvision
Project-URL: Documentation, https://github.com/langtrain-ai/langvision/tree/main/docs
Project-URL: Repository, https://github.com/langtrain-ai/langvision
Project-URL: Bug Tracker, https://github.com/langtrain-ai/langvision/issues
Project-URL: Source Code, https://github.com/langtrain-ai/langvision
Project-URL: Changelog, https://github.com/langtrain-ai/langvision/blob/main/CHANGELOG.md
Keywords: vision,transformer,lora,fine-tuning,deep-learning,computer-vision
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Image Processing
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: torch>=1.10.0
Requires-Dist: torchvision>=0.11.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: tqdm>=4.62.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: matplotlib>=3.5.0
Requires-Dist: pillow>=8.3.0
Requires-Dist: timm>=0.6.0
Requires-Dist: transformers>=4.20.0
Requires-Dist: toml>=0.10.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: opencv-python-headless>=4.5.0
Requires-Dist: wandb>=0.13.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.8.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Requires-Dist: bandit>=1.7.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=5.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: myst-parser>=0.18.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.19.0; extra == "docs"
Provides-Extra: examples
Requires-Dist: jupyter>=1.0.0; extra == "examples"
Requires-Dist: ipywidgets>=7.6.0; extra == "examples"
Requires-Dist: tensorboard>=2.9.0; extra == "examples"
Requires-Dist: wandb>=0.13.0; extra == "examples"
Provides-Extra: gpu
Requires-Dist: torch>=1.10.0; extra == "gpu"
Requires-Dist: torchvision>=0.11.0; extra == "gpu"

# Langvision: LoRA Fine-Tuning for Vision LLMs

<div align="center">

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="static/langvision-white.png">
  <img alt="Langvision Logo" src="static/langvision-black.png" width="400">
</picture>

### Fine-tune Vision LLMs (LLaVA, Qwen-VL) in minutes

[![PyPI version](https://img.shields.io/pypi/v/langvision.svg)](https://pypi.org/project/langvision/)
[![Downloads](https://pepy.tech/badge/langvision)](https://pepy.tech/project/langvision)
[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.8%2B-blue)]()

</div>

---

## What You'll Need

```bash
# Quick system check
python --version  # Langvision requires Python 3.8 or newer

# Check GPU support (Optional but recommended)
python -c "import torch; print('GPU ready!' if torch.cuda.is_available() else 'CPU mode - still works!')"
```

## Install Langvision

```bash
# Step 1: Create a clean environment (recommended)
python -m venv langtrain-env
source langtrain-env/bin/activate  # Windows: langtrain-env\Scripts\activate

# Step 2: Install Langvision
pip install langvision

# Step 3: Verify it worked
python -c "import langvision; print('✅ Langvision installed!')"
```

## Train Your First Model

```python
from langvision import LoRATrainer

# Step 1: Define your training data (Images + QA)
training_data = [
    {
        "image": "./images/cat.jpg",
        "question": "What is in this image?",
        "answer": "A cute tabby cat sitting on a rug."
    },
    {
        "image": "./images/dog.jpg",
        "question": "Describe the animal.",
        "answer": "A golden retriever playing with a ball."
    }
]

# Step 2: Create the trainer
# Configures Vision Encoder + LLM Adapter automatically
trainer = LoRATrainer(
    model_name="llava-v1.6-7b",  # Works with LLaVA, Qwen-VL, BLIP-2 etc.
    output_dir="./my_vision_model",
)

# Step 3: Train!
trainer.train(training_data)

# Step 4: Test your model
model = trainer.load_model()
response = model.chat("./images/cat.jpg", "What do you see?")
print(f"AI: {response}")
```
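
Training on malformed records tends to fail late, so it can help to check the data before calling `trainer.train`. The sketch below assumes the record format shown above (`image`, `question`, `answer` as non-empty strings). `validate_records` is a hypothetical helper written for this example, not part of the Langvision API.

```python
from pathlib import Path

# Keys used by the training records in the quick-start example above.
REQUIRED_KEYS = ("image", "question", "answer")


def validate_records(records, check_files=True):
    """Return a list of problems found in the records; an empty list means OK.

    Hypothetical helper for illustration only (not a Langvision function).
    """
    problems = []
    for i, rec in enumerate(records):
        # Every required key must be present and hold a non-empty string.
        for key in REQUIRED_KEYS:
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty '{key}'")
        # Optionally confirm that the image path points at a real file.
        image = rec.get("image")
        if (
            check_files
            and isinstance(image, str)
            and image.strip()
            and not Path(image).is_file()
        ):
            problems.append(f"record {i}: image not found: {image}")
    return problems


sample = [
    {"image": "./images/cat.jpg", "question": "What is in this image?",
     "answer": "A cute tabby cat sitting on a rug."},
]
# check_files=False skips the disk lookup so the example runs anywhere.
for problem in validate_records(sample, check_files=False):
    print(problem)
```

Run it with `check_files=True` (the default) before training to also catch missing image files.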

## Use Your Trained Model

```python
from langvision import ChatModel

# Load your trained model
model = ChatModel.load("./my_vision_model")

# Analyze images
print(model.chat("image1.jpg", "Describe this scene."))
```

## Using Your Own Data

```python
from langvision import LoRATrainer

trainer = LoRATrainer(
    model_name="llava-v1.6-7b",
    output_dir="./custom_vlm",
)

# Load a VQA dataset from the Hugging Face Hub
trainer.train_from_hub("your_username/your_vqa_dataset")
```
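
If your data lives in local files instead, one option is to convert it into the list-of-dicts format from "Train Your First Model" and pass that to `trainer.train`. The sketch below assumes a JSON Lines file with `image`, `question`, and `answer` fields on each line. The file layout and the `load_vqa_jsonl` helper are assumptions made for this example, not something Langvision provides.

```python
import json
from pathlib import Path


def load_vqa_jsonl(path, image_root="."):
    """Read one JSON object per line and return quick-start style records.

    Hypothetical helper: the JSONL layout is an assumption for illustration.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            records.append({
                # Resolve image paths relative to a common root directory.
                "image": str(Path(image_root) / row["image"]),
                "question": row["question"],
                "answer": row["answer"],
            })
    return records
```

The returned list has the same shape as `training_data` in the quick start, so under that assumption it could be passed as `trainer.train(load_vqa_jsonl("train.jsonl", image_root="./images"))`.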

## Next Steps

1.  **Train with QLoRA**: Use `QLoRATrainer` to fine-tune LLaVA-7B on consumer GPUs (under 12GB VRAM).
2.  **Explore Model Zoo**: Run `langvision model-zoo list` to see supported models (LLaVA, Qwen, CogVLM, etc.).
3.  **Read the Docs**: Check out [langtrain.xyz/docs](https://langtrain.xyz/docs).
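
To see why a quantized approach such as QLoRA can fit in a small VRAM budget, a rough estimate of weight memory helps. The calculation below assumes a model with about 7 billion parameters and counts only the base weights. Activations, LoRA parameters, optimizer state, and quantization overhead are excluded, so real usage will be higher. How `QLoRATrainer` stores weights internally is not described here; the 4-bit figure follows the general QLoRA technique.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


params = 7e9  # assumed parameter count for a "7B" model

fp16 = weight_memory_gib(params, 16)  # half-precision weights
int4 = weight_memory_gib(params, 4)   # 4-bit quantized weights

print(f"fp16 weights:  {fp16:.1f} GiB")  # fp16 weights:  13.0 GiB
print(f"4-bit weights: {int4:.1f} GiB")  # 4-bit weights: 3.3 GiB
```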

---

## Architecture Overview

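Langvision fine-tunes vision-language models with LoRA. As the diagram shows, the input image passes through a frozen vision encoder (for example, a ViT) and a projector, and the LoRA-adapted LLM generates the text output.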

```mermaid
flowchart TD
    A(["Input Image"]) --> B(["Vision Encoder (Frozen)"])
    B --> C(["Projector"])
    C --> D(["LLM (LoRA Adapted)"])
    D --> E(["Text Output"])
```
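
The "LoRA Adapted" step can be illustrated with a minimal, dependency-free sketch of the general LoRA technique: the pretrained weight `W` stays frozen and a low-rank product `B @ A` is added to it, scaled by `alpha / r`. This is not Langvision's implementation; the helper names, the 4096 x 4096 layer size, and the rank of 8 are assumptions chosen for illustration.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """Compute y = W x + (alpha / r) * B (A x); only A and B would be trained."""
    r = len(A)  # LoRA rank = number of rows in A
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


# Tiny layer: B starts at zero, so the adapter is initially a no-op.
d_out, d_in, r = 4, 6, 2
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]
x = [1.0] * d_in
print(lora_forward(W, A, B, x, alpha=16) == matvec(W, x))  # True

# Illustrative sizes: one 4096 x 4096 projection with rank 8.
full = 4096 * 4096
lora = lora_param_count(4096, 4096, 8)
print(full, lora)  # 16777216 65536
```

With these example sizes the adapter trains about 0.4% of the parameters in the layer, which is why the rest of the model can stay frozen.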

## Contributing

Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md).

## License

MIT License. See [LICENSE](LICENSE).

## Citation

```bibtex
@software{langvision2025,
  author = {Pritesh Raj},
  title = {Langvision: Efficient LoRA Fine-Tuning for Vision LLMs},
  url = {https://github.com/langtrain-ai/langvision},
  year = {2025}
}
```
