Metadata-Version: 2.4
Name: otat
Version: 0.1.3
Summary: Vision-Language Model Interpretability Analysis - One Token at a Time
Author-email: Your Name <your.email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/varungupta31/otat_api
Project-URL: Repository, https://github.com/varungupta31/otat_api
Project-URL: Documentation, https://github.com/varungupta31/otat_api#readme
Keywords: interpretability,vision-language-models,attention,llava,qwen
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0.0
Requires-Dist: torchvision
Requires-Dist: transformers==4.57.0
Requires-Dist: accelerate
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: pillow
Requires-Dist: pyyaml
Requires-Dist: tqdm
Requires-Dist: qwen_vl_utils
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Provides-Extra: api
Requires-Dist: fastapi>=0.104.0; extra == "api"
Requires-Dist: uvicorn>=0.24.0; extra == "api"
Dynamic: requires-python

# OTaT: One Token at a Time
A vision-language model (VLM) interpretability toolkit for analyzing attention patterns in models such as LLaVA and Qwen-VL.

## Installation

### From PyPI
```bash
pip install otat
```

### From GitHub
```bash
pip install git+https://github.com/varungupta31/otat_api.git
```

### Local Development
```bash
git clone https://github.com/varungupta31/otat_api.git
cd otat_api
pip install -e .
```

## Quick Start
```python
from interpretability.api.wrapper import InterpretabilityAnalyzer

# Initialize analyzer for LLaVA OneVision 0.5B
analyzer = InterpretabilityAnalyzer(
    model_type="llava_onevision",
    model_id="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
    device='cuda:1'  # or 'auto', or any other device you want to load the model on
)

# Initialize analyzer for LLaVA OneVision 7B
analyzer = InterpretabilityAnalyzer(
    model_type="llava_onevision",
    model_id="llava-hf/llava-onevision-qwen2-7b-ov-hf",
    device='auto'
)

# Initialize analyzer for Qwen2.5-VL 3B
analyzer = InterpretabilityAnalyzer(
    model_type="qwen_25_vl",
    model_id="Qwen/Qwen2.5-VL-3B-Instruct",
    device='auto'
)

# Initialize analyzer for Qwen2.5-VL 7B
analyzer = InterpretabilityAnalyzer(
    model_type="qwen_25_vl",
    model_id="Qwen/Qwen2.5-VL-7B-Instruct",
    device='auto'
)

# Run analysis
result = analyzer.analyze(
    image_path="path/to/image.jpg",
    task_text="What is in this image?",
    instruction="Answer briefly.",
    blocking_mode="none",
    num_tokens=25
)

print(result['output_tokens'])
print(result['series'])  # Attention patterns
```

## Features

- 🔍 Attention pattern analysis for VLMs
- 🎯 Support for LLaVA, Qwen-VL, and Qwen2-LLM
- 🚫 Attention blocking experiments
- 📊 Token-level attention aggregation
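To give a feel for what token-level attention aggregation means here, the toy sketch below computes, for each generated token, the fraction of attention mass that lands on the image-token positions of the input. This is an illustrative example only, not OTaT's actual implementation; the function name, the `(start, end)` image span convention, and the assumption that each row is already averaged over layers and heads are all hypothetical simplifications.

```python
def image_attention_fraction(attn_rows, image_span):
    """For each generated token's attention row over input positions,
    return the fraction of attention mass falling on image tokens.

    attn_rows  -- list of rows; each row is a list of attention weights
                  over input positions (assumed averaged over layers/heads)
    image_span -- (start, end) index range of image tokens (end exclusive)
    """
    start, end = image_span
    fractions = []
    for row in attn_rows:
        total = sum(row)
        image_mass = sum(row[start:end])
        # Guard against an all-zero row rather than dividing by zero
        fractions.append(image_mass / total if total else 0.0)
    return fractions

# Toy example: 2 generated tokens, 4 input positions, image tokens at 1..2
rows = [
    [0.1, 0.4, 0.4, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
print(image_attention_fraction(rows, (1, 3)))  # -> [0.8, 0.5]
```

A per-token series like this is one way attention patterns such as those in `result['series']` can be summarized for plotting or comparison across blocking modes.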
