Metadata-Version: 2.4
Name: oceanir
Version: 0.1.0
Summary: Oculus Vision-Language Model - Inference SDK for multimodal AI research
Project-URL: Homepage, https://github.com/OceanirAI/oceanir
Project-URL: Documentation, https://oceanir.ai/docs
Project-URL: Repository, https://github.com/OceanirAI/oceanir
Project-URL: Issues, https://github.com/OceanirAI/oceanir/issues
Author-email: OceanirAI <contact@oceanir.ai>
License: OCEANIR RESEARCH LICENSE
        Version 1.0, January 2026
        
        Copyright (c) 2026 OceanirAI
        
        TERMS AND CONDITIONS
        
        1. DEFINITIONS
        
        "Software" refers to the Oculus model weights, code, and associated materials
        distributed under this license.
        
        "Research Use" means non-commercial academic research, educational purposes,
        and personal experimentation for learning.
        
        "Commercial Use" means any use intended for or directed toward commercial
        advantage or monetary compensation.
        
        2. GRANT OF LICENSE
        
        Subject to the terms of this License, OceanirAI grants you a non-exclusive,
        worldwide, royalty-free license to use, copy, and modify the Software for
        Research Use only.
        
        3. PERMITTED USES
        
        You MAY:
        - Use the Software for academic research
        - Use the Software for educational purposes
        - Publish research papers using results obtained from the Software
        - Modify the Software for Research Use
        - Share modifications under this same license
        - Use the Software in academic courses and tutorials
        
        4. PROHIBITED USES
        
        You MAY NOT:
        - Use the Software for any Commercial Use
        - Sell, license, or sublicense the Software
        - Use the Software to train models for commercial deployment
        - Integrate the Software into commercial products or services
        - Use the Software to provide commercial services
        - Remove or alter any license notices or attributions
        
        5. ATTRIBUTION
        
        Any publication, presentation, or distribution of work using this Software
        must include the following citation:
        
        "Oculus Vision-Language Model, OceanirAI, 2026"
        
        6. NO WARRANTY
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
        OCEANIR AI OR CONTRIBUTORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
        FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
        DEALINGS IN THE SOFTWARE.
        
        7. TERMINATION
        
        This License and the rights granted hereunder will terminate automatically
        upon any breach by you of the terms of this License.
        
        8. COMMERCIAL LICENSING
        
        For commercial licensing inquiries, please contact: licensing@oceanir.ai
        
        9. GOVERNING LAW
        
        This License shall be governed by and construed in accordance with the laws
        of the State of California, United States, without regard to its conflict
        of law provisions.
License-File: LICENSE
Keywords: deep-learning,image-captioning,machine-learning,multimodal,oculus,research,vision-language,vqa
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Python: >=3.9
Requires-Dist: huggingface-hub>=0.16.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: pillow>=9.0.0
Requires-Dist: requests>=2.28.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.30.0
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: mlx
Requires-Dist: mlx>=0.0.8; extra == 'mlx'
Description-Content-Type: text/markdown

# Oceanir

Oculus Vision-Language Model SDK for multimodal AI research.

## Installation

```bash
pip install oceanir
```
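
The package metadata requires Python 3.9 or newer and declares two optional extras, `mlx` and `dev`, which can be requested with pip's usual extras syntax (for example `oceanir[mlx]`). The sketch below is one way to sanity-check an environment before loading a model. `check_environment` is a hypothetical helper for illustration, not part of the SDK.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)  # mirrors "Requires-Python: >=3.9" in the package metadata


def check_environment(package: str = "oceanir") -> dict:
    """Report whether the interpreter is new enough and which version of `package` is installed.

    Hypothetical helper for illustration; it is not provided by the SDK.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None  # the package is not installed in this environment
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "installed_version": installed,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `installed_version` is `None`, the install step above did not target the interpreter you are running.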

## Quick Start

```python
from oceanir import Oculus

# Load the model
model = Oculus.from_pretrained("OceanirAI/Oculus-0.1-Instruct")

# Visual Question Answering
answer = model.ask("photo.jpg", "What is the person doing?")
print(answer)  # e.g. "The person is riding a bicycle."

# Image Captioning
caption = model.caption("photo.jpg")
print(caption)  # e.g. "A dog playing in the park with a frisbee."

# Object Detection
results = model.detect("photo.jpg")
for box, label, conf in zip(results['boxes'], results['labels'], results['confidences']):
    print(f"{label}: {conf:.2f}")

# Counting Objects
count = model.count("crowd.jpg", "people")
print(f"Found {count} people")
```
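
The detection loop above iterates over parallel `boxes`, `labels`, and `confidences` lists. As a minimal sketch of post-processing that output, the snippet below keeps only detections above a confidence threshold. `filter_detections` is a hypothetical helper, the sample values are made up, and the four-number box layout is an assumption since the box format is not documented here.

```python
from typing import Any, Dict, List


def filter_detections(results: Dict[str, List[Any]], min_conf: float = 0.5) -> List[Dict[str, Any]]:
    """Keep detections whose confidence is at least `min_conf`, highest first.

    Hypothetical helper; it only relies on the 'boxes', 'labels', and
    'confidences' keys used in the Quick Start loop.
    """
    rows = [
        {"box": box, "label": label, "confidence": conf}
        for box, label, conf in zip(results["boxes"], results["labels"], results["confidences"])
        if conf >= min_conf
    ]
    return sorted(rows, key=lambda row: row["confidence"], reverse=True)


# Stand-in for the output of model.detect("photo.jpg"); all values are made up,
# and the [x1, y1, x2, y2] box layout is an assumption.
sample = {
    "boxes": [[10, 20, 110, 220], [5, 5, 50, 50], [200, 40, 260, 120]],
    "labels": ["dog", "ball", "frisbee"],
    "confidences": [0.91, 0.32, 0.77],
}

for det in filter_detections(sample, min_conf=0.5):
    print(f"{det['label']}: {det['confidence']:.2f} at {det['box']}")
```

In real use, `sample` would be replaced by the dictionary returned from `model.detect(...)`.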

## Models

| Model | Description |
|-------|-------------|
| `OceanirAI/Oculus-0.1-Instruct` | Instruction-tuned for general VQA and captioning |
| `OceanirAI/Oculus-0.1-Reasoning` | Enhanced with chain-of-thought reasoning |

## Reasoning Mode

Pass `think=True` to `ask()` to enable thinking traces for complex questions:

```python
# With reasoning (reuses `model` from the Quick Start)
answer = model.ask(
    "complex_scene.jpg",
    "How many red cars are parked on the left side?",
    think=True
)
```
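
This README does not state whether `think=True` requires the Reasoning checkpoint. The sketch below assumes the two are meant to be paired and picks the checkpoint accordingly. `pick_model_id` and `ask_with_optional_reasoning` are hypothetical helpers, and passing `think=False` explicitly is also an assumption.

```python
# Checkpoint names taken from the Models table.
MODEL_IDS = {
    "instruct": "OceanirAI/Oculus-0.1-Instruct",
    "reasoning": "OceanirAI/Oculus-0.1-Reasoning",
}


def pick_model_id(think: bool) -> str:
    """Return the checkpoint to load for a given reasoning setting (assumed pairing)."""
    return MODEL_IDS["reasoning" if think else "instruct"]


def ask_with_optional_reasoning(image_path: str, question: str, think: bool = False):
    """Load the matching checkpoint and forward `think` to model.ask().

    Hypothetical wrapper; assumes ask() accepts think=False as well as think=True.
    """
    # Imported lazily so pick_model_id() can be used without the SDK installed.
    from oceanir import Oculus

    model = Oculus.from_pretrained(pick_model_id(think))
    return model.ask(image_path, question, think=think)
```

Loading a model on every call is wasteful; in practice you would load each checkpoint once and reuse it.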

## Features

- **Visual Question Answering (VQA)** - Answer questions about images
- **Image Captioning** - Generate natural language descriptions
- **Object Detection** - Detect and localize objects with bounding boxes
- **Object Counting** - Count specific objects in images
- **Semantic Segmentation** - Pixel-level scene understanding
- **Chain-of-Thought Reasoning** - Step-by-step reasoning for complex tasks

## Architecture

Oculus combines:
- **DINOv2** - Self-supervised vision transformer for semantic understanding
- **SigLIP** - Image-text aligned vision encoder for language-grounded features
- **Trained Projector** - Maps vision features to language space
- **BLIP** - Vision-language model used here for text generation
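
The list above does not say how the two vision encoders are fused or what the projector looks like. As a purely illustrative sketch, the snippet below assumes per-patch features from both encoders are concatenated and passed through a single linear layer. The fusion strategy, the projector shape, and every dimension are assumptions, and random arrays stand in for real encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are illustrative assumptions, not Oculus's real dimensions.
NUM_PATCHES = 16
DINO_DIM, SIGLIP_DIM, TEXT_DIM = 8, 6, 12

# Random stand-ins for per-patch features from the two vision encoders.
dino_feats = rng.standard_normal((NUM_PATCHES, DINO_DIM))
siglip_feats = rng.standard_normal((NUM_PATCHES, SIGLIP_DIM))

# Assumed fusion: concatenate the two feature vectors for each patch.
fused = np.concatenate([dino_feats, siglip_feats], axis=-1)

# Assumed projector: one linear layer (its weights would be learned in practice).
weight = rng.standard_normal((DINO_DIM + SIGLIP_DIM, TEXT_DIM)) * 0.02
bias = np.zeros(TEXT_DIM)
visual_tokens = fused @ weight + bias

print(fused.shape)          # (16, 14)
print(visual_tokens.shape)  # (16, 12)
```

Under these assumptions, each row of `visual_tokens` has the width of a text embedding, so a text generator could consume it alongside ordinary token embeddings.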

## License

This software is released under the **Oceanir Research License**.

**Permitted Uses:**
- Academic research
- Educational purposes
- Publishing papers with results
- Personal experimentation
- Modifying the software for research use and sharing modifications under the same license

**Prohibited Uses:**
- Commercial applications
- Training commercial models
- Integration into commercial products

For commercial licensing, contact: licensing@oceanir.ai

## Citation

If you use Oceanir in your research, please cite:

```bibtex
@software{oculus2026,
  title={Oculus Vision-Language Model},
  author={OceanirAI},
  year={2026},
  url={https://github.com/OceanirAI/oceanir}
}
```

## Links

- [Hugging Face Models](https://huggingface.co/OceanirAI)
- [Documentation](https://oceanir.ai/docs)
- [GitHub](https://github.com/OceanirAI/oceanir)
