Metadata-Version: 2.4
Name: anchor-vision
Version: 0.1.0
Summary: Python client for Anchor — PaliGemma2 multi-LoRA vision inference
License: Apache License
        Version 2.0, January 2004
        http://www.apache.org/licenses/
        
        Copyright 2026 Recursia Lab
        
        Licensed under the Apache License, Version 2.0 (the "License");
        you may not use this file except in compliance with the License.
        You may obtain a copy of the License at
        
            http://www.apache.org/licenses/LICENSE-2.0
        
        Unless required by applicable law or agreed to in writing, software
        distributed under the License is distributed on an "AS IS" BASIS,
        WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        See the License for the specific language governing permissions and
        limitations under the License.
        
Project-URL: Homepage, https://github.com/recursia-lab/anchor
Project-URL: Repository, https://github.com/recursia-lab/anchor
Project-URL: Bug Tracker, https://github.com/recursia-lab/anchor/issues
Keywords: paligemma,vision,lora,multimodal,langchain,industrial
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.1.0; extra == "langchain"
Requires-Dist: pydantic>=2.0; extra == "langchain"
Provides-Extra: all
Requires-Dist: langchain-core>=0.1.0; extra == "all"
Requires-Dist: pydantic>=2.0; extra == "all"
Dynamic: license-file

# Anchor

**PaliGemma2 multi-LoRA serving with OpenAI-compatible API.**

Load multiple LoRA adapters once. Switch between them at inference time — 216ms, no reload.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/recursia-lab/anchor/blob/main/examples/anchor_quickstart.ipynb)
[![PyPI](https://img.shields.io/pypi/v/anchor-vision)](https://pypi.org/project/anchor-vision/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue)](LICENSE)

```
                    ┌─────────────────────────────────┐
  Request           │           Anchor                │
  model="short" ───▶│                                 │
                    │  PaliGemma2 base  (VRAM)        │
                    │  ├── adapter: missing_hole  ◀─  │──▶ "YES / NO"
                    │  ├── adapter: open_circuit  ◀─  │
                    │  ├── adapter: short  ◀──────────│  pointer swap
                    │  ├── adapter: mouse_bite    ◀─  │     216ms
                    │  └── adapter: spur          ◀─  │
                    └─────────────────────────────────┘
```

```bash
# Call the open_circuit adapter
curl https://your-anchor-endpoint/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "open_circuit",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
        {"type": "text", "text": "Does this PCB have an open circuit defect? Answer YES or NO."}
      ]
    }],
    "max_tokens": 3
  }'
```

## Python Client

```bash
pip install anchor-vision
```

```python
from anchor_vision import AnchorClient

client = AnchorClient("https://your-anchor.run.app")
result = client.inspect("image.jpg", adapter="open_circuit")
print(result.answer)      # "YES"
print(result.latency_ms)  # 216
```

## Quick Demo

```bash
# 1. Clone and build
git clone https://github.com/recursia-lab/anchor
cd anchor
docker build -t anchor .

# 2. Run (mount your model and adapters)
docker run --gpus all \
  -v /path/to/paligemma2:/model \
  -v /path/to/lora:/lora \
  -p 8080:8080 anchor

# 3. Query any adapter by name
curl http://localhost:8080/v1/chat/completions \
  -d '{"model":"open_circuit","messages":[{"role":"user","content":[
    {"type":"image_url","image_url":{"url":"data:image/jpeg;base64,<b64>"}},
    {"type":"text","text":"Defect present? YES or NO."}
  ]}],"max_tokens":3}'
# → {"choices":[{"message":{"content":"YES"}}],"usage":{"latency_ms":216}}
```

## Why Anchor

Most serving frameworks load LoRA adapters **per request** — fetching from disk or
swapping from CPU at inference time. For production workloads where multiple
fine-tuned adapters are in active use, this adds hundreds of milliseconds per request.

Anchor takes a different approach: **all adapters live in GPU memory simultaneously**.
Switching is a pointer swap — 216ms, no disk I/O, no model reload.

| Framework | PaliGemma2 LoRA | Multi-adapter | Dynamic switch |
|---|---|---|---|
| **Anchor** | ✅ | ✅ all in VRAM | ✅ 216ms |
| vLLM | ✅ (since v0.7.0) | ✅ | per-request load |
| SGLang | 🚧 [PR #24034](https://github.com/sgl-project/sglang/pull/24034) | — | — |
| Unsloth | 🚧 [PR #5218](https://github.com/unslothai/unsloth/pull/5218) | — | fine-tune only |
| Ollama | ❌ | — | — |
| TGI / LoRAX | ❌ | — | — |

**When to use Anchor:** production scenarios with 2–10 adapters that all need
low-latency access. If a single adapter is enough, vLLM works fine.

## Architecture

```
/model          ← PaliGemma2 base (bfloat16, device_map=auto)
/lora/
  adapter_1/    ← PEFT LoRA adapter (loaded via load_adapter)
  adapter_2/
  adapter_3/

Request: model="adapter_1"  →  set_adapter("adapter_1")  →  generate()  →  216ms
Request: model="adapter_2"  →  set_adapter("adapter_2")  →  generate()  →  216ms
Request: model="base"       →  disable_adapters()         →  generate()
```

All adapters stay in VRAM. Switching is just a pointer swap — no disk I/O, no model reload.
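
Under the hood this is the standard transformers PEFT integration. Here is a minimal sketch of the pattern (illustrative, not Anchor's actual server code; paths match the layout above):

```python
# Load the base once, attach every adapter up front, then switch by name.
# Sketch only; the real server wraps this in the HTTP API described below.
import os

import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model = PaliGemmaForConditionalGeneration.from_pretrained(
    "/model", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("/model")

# One subfolder per adapter under /lora, all resident in VRAM.
for name in sorted(os.listdir("/lora")):
    model.load_adapter(os.path.join("/lora", name), adapter_name=name)

def answer(image, prompt, adapter=None):
    """image is a PIL.Image; adapter=None serves the base model."""
    if adapter is None:
        model.disable_adapters()      # model="base": bypass all adapters
    else:
        model.enable_adapters()       # re-enable after any base request
        model.set_adapter(adapter)    # the pointer swap: no disk I/O
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=3)
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```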

## Quick Start

### Python (pip)

```bash
pip install anchor-vision
```

```python
from anchor_vision import AnchorClient

client = AnchorClient("https://your-anchor.run.app")

# List loaded adapters
print(client.list_adapters())  # ["open_circuit", "short", "mouse_bite", ...]

# Run inference
result = client.inspect(
    "image.jpg",
    adapter="open_circuit",
    prompt="Is there an open circuit defect? Answer YES or NO.",
)
print(result)  # "YES"
```

### LangChain

```bash
pip install 'anchor-vision[langchain]'
```

```python
from anchor_vision import AnchorVisionTool

tool = AnchorVisionTool(
    endpoint="https://your-anchor.run.app",
    adapter="open_circuit",
    prompt="Is there a defect? Answer YES or NO.",
)

result = tool.invoke({"image_path": "image.jpg"})
# → "YES"

# Drop into any LangChain agent
# agent = initialize_agent(tools=[tool], ...)
```
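
Because adapters are addressed by name, a natural agent setup is one tool per defect class, all served by the same endpoint (adapter names here are illustrative):

```python
# One AnchorVisionTool per defect class; all hit the same Anchor instance.
defect_classes = ["open_circuit", "short", "mouse_bite"]

tools = [
    AnchorVisionTool(
        endpoint="https://your-anchor.run.app",
        adapter=name,
        prompt=f"Is there a {name.replace('_', ' ')} defect? Answer YES or NO.",
    )
    for name in defect_classes
]
```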

### Local (GPU required)

```bash
# 1. Clone
git clone https://github.com/recursia-lab/anchor
cd anchor

# 2. Install
pip install -r requirements.txt

# 3. Place model and adapters
#    /model   → PaliGemma2 weights (from HuggingFace or your fine-tune)
#    /lora/   → one subfolder per adapter

MODEL_PATH=/path/to/model LORA_PATH=/path/to/lora python server.py
```

### Docker

```bash
docker build -t anchor .
docker run --gpus all \
  -v /path/to/model:/model \
  -v /path/to/lora:/lora \
  -p 8080:8080 \
  anchor
```
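
Once the container is up, a quick check from Python confirms the server is healthy and shows which adapters it loaded (host and port as mapped above):

```python
# Hit the /health endpoint (see the API section below).
import requests

print(requests.get("http://localhost:8080/health", timeout=10).json())
# → {"status": "ok", "adapters": ["open_circuit", "short", "mouse_bite"]}
```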

### Google Cloud Run (GPU)

```bash
# Edit cloudbuild.yaml substitutions, then:
gcloud builds submit --config cloudbuild.yaml

gcloud beta run deploy anchor \
  --image YOUR_IMAGE \
  --region us-east4 \
  --gpu=1 --gpu-type=nvidia-l4 \
  --cpu=8 --memory=32Gi \
  --no-cpu-throttling \
  --no-gpu-zonal-redundancy \
  --min-instances=0 \
  --startup-probe="tcpSocket.port=8080,initialDelaySeconds=240,timeoutSeconds=240,periodSeconds=240,failureThreshold=1"
```

## API

### `GET /health`

```json
{"status": "ok", "adapters": ["open_circuit", "short", "mouse_bite"]}
```

### `GET /v1/models`

Lists all loaded adapters in OpenAI format.

### `POST /v1/chat/completions`

OpenAI-compatible. Use the `model` field to select the adapter.

**Request:**
```json
{
  "model": "open_circuit",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<b64>"}},
      {"type": "text", "text": "<your prompt>"}
    ]
  }],
  "max_tokens": 10
}
```

**Response:**
```json
{
  "model": "open_circuit",
  "choices": [{"message": {"role": "assistant", "content": "YES"}}],
  "usage": {"prompt_tokens": 271, "completion_tokens": 1, "latency_ms": 216}
}
```
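
For reference, the same call from plain Python with `requests`, including building the base64 data URL that the examples above elide (file name and adapter are illustrative):

```python
# Encode the image as a data URL and select an adapter via "model".
import base64

import requests

with open("board.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "open_circuit",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text",
                 "text": "Does this PCB have an open circuit defect? Answer YES or NO."},
            ],
        }],
        "max_tokens": 3,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])  # e.g. "YES"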

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `MODEL_PATH` | `/model` | Path to PaliGemma2 base model |
| `LORA_PATH` | `/lora` | Directory of LoRA adapter subfolders |
| `PORT` | `8080` | HTTP port |
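
All three are read once at startup; a sketch of the equivalent lookup (the actual `server.py` may differ):

```python
# Defaults match the table above (sketch; actual server.py may differ).
import os

MODEL_PATH = os.environ.get("MODEL_PATH", "/model")
LORA_PATH = os.environ.get("LORA_PATH", "/lora")
PORT = int(os.environ.get("PORT", "8080"))
```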

## Performance (Google Cloud Run, NVIDIA L4)

| Metric | Value |
|---|---|
| Cold start (model load) | ~3 min |
| Adapter switch latency | 216ms |
| Concurrent adapters in VRAM | 6 (tested) |
| GPU memory (6 PCB adapters) | ~12GB / 24GB L4 |

## Ecosystem

- **Python client:** `pip install anchor-vision`
- **Adapters:** [recursia-lab/paligemma2-adapters](https://github.com/recursia-lab/paligemma2-adapters) — community LoRA adapter index
- **SGLang:** [PR #24034](https://github.com/sgl-project/sglang/pull/24034) — native PaliGemma2 LoRA support (pending merge)
- **Unsloth:** [PR #5218](https://github.com/unslothai/unsloth/pull/5218) — PaliGemma2 fine-tuning support (pending merge)
- **vLLM:** supported since v0.7.0

## Roadmap

- [x] PEFT multi-LoRA server (this repo)
- [x] Google Cloud Run deployment
- [x] SGLang PR (#24034)
- [x] Unsloth PR (#5218)
- [x] Python client (`pip install anchor-vision`)
- [x] LangChain integration
- [x] Colab quickstart notebook
- [x] PyPI publish (`pip install anchor-vision`)
- [ ] Ollama support (blocked by llama.cpp SigLIP encoder)
- [ ] AWQ quantization (2-5x speedup)
- [ ] Continuous batching

## About

Built by [Recursia Lab](https://github.com/recursia-lab) for industrial visual inspection.

PaliGemma2 is a vision-language model by Google DeepMind.
