Metadata-Version: 2.4
Name: vision-agents-plugins-huggingface
Version: 0.3.7
Summary: HuggingFace Inference integration for Vision Agents
Project-URL: Documentation, https://visionagents.ai/
Project-URL: Website, https://visionagents.ai/
Project-URL: Source, https://github.com/GetStream/Vision-Agents
License-Expression: MIT
Keywords: AI,LLM,agents,huggingface,inference,voice agents
Requires-Python: >=3.10
Requires-Dist: huggingface-hub<1.0
Requires-Dist: vision-agents
Provides-Extra: transformers
Requires-Dist: accelerate>=0.25.0; extra == 'transformers'
Requires-Dist: torch>=2.0.0; extra == 'transformers'
Requires-Dist: transformers>=4.45.0; extra == 'transformers'
Provides-Extra: transformers-quantized
Requires-Dist: accelerate>=0.25.0; extra == 'transformers-quantized'
Requires-Dist: bitsandbytes>=0.41.0; extra == 'transformers-quantized'
Requires-Dist: torch>=2.0.0; extra == 'transformers-quantized'
Requires-Dist: transformers>=4.45.0; extra == 'transformers-quantized'
Description-Content-Type: text/markdown

# HuggingFace Plugin for Vision Agents

HuggingFace Inference integration for Vision Agents. Supports both text-only LLM and vision language models (VLM) through HuggingFace's Inference Providers API.

## Installation

```bash
uv add vision-agents[huggingface]
```

## Configuration

Set your HuggingFace API token:

```bash
export HF_TOKEN=your_huggingface_token
```

## Usage

### Text-only LLM

```python
from vision_agents.plugins import huggingface

llm = huggingface.LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    provider="together",  # optional: use "auto" or omit to let HuggingFace auto-select based on your settings
)

response = await llm.simple_response("Hello, how are you?")
print(response.text)
```

### Vision Language Model (VLM)

```python
from vision_agents.plugins import huggingface

vlm = huggingface.VLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    fps=1,
    frame_buffer_seconds=10,
)

# VLM automatically buffers video frames when used with an Agent
response = await vlm.simple_response("What do you see?")
print(response.text)
```

### With Function Calling

```python
from vision_agents.plugins import huggingface

llm = huggingface.LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

@llm.register_function()
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."

response = await llm.simple_response("What's the weather in Paris?")
```

## Supported Providers

HuggingFace's Inference Providers API supports multiple backends:

- Together AI
- Groq
- Cerebras
- Replicate
- Fireworks
- And more

Specify a provider explicitly or let HuggingFace auto-select:

```python
llm = huggingface.LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    provider="groq",
)
```

## API Reference

### `huggingface.LLM`

Text-only language model integration.

**Parameters:**
- `model` (str): HuggingFace model ID
- `api_key` (str, optional): HuggingFace API token (defaults to `HF_TOKEN` env var)
- `provider` (str, optional): Inference provider name

### `huggingface.VLM`

Vision language model integration with video frame buffering.

**Parameters:**
- `model` (str): HuggingFace model ID
- `api_key` (str, optional): HuggingFace API token (defaults to `HF_TOKEN` env var)
- `provider` (str, optional): Inference provider name
- `fps` (int): Frames per second to buffer (default: 1)
- `frame_buffer_seconds` (int): Seconds of video to buffer (default: 10)
