Metadata-Version: 2.4
Name: video-analyzer-tune
Version: 0.1.0
Summary: DSPy-based prompt optimizer for video-analyzer
Author: Jesse White
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Video
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: video-analyzer>=0.1.2
Requires-Dist: dspy-ai>=2.6.0
Requires-Dist: Pillow>=10.0.0
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# video-analyzer-tune

DSPy-based prompt optimizer for [video-analyzer](https://github.com/byjlw/video-analyzer).

Automatically improves the two prompts that `video-analyzer` uses — the per-frame analysis prompt and the final video reconstruction prompt — based on examples of what good output looks like for your specific content and use case.

## Overview

`video-analyzer` works in two stages: it analyzes each video frame individually (building up a running log of observations), then synthesizes all the frame notes into a final video description. Both stages are driven by prompt files that you can customize.

`video-analyzer-tune` uses [DSPy MIPROv2](https://dspy.ai) to optimize both prompts end-to-end. You provide a few examples of what ideal output looks like — both at the frame level and the final description level — and the tuner finds better prompt instructions automatically.

The main `video-analyzer` package is never modified. Tuned prompts are written as new files that you point to via your config.

## Requirements

- Python 3.8+
- `video-analyzer >= 0.1.2`
- An Ollama instance with a vision model, or an OpenAI-compatible API

## Installation

```bash
pip install video-analyzer-tune
```

## Quick Start

### Step 1 — Generate output with frames kept

Run `video-analyzer` on a representative video and keep the extracted frames:

```bash
video-analyzer my_video.mp4 --keep-frames
```

This produces an `output/` directory containing:
- `analysis.json` — frame-by-frame notes and the final description
- `frames/` — the extracted frame images

### Step 2 — Edit analysis.json with your ideal output

Open `output/analysis.json` and edit two things:

**Required:** Edit `video_description.response` to show what the ideal final description looks like for your use case.

**Recommended:** Edit each `frame_analyses[i].response` to show what ideal frame notes look like. This gives the optimizer a signal at both stages of the pipeline and produces better results.

```json
{
  "frame_analyses": [
    {
      "frame": 0,
      "timestamp": 0.0,
      "response": "Your ideal frame note here — what details matter for your use case"
    }
  ],
  "video_description": {
    "response": "Your ideal final description here — the style, length, and focus you want"
  }
}
```

The more videos you edit and include as training examples, the better the results.

### Step 3 — Create training_data.json

```json
{
  "examples": [
    { "output_dir": "output" }
  ]
}
```

Add one entry per video you edited:

```json
{
  "examples": [
    { "output_dir": "output/video1" },
    { "output_dir": "output/video2" },
    { "output_dir": "output/video3" }
  ]
}
```

### Step 4 — Run the tuner

```bash
video-analyzer-tune --training-data training_data.json --output-dir tuned_prompts/
```

This runs MIPROv2 optimization; runtime scales with `--num-candidates` and `--num-trials`.

### Step 5 — Update your config

When tuning completes, the tool prints a config snippet to paste into your `config/config.json`:

```json
"prompt_dir": "tuned_prompts",
"prompts": [
  {"name": "Frame Analysis", "path": "frame_analysis_tuned.txt"},
  {"name": "Video Reconstruction", "path": "describe_tuned.txt"}
]
```

Run `video-analyzer` as normal — it will use your tuned prompts automatically.

## Training Data Format

### training_data.json

```json
{
  "examples": [
    { "output_dir": "path/to/output" }
  ]
}
```

Paths can be absolute or relative to the location of `training_data.json`.
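
As a sketch of that resolution rule (the `resolve_example_dir` helper below is hypothetical, not part of the package):

```python
from pathlib import Path

def resolve_example_dir(training_data_path: str, output_dir: str) -> Path:
    """Resolve an example's output_dir against training_data.json's folder."""
    base = Path(training_data_path).resolve().parent
    path = Path(output_dir)
    # Absolute paths pass through; relative ones are joined to the base.
    return path if path.is_absolute() else (base / path).resolve()
```

For example, `resolve_example_dir("data/training_data.json", "output/video1")` resolves to `data/output/video1`.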

### What to edit in analysis.json

| Field | Required | Description |
|---|---|---|
| `video_description.response` | Yes | Your ideal final video description |
| `frame_analyses[i].response` | Recommended | Your ideal frame note for each frame |
| `prompt` | No | Leave as-is |
| `transcript` | No | Leave as-is |

## CLI Reference

| Flag | Default | Description |
|---|---|---|
| `--training-data` | required | Path to training_data.json |
| `--output-dir` | `tuned_prompts` | Directory to write tuned prompt files |
| `--client` | `ollama` | LLM client: `ollama` or `openai_api` |
| `--model` | `llama3.2-vision` | Vision model to use for optimization runs |
| `--ollama-url` | `http://localhost:11434` | Ollama server URL |
| `--api-key` | — | API key (required with `--client openai_api`) |
| `--api-url` | — | API endpoint URL (required with `--client openai_api`) |
| `--num-candidates` | `10` | Number of prompt variations generated per module. Higher = more thorough but slower. Suggested range: 5–20 |
| `--num-trials` | `20` | Number of optimization trials. Higher = better results but slower. Suggested range: 10–50 |
| `--max-bootstrapped-demos` | `3` | Max few-shot examples generated by bootstrapping |
| `--max-labeled-demos` | `4` | Max few-shot examples taken from your training data |
| `--description-weight` | `0.7` | How much the final description quality influences the score (0.0–1.0). The remainder weights frame analysis quality. Use `0.5` if you care equally about both; use `1.0` to optimize only for the final description |
| `--log-level` | `INFO` | Logging level: DEBUG / INFO / WARNING / ERROR |

## LLM Configuration

### Using Ollama (default)

```bash
video-analyzer-tune \
  --training-data training_data.json \
  --output-dir tuned_prompts/ \
  --model llama3.2-vision
```

### Using an OpenAI-compatible API (e.g. OpenRouter)

```bash
video-analyzer-tune \
  --training-data training_data.json \
  --output-dir tuned_prompts/ \
  --client openai_api \
  --model meta-llama/llama-3.2-11b-vision-instruct \
  --api-url https://openrouter.ai/api/v1 \
  --api-key YOUR_API_KEY
```

## How It Works

`video-analyzer` uses two prompt files:

1. **`frame_analysis.txt`** — called once per frame with the image and all previous frame notes. Produces the per-frame observation log.
2. **`describe.txt`** — called once at the end with all frame notes and the audio transcript. Produces the final video description.

`video-analyzer-tune` wraps both prompts in a DSPy pipeline that mirrors the exact processing logic of `video-analyzer`. It then runs [MIPROv2](https://dspy.ai/learn/optimization/optimizers/) — a Bayesian optimizer that generates candidate instruction variations and scores them against your training examples.
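
Conceptually, the wrapped program looks something like the sketch below. All names are illustrative rather than the package's actual internals; it assumes `dspy.configure(lm=...)` has been called with a vision-capable model, that `examples` is a list of `dspy.Example` objects built from your edited `analysis.json` files, and that `quality_metric` is the judge metric sketched below.

```python
import dspy
from dspy.teleprompt import MIPROv2

class FrameAnalysis(dspy.Signature):
    """Analyze one video frame given the notes from previous frames."""

    image: dspy.Image = dspy.InputField()
    previous_notes: str = dspy.InputField()
    frame_note: str = dspy.OutputField()

class VideoDescription(dspy.Signature):
    """Synthesize all frame notes and the transcript into a description."""

    frame_notes: str = dspy.InputField()
    transcript: str = dspy.InputField()
    description: str = dspy.OutputField()

class AnalyzerPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.analyze_frame = dspy.Predict(FrameAnalysis)
        self.describe = dspy.Predict(VideoDescription)

    def forward(self, frames, transcript):
        notes = []
        for image in frames:  # sequential, like video-analyzer itself
            result = self.analyze_frame(
                image=image, previous_notes="\n".join(notes))
            notes.append(result.frame_note)
        final = self.describe(
            frame_notes="\n".join(notes), transcript=transcript)
        return dspy.Prediction(frame_notes=notes,
                               description=final.description)

# Explicit mode (auto=None) so num_candidates / num_trials can be set,
# mirroring the --num-candidates and --num-trials CLI flags.
optimizer = MIPROv2(metric=quality_metric, auto=None, num_candidates=10,
                    max_bootstrapped_demos=3, max_labeled_demos=4)
tuned = optimizer.compile(AnalyzerPipeline(), trainset=examples,
                          num_trials=20)
```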

Scoring uses an LLM-as-judge approach: the same model that runs the pipeline also judges how well the generated output matches your ideal examples on a 1–5 scale. Frame-note quality and final-description quality are combined using the configurable `--description-weight`.
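
A minimal sketch of such a judge metric, again with hypothetical names (`JudgeMatch`, `quality_metric`), assuming predictions and examples expose `description` and `frame_notes` fields as in the pipeline sketch above:

```python
import dspy

class JudgeMatch(dspy.Signature):
    """Rate how closely generated text matches the ideal example, 1 to 5."""

    generated: str = dspy.InputField()
    ideal: str = dspy.InputField()
    rating: int = dspy.OutputField()

judge = dspy.Predict(JudgeMatch)

def quality_metric(example, prediction, trace=None, weight=0.7):
    # Judge each stage on a 1-5 scale and normalize to 0-1.
    desc = judge(generated=prediction.description,
                 ideal=example.description).rating / 5
    frames = judge(generated="\n".join(prediction.frame_notes),
                   ideal="\n".join(example.frame_notes)).rating / 5
    # weight mirrors --description-weight; the remainder scores the frames.
    return weight * desc + (1 - weight) * frames
```

With the default weight of 0.7, a description judged 4/5 and frame notes judged 3/5 combine to 0.7 * 0.8 + 0.3 * 0.6 = 0.74.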

After optimization, the improved instruction text is written into new `.txt` files that preserve all the `{TOKEN}` placeholders (`{PREVIOUS_FRAMES}`, `{FRAME_NOTES}`, etc.) that `video-analyzer` uses for its string replacement — making the output files drop-in compatible.
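
As an illustration, that placeholder guarantee can be checked with plain string matching (`write_tuned_prompt` below is a hypothetical helper, not the package's API):

```python
import re
from pathlib import Path

def write_tuned_prompt(original: str, tuned: str, dest: Path) -> None:
    """Write tuned instructions, refusing to drop any {TOKEN} placeholder."""
    token_re = re.compile(r"\{[A-Z_]+\}")
    missing = set(token_re.findall(original)) - set(token_re.findall(tuned))
    if missing:
        # video-analyzer fills these via string replacement at runtime,
        # so losing one would break the prompt file.
        raise ValueError(f"tuned prompt lost placeholders: {sorted(missing)}")
    dest.write_text(tuned)
```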

## Tips for Better Results

- **Use multiple videos.** Even 3–5 diverse examples significantly improve optimization quality.
- **Edit frame notes too.** If you only edit the final description, the optimizer has less signal about what good intermediate analysis looks like.
- **Be specific in your edits.** The more clearly your ideal examples demonstrate the style and focus you want, the better the optimizer can learn from them.
- **Use the same model for tuning as for inference.** The optimized prompts are tuned to the specific model's behavior.
- **Increase `--num-candidates` and `--num-trials`** for better results if you have the time. Start with defaults and increase from there.
- **Use `--description-weight 0.5`** if you read the frame notes directly and care as much about their quality as the final description.

## License

Apache License 2.0 — same as [video-analyzer](https://github.com/byjlw/video-analyzer).
