Metadata-Version: 2.4
Name: merleau
Version: 0.5.0
Summary: Video analysis using Google's Gemini 2.5 Flash API
Requires-Python: >=3.10
Requires-Dist: google-genai
Requires-Dist: python-dotenv
Provides-Extra: web
Requires-Dist: streamlit>=1.30.0; extra == 'web'
Description-Content-Type: text/markdown

# Merleau

> *"The world is not what I think, but what I live through."*
> — Maurice Merleau-Ponty

A CLI tool for video understanding using Google's Gemini API. Named after [Maurice Merleau-Ponty](https://en.wikipedia.org/wiki/Maurice_Merleau-Ponty), the phenomenologist philosopher whose work on perception inspires how this tool helps you perceive your videos.

[![PyPI version](https://img.shields.io/pypi/v/merleau)](https://pypi.org/project/merleau/)
[![Python](https://img.shields.io/pypi/pyversions/merleau)](https://pypi.org/project/merleau/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Streamlit App](https://img.shields.io/badge/Streamlit-App-ff4b4b?logo=streamlit&logoColor=white)](https://merleau.streamlit.app/)
[![Website](https://img.shields.io/badge/Website-GitHub%20Pages-0891b2)](https://yanndebray.github.io/merleau/)

https://github.com/user-attachments/assets/e2c5b476-ddab-49ab-a35c-9ae5e880c25c

## Why Merleau?

At the time of writing, Google Gemini is the **only major AI provider** with native video understanding: Claude doesn't support video, and GPT-4o requires frame-extraction workarounds. Merleau builds on this as a CLI that analyzes the video itself rather than a set of extracted frames.

## Features

- **Native Gemini video processing** - Upload and analyze videos directly
- **YouTube URL support** - Analyze videos directly from YouTube (free preview)
- **Customizable prompts** - Ask any question about your video
- **Cost estimation** - Token usage tracking and cost breakdown
- **Multiple models** - Support for different Gemini models
- **Web UI** - Streamlit app for browser-based analysis

## Use cases

### Clone apps

Take a screencast of your app and ask:
- "What are the main features of this app?"
- "What are the main UI elements?"
- "What are the main user flows?"

### Extract code from a coding screencast

```bash
ponty https://www.youtube.com/watch?v=Be0ceKN81S8 -p "Extract the text in the first claude code session" -e md
```

![playwright-cli-claude-code](img/playwright-cli-claude-code.png)

## Installation

From a clone of the repository, using [uv](https://docs.astral.sh/uv/) (recommended):
```bash
uv sync
```

Or install from PyPI:
```bash
pip install merleau
```

## Configuration

1. Get a Gemini API key from [Google AI Studio](https://aistudio.google.com/apikey)
2. Set the API key as an environment variable or create a `.env` file:
   ```
   GEMINI_API_KEY=your_api_key_here
   ```
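
The lookup order described above (environment variable first, then a `.env` file) can be sketched in a few lines of Python. This is a minimal illustration rather than Merleau's actual loader, which presumably relies on its `python-dotenv` dependency; the helper names `read_env_file` and `get_api_key` are hypothetical.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_file=".env"):
    """Return GEMINI_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or read_env_file(env_file).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set in the environment or in .env")
    return key
```

Checking the environment first means a key exported in the shell overrides whatever is stored in `.env`, which is convenient for switching keys temporarily.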

## Usage

```bash
# Basic video analysis
ponty video.mp4

# Analyze a YouTube video directly
ponty https://youtu.be/VIDEO_ID
ponty https://www.youtube.com/watch?v=VIDEO_ID

# Custom prompt
ponty video.mp4 -p "Summarize the key points in this video"

# Use a different model
ponty video.mp4 -m gemini-2.0-flash

# Export analysis to markdown
ponty video.mp4 -e md

# Hide cost information
ponty video.mp4 --no-cost
```

### Web UI

**Try it online:** https://merleau.streamlit.app/

The web app supports both file uploads and YouTube URLs (paste a URL in the YouTube tab to preview and analyze directly).

Or run locally:
```bash
pip install "merleau[web]"  # quotes keep shells such as zsh from expanding the brackets
streamlit run streamlit_app.py
```

### Options

| Option | Description |
|--------|-------------|
| `-p, --prompt` | Prompt for the analysis (default: "Explain what happens in this video") |
| `-m, --model` | Gemini model to use (default: gemini-2.5-flash) |
| `-e, --export` | Export analysis to file (supported formats: md) |
| `--no-cost` | Hide usage and cost information |
| `-V, --version` | Show version and exit |
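
To run the same analysis over several videos, the options above can be assembled programmatically. The sketch below only uses the flags listed in the table and assumes `ponty` is on your `PATH`; the helper names `build_ponty_command` and `analyze_all` are hypothetical and not part of Merleau.

```python
import subprocess


def build_ponty_command(source, prompt=None, model=None, export=None, show_cost=True):
    """Build a ponty argument list from the documented CLI options."""
    cmd = ["ponty", source]
    if prompt is not None:
        cmd += ["-p", prompt]
    if model is not None:
        cmd += ["-m", model]
    if export is not None:
        cmd += ["-e", export]
    if not show_cost:
        cmd.append("--no-cost")
    return cmd


def analyze_all(sources, **options):
    """Run ponty once per source; stops at the first failing run."""
    for source in sources:
        subprocess.run(build_ponty_command(source, **options), check=True)
```

For example, `analyze_all(["demo1.mp4", "demo2.mp4"], prompt="Summarize the key points in this video", export="md")` would invoke the CLI twice with identical options.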

## Reducing Costs with Compression

Compressing videos before analysis can reduce API costs by ~10-15% without degrading analysis quality. Gemini's token count is affected by video resolution and bitrate.

### Quick Compression with ffmpeg

```bash
# Basic compression (recommended)
ffmpeg -i input.mp4 -vcodec libx264 -crf 28 -preset medium -vf "scale=1280:-2" output.mp4

# Aggressive compression (smaller file, lower quality)
ffmpeg -i input.mp4 -vcodec libx264 -crf 32 -preset medium -vf "scale=640:-2" output.mp4

# Re-encode audio explicitly as 128 kbps AAC (for speech analysis)
ffmpeg -i input.mp4 -vcodec libx264 -crf 28 -preset medium -vf "scale=1280:-2" -acodec aac -b:a 128k output.mp4
```

### Compression Options Explained

| Option | Description |
|--------|-------------|
| `-crf 28` | Quality level (18-28 recommended, higher = smaller file) |
| `-preset medium` | Encoding speed/quality tradeoff |
| `-vf "scale=1280:-2"` | Resize to 1280px width, maintain aspect ratio (height rounded to an even number) |
| `-an` | Remove audio (if not needed) |
| `-acodec aac -b:a 128k` | Compress audio to 128kbps AAC |
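
If you compress videos from a script, the options in the table above can be combined into a small command builder. This is a sketch under the assumption that `ffmpeg` is installed and on your `PATH`; the function names and the `audio` parameter are hypothetical conveniences, not part of Merleau.

```python
import subprocess


def build_ffmpeg_command(src, dst, width=1280, crf=28, audio="drop"):
    """Build an ffmpeg argument list using the options from the table above.

    audio: "drop" removes the audio track (-an); "aac" re-encodes it at 128 kbps.
    """
    cmd = [
        "ffmpeg", "-i", src,
        "-vcodec", "libx264",
        "-crf", str(crf),
        "-preset", "medium",
        "-vf", f"scale={width}:-2",
    ]
    if audio == "drop":
        cmd.append("-an")
    elif audio == "aac":
        cmd += ["-acodec", "aac", "-b:a", "128k"]
    else:
        raise ValueError(f"unknown audio mode: {audio!r}")
    cmd.append(dst)
    return cmd


def compress(src, dst, **options):
    """Run ffmpeg; raises CalledProcessError if the encode fails."""
    subprocess.run(build_ffmpeg_command(src, dst, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.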

### Cost Comparison Example

| Version | File Size | Prompt Tokens | Input Cost |
|---------|-----------|---------------|------------|
| Original (1080p) | 52 MB | 14,757 | $0.00221 |
| Compressed (720p) | 2.6 MB | 13,157 | $0.00197 |
| **Savings** | **95%** | **10.8%** | **10.8%** |
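
The figures in this table can be reproduced with a few lines of arithmetic. The sketch assumes an input rate of $0.15 per 1M tokens, which is the rate the table's cost column implies; the function names are illustrative.

```python
INPUT_RATE_PER_MILLION = 0.15  # USD per 1M input tokens (rate implied by the table)


def input_cost(tokens, rate_per_million=INPUT_RATE_PER_MILLION):
    """Cost in USD of sending `tokens` prompt tokens."""
    return tokens * rate_per_million / 1_000_000


def percent_saved(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


original_tokens, compressed_tokens = 14_757, 13_157

print(f"original:      ${input_cost(original_tokens):.5f}")    # $0.00221
print(f"compressed:    ${input_cost(compressed_tokens):.5f}")  # $0.00197
print(f"token savings: {percent_saved(original_tokens, compressed_tokens):.1f}%")  # 10.8%
print(f"size savings:  {percent_saved(52, 2.6):.0f}%")          # 95%
```

Because cost scales linearly with prompt tokens, the cost savings equal the token savings, while the much larger file-size reduction does not translate directly into a lower bill.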

## Output

The CLI provides:
- Video content analysis from Gemini
- Token usage breakdown (prompt, response, total)
- Estimated cost based on Gemini pricing

## Pricing Reference

Gemini 2.5 Flash (as of 2025):
- Input: $0.15 per 1M tokens (text/image), $0.075 per 1M tokens (video)
- Output: $0.60 per 1M tokens, $3.50 per 1M thinking tokens

A 1-hour video costs approximately **$0.11-0.32** to analyze.
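
As a rough illustration of how these rates combine, the sketch below estimates the cost of a single request from its token counts. It is not Merleau's own cost calculation: the function name is hypothetical, and it assumes thinking tokens are billed separately from regular output tokens.

```python
# USD per 1M tokens, copied from the pricing list above (Gemini 2.5 Flash, as of 2025).
RATES = {
    "input_text_image": 0.15,
    "input_video": 0.075,
    "output": 0.60,
    "thinking": 3.50,
}


def estimate_cost(video_tokens=0, text_tokens=0, output_tokens=0, thinking_tokens=0):
    """Return the estimated USD cost for one request.

    Assumption: thinking tokens are billed separately from regular output tokens.
    """
    per_token = {name: rate / 1_000_000 for name, rate in RATES.items()}
    return (
        video_tokens * per_token["input_video"]
        + text_tokens * per_token["input_text_image"]
        + output_tokens * per_token["output"]
        + thinking_tokens * per_token["thinking"]
    )
```

Calling `estimate_cost(video_tokens=..., output_tokens=...)` with the token counts that the CLI reports gives a quick sanity check; update `RATES` whenever Google changes its pricing.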
