Metadata-Version: 2.4
Name: clip2context
Version: 0.1.3
Summary: Extract frames and transcripts from video files for LLM context and multimodal pipelines.
Project-URL: Homepage, https://github.com/adarsh-retainia/clip2context
Project-URL: Repository, https://github.com/adarsh-retainia/clip2context
Author-email: Adarsh Senghani <adarsh@retainia.com>
License: MIT
License-File: LICENSE
Keywords: ffmpeg,frames,llm,transcript,video,whisper
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.12
Requires-Dist: openai-whisper>=20250625
Description-Content-Type: text/markdown

# clip2context

Extract frames and transcripts from video files — structured output ready for LLM context, multimodal pipelines, or archival.

Given one or more video files, `clip2context` produces:

- **Frames** — high-quality WebP images at a configurable frame rate, plus a JSON manifest mapping each frame to its timestamp.
- **Transcript** — plain text, timestamped JSON segments, and a human-readable timed text file, generated by OpenAI Whisper.

## Requirements

- Python 3.12+
- [FFmpeg](https://ffmpeg.org/download.html) (must be on `PATH`)

Install FFmpeg via your package manager:

```bash
# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Windows (winget)
winget install ffmpeg
```

## Installation

```bash
pip install clip2context
```

Or from source with [uv](https://github.com/astral-sh/uv):

```bash
git clone https://github.com/adarsh-retainia/clip2context
cd clip2context
uv sync
```

## Usage

```bash
clip2context <video_path> [<video_path> ...] [options]
```

**Arguments**

| Argument | Description |
|---|---|
| `video_paths` | One or more video files or directories containing videos. |
| `--output-dir DIR` | Base directory for all output (default: `output/`). |
| `--fps FLOAT` | Frames per second to extract (default: `1.0`). Use `0.5` for one frame every two seconds. |
| `--quality 1-100` | WebP compression quality (default: `95`). Lower = smaller files. |
| `--only-frames` | Extract frames only; skip transcription. |
| `--only-transcripts` | Extract transcripts only; skip frame extraction. |

**Examples**

```bash
# Process a single video with defaults (1 fps, quality 95)
clip2context interview.mp4

# Process all videos in a folder, 1 frame every 2 seconds
clip2context ./recordings/ --fps 0.5

# Transcripts only, custom output directory
clip2context lecture.mp4 --only-transcripts --output-dir ./results

# Frames only, lower quality for smaller file sizes
clip2context demo.mov --only-frames --fps 2 --quality 75

# Process multiple videos at once
clip2context video1.mp4 video2.mp4 video3.mp4
```

### Python API

You can also use `clip2context` programmatically:

```python
from clip2context.main import run

# Full extraction (frames + transcript)
run("interview.mp4")

# Custom options
run(
    "lecture.mp4",
    output_base="results/",
    fps=0.5,
    quality=80,
    do_frames=True,
    do_transcript=True,
)
```

Or use the individual extractors directly:

```python
from clip2context.extract_frames import extract_frames
from clip2context.extract_transcript import extract_transcript

# Extract frames → returns (output_dir, frame_count)
output_dir, count = extract_frames("video.mp4", "output/frames", fps=1.0, quality=95)

# Transcribe audio → returns output_dir
output_dir = extract_transcript("video.mp4", "output/transcript")
```

## Output layout

Each video produces output under `<output_dir>/<video_stem>/`:

```
output/
└── interview/
    ├── frames/
    │   ├── frame_0001.webp
    │   ├── frame_0002.webp
    │   ├── …
    │   └── frames_manifest.json
    └── transcript/
        ├── transcript_raw.txt
        ├── transcript_timestamped.json
        └── transcript_timed.txt
```
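Downstream code can derive every output path from the video filename alone. A small stdlib-only helper mirroring the layout above (the `output_paths` function is illustrative, not shipped with the package):

```python
from pathlib import Path

def output_paths(output_dir: str, video_path: str) -> dict:
    """Map the documented output layout for one video to concrete paths."""
    base = Path(output_dir) / Path(video_path).stem
    return {
        "frames_dir": base / "frames",
        "manifest": base / "frames" / "frames_manifest.json",
        "raw": base / "transcript" / "transcript_raw.txt",
        "segments": base / "transcript" / "transcript_timestamped.json",
        "timed": base / "transcript" / "transcript_timed.txt",
    }

paths = output_paths("output", "interview.mp4")
print(paths["manifest"])  # output/interview/frames/frames_manifest.json
```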

### `frames_manifest.json`

Maps each frame file to its timestamp:

```json
[
  {
    "frame_filename": "frame_0001.webp",
    "timestamp_seconds": 0.0,
    "timestamp_formatted": "00:00:00"
  },
  ...
]
```
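One common use of the manifest is finding the frame closest to a given point in the video. A minimal sketch using only the standard library (the `nearest_frame` helper is hypothetical, not part of the package; the inline list stands in for the loaded JSON):

```python
# In practice: manifest = json.load(open("output/interview/frames/frames_manifest.json"))
manifest = [
    {"frame_filename": "frame_0001.webp", "timestamp_seconds": 0.0, "timestamp_formatted": "00:00:00"},
    {"frame_filename": "frame_0002.webp", "timestamp_seconds": 1.0, "timestamp_formatted": "00:00:01"},
    {"frame_filename": "frame_0003.webp", "timestamp_seconds": 2.0, "timestamp_formatted": "00:00:02"},
]

def nearest_frame(manifest: list[dict], t: float) -> dict:
    """Return the manifest entry whose timestamp is closest to t seconds."""
    return min(manifest, key=lambda e: abs(e["timestamp_seconds"] - t))

print(nearest_frame(manifest, 1.4)["frame_filename"])  # frame_0002.webp
```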

### `transcript_timestamped.json`

Segment-level start/end timestamps from Whisper:

```json
[
  {
    "start": 0.0,
    "end": 4.28,
    "text": "Welcome to today's session."
  },
  ...
]
```

### `transcript_timed.txt`

Human-readable transcript with timestamps:

```
[00:00:00] Welcome to today's session.
[00:00:04] Let's get started.
```
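The timed format is also easy to parse back into seconds when `HH:MM:SS` strings are inconvenient. A small stdlib-only sketch (the parser assumes only the line format shown above; it is not an official API):

```python
import re

LINE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\]\s*(.*)")

def parse_timed_line(line: str) -> tuple[int, str]:
    """Parse '[HH:MM:SS] text' into (seconds, text)."""
    m = LINE.match(line)
    h, mnt, s, text = m.groups()
    return int(h) * 3600 + int(mnt) * 60 + int(s), text

print(parse_timed_line("[00:00:04] Let's get started."))  # (4, "Let's get started.")
```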

## Supported formats

`.mp4` `.mov` `.avi` `.mkv` `.webm` `.m4v` `.flv` `.wmv`

## License

MIT
