83 MCP tools that give Claude Code, Cursor, and any AI agent the ability to trim, merge, overlay text, sync audio, apply filters, stabilize video, detect scenes, generate subtitles, extract waveforms, upscale with AI, separate audio stems, and more. Plus a rich CLI with templates for humans. Local, fast, free. 858 automated tests collected.
81-second overview of mcp-video features — created with Remotion + mcp-video
AI agents can write code, analyze documents, browse the web, create images — but video editing? Existing tools are either GUI-only (agents can't click buttons), raw FFmpeg wrappers (agents can't memorize hundreds of flags), or cloud APIs (expensive, slow, vendor lock-in). mcp-video bridges this gap.
Trim, merge, add text, sync audio, resize, convert, speed change, watermarks, subtitles, storyboards, crop, rotate, fade, filters, color grading, audio processing, stabilization, scene detection, AI-powered editing, transitions, and more.
Runs on your machine. No cloud uploads, no API keys, no per-minute billing. Your video never leaves your computer.
Parses FFmpeg errors into structured responses with actionable suggestions. "Codec error: vp9" becomes "Auto-convert from vp9 to H.264/AAC."
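The idea can be sketched as a small pattern-to-suggestion lookup. This is a hypothetical illustration, not mcp-video's actual parser; the patterns and field names are assumptions:

```python
import re

# Hypothetical mapping from raw FFmpeg stderr to an actionable hint;
# mcp-video's real parser is more thorough than this sketch.
SUGGESTIONS = {
    r"matches no streams": "The input has no stream of that type; check with video_info first.",
    r"Codec error: (vp9|av1)": "Auto-convert from {0} to H.264/AAC.",
}

def parse_ffmpeg_error(stderr: str) -> dict:
    """Turn one line of FFmpeg stderr into a structured error."""
    for pattern, hint in SUGGESTIONS.items():
        match = re.search(pattern, stderr)
        if match:
            return {
                "error_type": "ffmpeg",
                "message": stderr.strip(),
                "suggestion": hint.format(*match.groups()),
            }
    return {"error_type": "ffmpeg", "message": stderr.strip(), "suggestion": None}
```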
Blur, sharpen, brightness, contrast, saturation, grayscale, sepia, invert, vignette, Ken Burns animation. Plus color presets: warm, cool, vintage, cinematic, noir.
Normalize to LUFS, add reverb, compress, pitch shift, reduce noise. Extract waveforms with peak detection and silence region analysis.
Picture-in-picture, split-screen, masks with feathering, and multi-track timeline editing for complex projects.
Auto scene detection with threshold control, quality metrics (PSNR/SSIM) comparison, and comprehensive metadata editing.
Stabilize shaky footage with motion vectors. Create videos from image sequences. Generate and burn subtitles from text.
Apply the same operation to multiple files in one call. Trim 10 videos, blur 5, or convert a whole directory at once.
Filter types, color presets, and parameters are validated before hitting FFmpeg. Get clear error messages, not cryptic KeyError tracebacks.
Machine learning features for intelligent video processing.
Auto-remove dead air from audio tracks with configurable threshold and padding.
Whisper-powered speech-to-text with automatic subtitle generation and SRT export.
ML-enhanced cut detection for automatic scene boundary identification.
Isolate audio components — vocals, drums, bass, and other instruments.
Super-resolution upscaling for enhancing video resolution with ML models.
Auto color correction with ML-powered scene analysis and LUT application.
3D audio positioning for immersive surround sound experiences.
Professional transitions for seamless clip connections.
RGB shift effects for digital distortion transitions.
Block dissolve transition with configurable pixel size.
Mesh warp transitions for smooth shape-shifting effects.
Generate sound effects from code without external audio files.
Generate sine, square, sawtooth, and noise waveforms programmatically.
18+ ready-to-use synthesized sounds: beeps, boops, sweeps, and more.
Time-based composition for building complex audio sequences.
Apply reverb, filter, normalize, and other effects to synthesized audio.
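At its core, programmatic waveform generation is just sampling a periodic function. A stdlib-only sketch of the idea (function names and defaults are illustrative, not mcp-video's synthesis API):

```python
import math

def sine_wave(freq_hz: float, duration_s: float, sample_rate: int = 44100,
              amplitude: float = 0.8) -> list[float]:
    """Generate one sine tone as float samples in [-amplitude, amplitude]."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def square_wave(freq_hz: float, duration_s: float, sample_rate: int = 44100,
                amplitude: float = 0.8) -> list[float]:
    """Square wave: the sign of the matching sine, scaled to amplitude."""
    return [amplitude if s >= 0 else -amplitude
            for s in sine_wave(freq_hz, duration_s, sample_rate, 1.0)]

beep = sine_wave(440, 0.1)  # a 100 ms A4 beep at 44.1 kHz
```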
"Can't I just have Claude Code run ffmpeg commands?" Sure — if you enjoy debugging 200-character filter graphs, wrestling with flag order semantics, and paying dollars per short clip on cloud APIs. Here's why mcp-video is different.
| | mcp-video | Claude Code + FFmpeg | Cloud APIs |
|---|---|---|---|
| Cost | Free, local | Free, but your time isn't | $0.28–$2.50 per 5s video |
| Privacy | 100% local | 100% local | Uploads to cloud |
| Error messages | Structured + auto-fix | FFmpeg stderr (cryptic) | HTTP errors |
| Parameter validation | Before execution | Only at runtime | API schema check |
| Text overlay | `add_text(text="Hi")` | `-vf "drawtext=text='Hi':fontfile=...:fontsize=24:x=(w-text_w)/2:y=h-50"` | Varies by API |
| Speed change | Handles audio-less videos | Fails if no audio stream | Usually handled |
| Discovery | 83 self-documenting tools | Must know hundreds of flags | API docs |
| Testing | 858 automated tests | None (ad-hoc) | Provider's tests |
Put -ss 10 before -i input.mp4 and FFmpeg seeks instantly. Put it after, and it decodes every frame up to 10s first. Same flags, completely different performance. Claude Code can't know which you meant.
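The two command lines differ only in where `-ss` sits relative to `-i` (file names here are illustrative):

```python
# Fast: -ss before -i uses keyframe seeking — FFmpeg jumps straight to ~10s.
fast_seek = ["ffmpeg", "-ss", "10", "-i", "input.mp4", "-t", "5", "out.mp4"]

# Slow: -ss after -i decodes every frame from 0s to 10s before cutting.
slow_seek = ["ffmpeg", "-i", "input.mp4", "-ss", "10", "-t", "5", "out.mp4"]

# Identical flags and values — only the position of -ss differs.
assert sorted(fast_seek) == sorted(slow_seek)
```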
Try adding audio to a silent video and you get: "Stream specifier ':a' matches no streams." No hint that the video has no audio. mcp-video checks before running and gives you a clear error.
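The preflight check boils down to inspecting the stream list before building the command. A hedged sketch against ffprobe-style JSON (mcp-video's internals may differ):

```python
import json

def has_audio_stream(ffprobe_json: str) -> bool:
    """Return True if the ffprobe output lists at least one audio stream."""
    info = json.loads(ffprobe_json)
    return any(s.get("codec_type") == "audio" for s in info.get("streams", []))

# Abridged shape of `ffprobe -show_streams -print_format json` output:
silent_video = '{"streams": [{"codec_type": "video", "codec_name": "h264"}]}'

if not has_audio_stream(silent_video):
    # Fail early with a clear message instead of FFmpeg's
    # "Stream specifier ':a' matches no streams".
    print("Input has no audio stream; mix in a silent track or skip the audio step.")
```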
Every FFmpeg call re-encodes. Chain 5 operations and you've re-encoded 5 times. mcp-video's timeline DSL composes operations into a single FFmpeg command where possible.
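Conceptually, composing chained operations into one encode means joining the per-operation filters into a single `-vf` filtergraph. A simplified sketch (mcp-video's DSL compiler is more involved):

```python
def compose_filtergraph(operations: list[tuple[str, dict]]) -> str:
    """Join per-operation FFmpeg filters into one -vf chain => one re-encode."""
    rendered = []
    for name, params in operations:
        args = ":".join(f"{k}={v}" for k, v in params.items())
        rendered.append(f"{name}={args}" if args else name)
    return ",".join(rendered)

# Three logical edits, one FFmpeg pass instead of three:
chain = compose_filtergraph([
    ("scale", {"w": 1080, "h": 1920}),
    ("eq", {"brightness": 0.05, "saturation": 1.2}),
    ("hflip", {}),
])
# chain == "scale=w=1080:h=1920,eq=brightness=0.05:saturation=1.2,hflip"
```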
In a 10-step workflow, Claude must re-discover file paths, resolutions, and durations at every step. mcp-video returns structured results (path, duration, resolution) so the agent always has context.
Replicate charges $0.28–$0.50 per second of generated video. A 60-second clip costs $17–$30. RunwayML's Pro plan gives you ~90 seconds of Gen-4.5 for $28/month. mcp-video: $0, forever.
Your workflow depends on their API staying up, their pricing not changing, and their model versions not breaking your prompts. With mcp-video, FFmpeg runs on your machine — no API deprecation risk.
Every tool returns structured JSON. On success, you get output paths, duration, resolution. On failure, you get error types with auto-fix suggestions.
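Payloads along these lines, where field names are illustrative rather than the exact schema:

```python
import json

# Illustrative response shapes only — consult the actual tool output for exact fields.
success = {
    "status": "success",
    "output_path": "/tmp/out.mp4",
    "duration": 30.0,
    "resolution": "1080x1920",
}

failure = {
    "status": "error",
    "error_type": "codec_unsupported",
    "message": "Codec error: vp9",
    "suggestion": "Auto-convert from vp9 to H.264/AAC.",
}

print(json.dumps(success, indent=2))
```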
| Tool | What It Does |
|---|---|
| video_info | Get metadata: duration, resolution, codec, fps, file size |
| video_trim | Trim clip by start time + duration or end time |
| video_merge | Concatenate multiple clips with optional transitions |
| video_add_text | Overlay text, titles, captions with custom positioning |
| video_add_audio | Add, replace, or mix audio tracks with fade effects |
| video_resize | Change resolution or apply preset aspect ratios |
| video_convert | Convert between mp4, webm, gif, mov (with two-pass encoding) |
| video_speed | Speed up or slow down (slow-mo, time-lapse) |
| video_thumbnail | Extract a single frame at any timestamp |
| video_preview | Generate fast low-res preview for quick review |
| video_storyboard | Extract key frames + create grid for review |
| video_subtitles | Burn SRT/VTT subtitles into the video |
| video_generate_subtitles | Create SRT subtitles from text entries, optionally burn in |
| video_watermark | Add image watermark with opacity control |
| video_export | Render final video with quality presets |
| video_edit | Full timeline-based edit (JSON DSL for complex edits) |
| video_extract_audio | Extract audio as mp3, wav, aac, ogg, or flac |
| video_crop | Crop to rectangular region with custom offset |
| video_rotate | Rotate 90/180/270 degrees or flip horizontal/vertical |
| video_fade | Add fade in/out effects to video |
| video_filter | Apply visual filter (blur, sharpen, grayscale, ken_burns, and more) |
| video_blur | Blur video with custom radius and strength |
| video_color_grade | Apply color preset (warm, cool, vintage, cinematic, noir) |
| video_normalize_audio | Normalize loudness to LUFS target (YouTube, broadcast, Spotify) |
| video_audio_waveform | Extract audio waveform data (peaks and silence regions) |
| video_audio_reverb | Add echo/reverb effect to audio |
| video_audio_compressor | Dynamic range compression for balanced audio |
| video_audio_pitch_shift | Change audio pitch while maintaining duration |
| video_audio_noise_reduction | Remove background noise from audio tracks |
| video_overlay | Picture-in-picture overlay with positioning and opacity |
| video_split_screen | Side-by-side or top/bottom layout for two videos |
| video_stabilize | Stabilize shaky footage using motion vectors (requires vidstab) |
| video_apply_mask | Apply image mask with edge feathering for compositing |
| video_detect_scenes | Automatically identify scene changes in videos |
| video_create_from_images | Create videos from image sequences |
| video_export_frames | Export video as image frames |
| video_compare_quality | Compare PSNR/SSIM quality metrics between videos |
| video_read_metadata | Read video metadata tags |
| video_write_metadata | Write video metadata tags |
| video_batch | Apply same operation to multiple files at once |
mcp-video works as an MCP server for AI agents, a Python library for scripts, and a rich CLI tool for the terminal with human-friendly output and templates.
Add to your Claude Code or Cursor MCP config. Then just talk to your agent.
```json
{
  "mcpServers": {
    "mcp-video": {
      "command": "uvx",
      "args": ["mcp-video"]
    }
  }
}
```

Then: *"Trim this video from 0:30 to 1:00, add a title, and export for TikTok."*
Clean API for automation, pipelines, and batch processing.
```python
from mcp_video import Client

editor = Client()
clip = editor.trim("v.mp4", start="0:30", duration="15")
final = editor.resize(clip.output_path, aspect_ratio="9:16")
result = editor.export(final.output_path)
print(result.resolution)  # 1080x1920
```
Human-friendly terminal output by default. Add `--format json` for scripts and piping.
```bash
# Rich table output (default)
$ mcp-video info video.mp4
# ┌─────────────┬──────────────────┐
# │ Property    │ Value            │
# ├─────────────┼──────────────────┤
# │ Duration    │ 45.23s           │
# │ Resolution  │ 1920x1080        │
# │ FPS         │ 30               │
# └─────────────┴──────────────────┘

# JSON output for scripts
$ mcp-video --format json info video.mp4

# Trim with progress spinner
$ mcp-video trim video.mp4 -s 0:30 -d 15

# Apply a TikTok template
$ mcp-video template tiktok video.mp4 --caption "Check this out!"

# List all templates
$ mcp-video templates
```
Describe multi-track edits with video clips, audio, text overlays, transitions, and export settings — all in one call.
```python
editor.edit({
    "width": 1080,
    "height": 1920,
    "tracks": [
        {
            "type": "video",
            "clips": [
                {"source": "intro.mp4", "duration": 5},
                {"source": "main.mp4", "trim_start": 10, "duration": 30},
            ],
            "transitions": [{"after_clip": 0, "type": "fade", "duration": 1.0}],
        },
        {
            "type": "audio",
            "clips": [{"source": "music.mp3", "volume": 0.7}],
        },
        {
            "type": "text",
            "elements": [{"text": "EPISODE 42", "position": "top-center"}],
        },
    ],
    "export": {"format": "mp4", "quality": "high"},
})
```
Pre-built templates handle the correct dimensions, text positioning, and export settings for each social media platform.
9:16, 1080x1920. Caption at bottom-center. Optional background music at volume 0.5.
9:16, title at top-center with size 42. Same format as TikTok but different text placement.
9:16, caption at bottom-center. Clean, simple template.
16:9, 1920x1080. Supports title card (3s), outro clip, and background music.
1:1, 1080x1080. Square format with optional caption overlay.
Build your own with the Timeline DSL. Any dimensions, any number of tracks.
Core video paths use actual FFmpeg operations, while dependency-heavy AI and Remotion paths are isolated with targeted mocks. The non-slow suite currently passes with 733 tests.
AI Features tested: Scene detection, silence removal, transcription (Whisper), stem separation (Demucs), AI upscale (OpenCV DNN), color grading, spatial audio, color extraction
Dependencies installed: FFmpeg with vidstab, demucs, torchcodec, opencv-contrib-python
```bash
# 1. Install FFmpeg (if you don't have it)
brew install ffmpeg

# For full text overlay support (drawtext filter):
# brew install freetype harfbuzz
# brew reinstall --build-from-source ffmpeg
# Verify: ffmpeg -filters | grep drawtext

# 2. Install mcp-video
pip install mcp-video

# 3. Add to your Claude Code MCP config
# (Settings > MCP Servers > Add > mcp-video)

# 4. Start editing!
# Just tell your agent what you want:
# "Trim this video to 30 seconds and add a title"
```