Metadata-Version: 2.4
Name: gemini-omni-mcp
Version: 1.0.2
Summary: Gemini Omni Flash MCP server for text-to-video, image-to-video, reference-guided video, and conversational video editing
Project-URL: Homepage, https://github.com/nikships/gemini-omni-mcp
Project-URL: Repository, https://github.com/nikships/gemini-omni-mcp
Project-URL: Issues, https://github.com/nikships/gemini-omni-mcp/issues
Project-URL: Documentation, https://github.com/nikships/gemini-omni-mcp/blob/main/README.md
Author-email: Gemini Omni MCP <noreply@example.com>
License: MIT
License-File: LICENSE
Keywords: ai,claude,fastmcp,gemini,gemini-omni-flash,google-ai,mcp,video-generation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: fastmcp<4,>=3.0
Requires-Dist: google-genai<3,>=2.10.0
Requires-Dist: pillow>=10.4.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: build>=1.2.0; extra == 'dev'
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.8.0; extra == 'dev'
Description-Content-Type: text/markdown


![Gemini Omni MCP Banner](https://raw.githubusercontent.com/nikships/gemini-omni-mcp/main/showcase/banner.png)

# Gemini Omni MCP

> MCP server for Google's **Gemini Omni Flash** video model — text-to-video, image-to-video, reference-guided video, and conversational video editing with native audio, straight from your AI agent.

[![PyPI version](https://img.shields.io/pypi/v/gemini-omni-mcp)](https://pypi.org/project/gemini-omni-mcp/)
[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue)](https://pypi.org/project/gemini-omni-mcp/)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

---

## Setup

Get a Gemini API key from [Google AI Studio](https://aistudio.google.com/apikey), then add the server to your MCP config.

### Claude Desktop / Claude Code / Cursor

Add to your MCP config (`mcp.json` / `.claude.json` / `claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "gemini-omni": {
      "command": "uvx",
      "args": ["gemini-omni-mcp@latest"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

### Droid CLI

```bash
droid mcp add gemini-omni "uvx gemini-omni-mcp@latest" --env GEMINI_API_KEY=your-api-key-here
```

Generated MP4s are saved to `~/gemini_omni_videos` by default. Set `OUTPUT_DIR` to change the location.

---

## Features

- **Text-to-video**: prompt-only MP4 generation with generated audio (music, ambience, SFX)
- **Image-to-video**: animate a single reference image with motion and camera direction
- **Reference-to-video**: up to 6 reference images to lock subjects, style, or props
- **Conversational editing**: iterate on a generated video via `previous_interaction_id`, or upload your own MP4 and edit it
- **Prompt role tags**: `<FIRST_FRAME>` and `<IMAGE_REF_N>` bind reference images to roles
- **Timing cues**: `[0-3s]`, `[3-6s]`, `[6-10s]` direct the action beat by beat
- **Batch generation**: run multiple prompts in conservative parallel batches (max 4)
- **URI or inline delivery**: robust Files API polling and download built in

Output is 720p 24fps MP4 with SynthID watermarking (preview-quality model).
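
The conversational editing flow can be sketched roughly as follows: read `interaction_id` from a `generate_video` result and pass it back as `previous_interaction_id`. This is a minimal illustration, not the server's code. The helper name `build_followup_edit` and the example payload are hypothetical, and the nesting of `video.path` as `{"video": {"path": ...}}` is an assumption.

```python
import json


def build_followup_edit(result_json: str, edit_prompt: str) -> dict:
    """Build arguments for a follow-up edit from a generate_video result.

    Hypothetical helper: it only reads `interaction_id`, which the README
    says is part of the returned JSON.
    """
    result = json.loads(result_json)
    interaction_id = result.get("interaction_id")
    if not interaction_id:
        raise ValueError("result has no interaction_id to continue from")
    return {
        "prompt": edit_prompt,
        "task": "edit",
        "previous_interaction_id": interaction_id,
    }


# Made-up result payload for illustration; a real response carries more metadata.
example_result = json.dumps(
    {"video": {"path": "/tmp/corgi.mp4"}, "interaction_id": "interaction-123"}
)

followup = build_followup_edit(
    example_result,
    "Make the hoverboard glow green. Keep everything else the same.",
)
print(followup)
```

The returned dictionary would then be sent as the arguments of the next `generate_video` call by whatever MCP client is in use.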

---

## Showcase

All videos below were generated by this server with `gemini-omni-flash-preview`. Play them with sound on.

### Corgi on a hoverboard

> A corgi wearing tiny goggles rides a glowing hoverboard through a neon-lit Tokyo street at night, rain reflections on the pavement, camera tracking alongside, single continuous shot, cinematic lighting, upbeat synthwave music.



https://github.com/user-attachments/assets/4b9a8e87-7db0-469d-a261-3c354f7fe9b8


### Astronaut latte art

> An astronaut in a white spacesuit pours latte art into a floating cup inside a cozy moon-base cafe, Earth visible through a large window, steam swirling in low gravity, slow dolly-in, warm lighting, gentle ambient cafe sounds.



https://github.com/user-attachments/assets/88072ef8-1ce4-4c80-a05a-bfb5531d1271



### Origami ocean

> An origami paper whale swims gracefully through a stylized paper-craft ocean, paper waves folding and unfolding, paper seagulls gliding above, soft sunlight, camera slowly orbiting, calm orchestral score.



https://github.com/user-attachments/assets/c1e31341-e53c-4a55-af12-d6bb9432dcf5


---

## Tools

### `generate_video`

Generates or edits one MP4 and returns JSON with `video.path`, `interaction_id`, and metadata.

| Argument | Type | Description |
|----------|------|-------------|
| `prompt` | string | Scene, motion, camera, lighting, mood, and audio direction |
| `task` | string? | `text_to_video`, `image_to_video`, `reference_to_video`, or `edit`. Inferred if omitted |
| `aspect_ratio` | string? | `16:9` (default) or `9:16` |
| `duration_seconds` | int? | Optional target duration in seconds, `3` to `10` (preview field) |
| `reference_image_paths` | list? | Up to 6 local image paths |
| `input_video_path` | string? | Local MP4 to upload and edit |
| `delivery` | string? | `uri` (default, recommended) or `inline` |
| `previous_interaction_id` | string? | Continue editing a generated video |
| `enhance_prompt` | bool? | Optional LLM prompt enhancement, default `false` |

### `batch_generate`

Runs multiple prompts in parallel batches. Batch size is capped at `MAX_BATCH_SIZE` (default `4`).

| Argument | Type | Description |
|----------|------|-------------|
| `prompts` | list | One prompt per video |
| `task`, `aspect_ratio`, `duration_seconds`, `reference_image_paths`, `delivery`, `enhance_prompt` | — | Shared across the batch, same semantics as `generate_video` |
| `batch_size` | int? | Parallelism, capped at `MAX_BATCH_SIZE` |

---

## Configuration

Everything is configured through environment variables (or a local `.env`):

| Variable | Default | Description |
|----------|---------|-------------|
| `GEMINI_API_KEY` | — | Required. `GOOGLE_API_KEY` also accepted |
| `OUTPUT_DIR` | `~/gemini_omni_videos` | Where generated MP4s are saved |
| `DEFAULT_ASPECT_RATIO` | `16:9` | `16:9` or `9:16` |
| `DEFAULT_DELIVERY` | `uri` | `uri` or `inline` |
| `DEFAULT_DURATION_SECONDS` | unset | Optional 3-10s target |
| `REQUEST_TIMEOUT` | `300` | Generation timeout in seconds |
| `FILE_POLL_INTERVAL` | `5.0` | Seconds between Files API polls |
| `FILE_POLL_TIMEOUT` | `600` | Max seconds waiting for file activation |
| `MAX_BATCH_SIZE` | `4` | Max parallel generations |
| `ENABLE_PROMPT_ENHANCEMENT` | `false` | LLM-enhance prompts before generation |
| `LOG_LEVEL` | `INFO` | Logging level |

---

## Prompting tips

- Ask for a **"single continuous shot"** and **"no scene cuts"** for one-scene outputs.
- Always include **audio direction**, for example "gentle ambient sound, no dialogue".
- For edits, keep the prompt short and add **"Keep everything else the same"**.
- Use `<FIRST_FRAME>` and `<IMAGE_REF_N>` tags to bind reference-image roles.
- Timing cues like `[0-3s]`, `[3-6s]`, and `[6-10s]` work well.
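
The tips above can be combined mechanically. The helper below is a small sketch, with `build_prompt` as a hypothetical name. The `Audio:` prefix and the sentence layout are arbitrary choices for illustration, not a format the model requires.

```python
def build_prompt(
    scene: str,
    beats: list[tuple[int, int, str]],
    audio: str,
    single_shot: bool = True,
) -> str:
    """Compose a prompt from a scene, timed beats, and audio direction."""
    parts = [scene.strip().rstrip(".") + "."]
    for start, end, action in beats:
        # Timing cues use the [start-ends] form shown in the tips, e.g. [0-3s].
        parts.append(f"[{start}-{end}s] {action.strip().rstrip('.')}.")
    if single_shot:
        parts.append("Single continuous shot, no scene cuts.")
    parts.append(f"Audio: {audio.strip().rstrip('.')}.")
    return " ".join(parts)


prompt = build_prompt(
    "A corgi rides a glowing hoverboard through a neon-lit street",
    [
        (0, 3, "the corgi accelerates"),
        (3, 6, "camera tracks alongside"),
        (6, 10, "the corgi jumps a puddle"),
    ],
    "upbeat synthwave music, no dialogue",
)
print(prompt)
```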

### Limitations

- Preview model: 720p, 24fps, MP4 only, SynthID-watermarked.
- System instructions, temperature, negative prompts, voice edits, YouTube sources, and multi-video reasoning are unsupported.
- Uploaded-video editing is unavailable in some regions.

---

## Development

```bash
git clone https://github.com/nikships/gemini-omni-mcp
cd gemini-omni-mcp
uv sync --all-extras
uv run ruff format .
uv run ruff check .
uv run mypy gemini_omni_mcp/
uv run pytest
uv build
```

Releases are automated: every push to `main` bumps the version and publishes to PyPI (see [PUBLISHING.md](PUBLISHING.md)).

## License

Gemini Omni MCP is licensed under the MIT license. See [`LICENSE`](LICENSE) for details.
