Metadata-Version: 2.4
Name: iflow-mcp_7etsuo_image-description-mcp-server
Version: 0.1.1
Summary: AI-powered image analysis MCP server using Grok API
Requires-Python: >=3.11
Requires-Dist: flask
Requires-Dist: httpx
Requires-Dist: mcp[cli]>=1.2.0
Requires-Dist: numpy
Requires-Dist: pillow
Requires-Dist: pytesseract
Requires-Dist: python-multipart
Requires-Dist: requests
Requires-Dist: watchdog
Description-Content-Type: text/markdown

# Image Description MCP Server

A Model Context Protocol (MCP) server that provides AI-powered image analysis using xAI's Grok API.

## Purpose

This MCP server provides a secure interface for AI assistants to analyze images using Grok's advanced vision capabilities. It supports both web-hosted images and local files, offering detailed descriptions, technical metadata extraction, and optical character recognition (OCR).

## Features

### Current Implementation
- **`describe_image_url`** - Analyzes images from web URLs and provides AI-generated descriptions
- **`describe_image_file`** - Analyzes local image files and provides AI-generated descriptions
- **`extract_text_from_image`** - Performs OCR to extract readable text from images

## Prerequisites

- Docker Desktop with MCP Toolkit enabled
- Docker MCP CLI plugin (`docker mcp` command)
- Grok API key from https://console.x.ai/

## Installation

See the step-by-step instructions provided with the files.

## Usage Examples

In Grok4 Code Fast, you can ask:

- "Describe this image: https://example.com/image.jpg"
- "What does this local image show: /path/to/image.png"
- "Extract any text from this image: https://example.com/document.jpg"
- "Give me a detailed analysis of this photo: https://example.com/photo.jpg"
- "What text can you read in this screenshot: https://example.com/screenshot.png"

### Local Testing

```bash
# Set environment variables for testing
export GROK_API_KEY="your-grok-api-key"

# Run directly
python image-description-mcp_server.py

# Test MCP protocol
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | python image-description-mcp_server.py
```

### Adding New Tools

1. Add the function to `image-description-mcp_server.py`
2. Decorate with `@mcp.tool()`
3. Update the catalog entry with the new tool name
4. Rebuild the Docker image

## Troubleshooting

### Authentication Errors
- Verify secrets with `docker mcp secret list`
- Ensure GROK_API_KEY is set correctly
- Check API key validity at https://console.x.ai/

### Image Processing Errors
- Ensure image URLs are accessible and valid
- Check local file paths exist and are readable
- Verify image formats are supported (JPEG, PNG, WebP, etc.)

## Security Considerations

- All secrets stored in Docker Desktop secrets
- Never hardcode API keys
- Running as non-root user in Docker
- Images processed temporarily in memory
- No permanent storage of image data
- Sensitive data never logged

## API Documentation

This service integrates with xAI's Grok API:
- **Grok API Reference**: https://docs.x.ai/docs/api-reference
- **MCP SDK Documentation**: https://github.com/modelcontextprotocol/sdk

## Data Sources

### External Image URLs
- **Source**: Web-hosted images accessible via HTTP/HTTPS
- **Access Method**: HTTP GET requests using httpx
- **Purpose**: Download images for analysis from any public URL
- **Limitations**: Only accessible URLs; no authentication-protected images

### Local Image Files
- **Source**: Filesystem access to local image files
- **Access Method**: Python file I/O
- **Purpose**: Analyze images stored locally on the user's system
- **Supported Paths**: Absolute and relative file paths
- **Supported Formats**: JPEG, PNG, WebP, TIFF, GIF, BMP

### Grok API
- **Source**: xAI's Grok model with vision capabilities
- **Access Method**: REST API calls via httpx
- **Purpose**: AI-powered image analysis and description generation
- **Data Flow**: Images converted to base64, sent to Grok, receive structured analysis

### Image Processing
- **Source**: PIL (Pillow) and OpenCV libraries
- **Access Method**: Local processing
- **Purpose**: Extract technical metadata and perform OCR
- **No External Calls**: Pure local processing

## License

MIT License
