🚀 AI Image Platform

Comprehensive Python Library & API Documentation

🌟 Platform Overview

The AI Image Platform is a comprehensive Python library and RESTful API that provides advanced AI-powered image processing, generation, editing, and multimodal chat capabilities. Built for serverless deployment with Flask, it integrates multiple AI providers for maximum flexibility and performance.

🤖

Multimodal Chat

Advanced chat with 6 Gemini AI models supporting both text and image inputs with conversation history.

🎨

Image Generation

High-quality text-to-image generation using Gemini AI and Pollinations.ai with multiple styles and aspect ratios.

🔍

Image Analysis

Intelligent image analysis, description, and understanding using advanced AI vision models.

Image Editing

AI-powered image editing, transformation, and multi-image composition with natural language prompts.

☁️

Serverless Ready

Built for serverless deployment with Flask, supporting autoscale, VM, and scheduled deployment modes.

📚

Python Library

Clean, well-documented Python library with comprehensive classes and utility functions.

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/your-repo/ai-image-platform.git
cd ai-image-platform

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export GEMINI_API_KEY="your_gemini_api_key_here"

Running the Server

# Development server
python main.py

# Production server
gunicorn --bind=0.0.0.0:5000 --reuse-port main:app

🔑 API Key Required: You need a Google Gemini API key to use most features. Get one from Google AI Studio.

🌐 API Endpoints

GET /api/health

Health Check

Monitor service status and API availability.

Response:

{
  "status": "healthy",
  "message": "AI Image Platform API is running",
  "services": {
    "gemini": {
      "status": "healthy",
      "gemini_api": "available"
    },
    "pollinations": {
      "status": "healthy",
      "pollinations_api": "available"
    }
  }
}
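
A monitoring script might want to check each service entry rather than only the top-level status. The following is a minimal sketch that works on the already-parsed JSON body shown above. The helper names `unhealthy_services` and `is_fully_healthy` are hypothetical and not part of the library, and the document does not state whether the top-level `status` already aggregates the per-service ones, so both are checked.

```python
from __future__ import annotations


def unhealthy_services(health: dict) -> list[str]:
    """Return the names of services whose status is not 'healthy'."""
    services = health.get("services", {})
    return sorted(
        name for name, info in services.items()
        if info.get("status") != "healthy"
    )


def is_fully_healthy(health: dict) -> bool:
    """True when the top-level status and every listed service are healthy."""
    return health.get("status") == "healthy" and not unhealthy_services(health)


# Parsed form of the documented example response
sample = {
    "status": "healthy",
    "message": "AI Image Platform API is running",
    "services": {
        "gemini": {"status": "healthy", "gemini_api": "available"},
        "pollinations": {"status": "healthy", "pollinations_api": "available"},
    },
}

print(is_fully_healthy(sample))
```
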
POST /api/chat

Multimodal Chat

Chat with AI models supporting text and image inputs.

Request Body:

Parameter    Type     Required   Description
question     string   Required   Your question or message
model        string   Optional   Gemini model (default: gemini-2.5-flash-lite)
image_data   string   Optional   Base64 encoded image data

Available Models:

  • gemini-2.5-flash-lite - Fastest (default)
  • gemini-1.5-flash - High-speed lightweight
  • gemini-2.5-flash - Advanced with excellent speed
  • gemini-2.0-flash - Next-gen performance
  • gemini-2.5-pro - Performance and quality
  • gemini-1.5-flash-8b - 8B parameter lightweight

Example Request:

{
  "question": "What's in this image?",
  "model": "gemini-2.5-flash-lite",
  "image_data": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ..."
}

Example Response:

{
  "status": "success",
  "question": "What's in this image?",
  "answer": "This image shows a beautiful sunset over mountains...",
  "model_used": "gemini-2.5-flash-lite"
}
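
The request body can be assembled in Python before it is sent with any HTTP client. This is a minimal sketch under stated assumptions: `build_chat_payload` is a hypothetical helper, and the data-URL form of `image_data` simply mirrors the example request above. Whether the endpoint also accepts bare base64 without the `data:` prefix is not stated here.

```python
import base64

DEFAULT_MODEL = "gemini-2.5-flash-lite"  # documented default for /api/chat


def build_chat_payload(question, image_bytes=None,
                       mime_type="image/jpeg", model=DEFAULT_MODEL):
    """Build the JSON body for POST /api/chat.

    image_bytes is optional raw image content; when present it is encoded
    as a data URL, matching the documented example request.
    """
    if not question or not question.strip():
        raise ValueError("question is required")
    payload = {"question": question, "model": model}
    if image_bytes is not None:
        encoded = base64.b64encode(image_bytes).decode("ascii")
        payload["image_data"] = f"data:{mime_type};base64,{encoded}"
    return payload


# Text-only request: image_data is simply omitted
print(build_chat_payload("What is the capital of France?"))
```

The resulting dictionary can be passed to a JSON-capable HTTP client as the request body.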
POST /api/analyze-image

Image Analysis

Analyze images for detailed descriptions and understanding.

Request Body:

Parameter       Type      Required    Description
image_data      string    Optional*   Base64 encoded image data
image_url       string    Optional*   URL of image to analyze
extract_masks   boolean   Optional    Extract segmentation masks (default: false)

*Either image_data or image_url is required
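
A client can enforce the "either image_data or image_url" rule before sending a request. The sketch below is illustrative only: `validate_analyze_request` is a hypothetical helper, the http(s) check on `image_url` is an assumption, and because the document does not say how the server treats a request that carries both fields, the sketch allows that case.

```python
def validate_analyze_request(payload: dict) -> list:
    """Return a list of problems with a /api/analyze-image request body."""
    errors = []
    has_data = bool(payload.get("image_data"))
    has_url = bool(payload.get("image_url"))

    # Documented rule: at least one image source must be present
    if not (has_data or has_url):
        errors.append("either image_data or image_url is required")

    # Assumption: only http(s) URLs are meaningful for image_url
    if has_url and not str(payload["image_url"]).startswith(("http://", "https://")):
        errors.append("image_url must be an http(s) URL")

    # extract_masks is documented as a boolean
    if "extract_masks" in payload and not isinstance(payload["extract_masks"], bool):
        errors.append("extract_masks must be a boolean")

    return errors


print(validate_analyze_request({"image_url": "https://example.com/image.jpg"}))
```
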

Example Response:

{
  "status": "success",
  "analysis": "This image contains a modern city skyline...",
  "image_format": "JPEG",
  "image_size": [1920, 1080],
  "segmentation_masks": []
}
POST /api/generate-image

Gemini Image Generation

Generate high-quality images from text using Gemini AI.

Request Body:

Parameter      Type     Required   Description
prompt         string   Required   Text description of desired image
style          string   Optional   Artistic style (default: photorealistic)
aspect_ratio   string   Optional   Image aspect ratio (default: 1:1)

Available Styles:

  • photorealistic - Photorealistic and detailed (default)
  • cartoon - Cartoon style and animated
  • abstract - Abstract art style
  • impressionistic - Impressionist painting style
  • cyberpunk - Cyberpunk and futuristic art
  • anime - Anime and manga style
  • oil_painting - Oil painting technique
  • watercolor - Watercolor painting style
  • sketch - Pencil sketch style
  • digital_art - Digital art style

Available Aspect Ratios:

  • 1:1 - Square format (default)
  • 16:9 - Landscape widescreen
  • 9:16 - Portrait vertical
  • 4:3 - Standard landscape
  • 3:4 - Standard portrait
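
The styles and aspect ratios listed above can be checked on the client before a request is sent. This is a minimal sketch, assuming the lists above are exhaustive. `build_generate_payload` is a hypothetical helper, and how the server itself responds to unknown values is not documented here.

```python
# Values copied from the documented lists for /api/generate-image
STYLES = (
    "photorealistic", "cartoon", "abstract", "impressionistic", "cyberpunk",
    "anime", "oil_painting", "watercolor", "sketch", "digital_art",
)
ASPECT_RATIOS = ("1:1", "16:9", "9:16", "4:3", "3:4")


def build_generate_payload(prompt, style="photorealistic", aspect_ratio="1:1"):
    """Build a /api/generate-image body, applying the documented defaults."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if style not in STYLES:
        raise ValueError(f"unknown style {style!r}; expected one of {STYLES}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(
            f"unknown aspect_ratio {aspect_ratio!r}; expected one of {ASPECT_RATIOS}"
        )
    return {"prompt": prompt.strip(), "style": style, "aspect_ratio": aspect_ratio}


print(build_generate_payload("A serene mountain landscape at sunset",
                             aspect_ratio="16:9"))
```
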

Example Response:

{
  "status": "success",
  "image_base64": "iVBORw0KGgoAAAANSUhEUgAA...",
  "response_text": "Generated a beautiful mountain landscape...",
  "filename": "generated_image_20240905_143000_abc123.png",
  "prompt": "A serene mountain landscape at sunset",
  "style": "photorealistic",
  "aspect_ratio": "16:9"
}
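
The `image_base64` field carries the image itself, so a client has to decode it before using it. A minimal sketch of that step follows. `decode_generated_image` and `looks_like_png` are hypothetical helpers, and the PNG check is based only on the `.png` filename and the `iVBORw0KGgo` prefix in the example above, which is the base64 form of the PNG file signature.

```python
import base64
import binascii

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_generated_image(result: dict) -> bytes:
    """Return the raw image bytes from a /api/generate-image response."""
    if result.get("status") != "success":
        raise ValueError(f"generation did not succeed: status={result.get('status')!r}")
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(result["image_base64"], validate=True)
    except (KeyError, binascii.Error) as exc:
        raise ValueError("response has no valid image_base64 field") from exc


def looks_like_png(data: bytes) -> bool:
    """Cheap sanity check on the decoded bytes."""
    return data.startswith(PNG_SIGNATURE)


# Stand-in response whose image payload is just the PNG signature
fake_result = {
    "status": "success",
    "image_base64": base64.b64encode(PNG_SIGNATURE).decode("ascii"),
}
print(looks_like_png(decode_generated_image(fake_result)))
```
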
POST /api/pollinations/generate-image

Pollinations.ai Image Generation

Generate high-quality images using Pollinations.ai Flux models (no API key required).

Request Body:

Parameter   Type      Required   Description
prompt      string    Required   Text description of desired image
width       integer   Optional   Image width (default: 1024)
height      integer   Optional   Image height (default: 1024)
model       string    Optional   Model to use (default: flux)
seed        integer   Optional   Random seed for reproducible results
enhance     boolean   Optional   Auto-enhance prompt (default: true)

Available Models:

  • flux - Standard Flux model (default)
  • flux-realism - Photorealistic results
  • flux-anime - Anime/manga style
  • flux-3d - 3D rendered style
  • any-dark - Dark artistic style
  • flux-pro - Premium quality
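
Unlike the Gemini endpoint, this endpoint takes explicit `width` and `height` values rather than an `aspect_ratio`. The sketch below shows one way to derive pixel dimensions from a ratio string. `dimensions_for_ratio` is a hypothetical helper, and snapping to a multiple of 8 is an assumption (a common convention for image models), not a documented requirement of this endpoint.

```python
def dimensions_for_ratio(aspect_ratio, long_side=1024, multiple=8):
    """Return (width, height) for a 'W:H' ratio with the longer side fixed."""
    w_part, h_part = (int(part) for part in aspect_ratio.split(":"))
    if w_part <= 0 or h_part <= 0:
        raise ValueError(f"invalid aspect ratio: {aspect_ratio!r}")

    if w_part >= h_part:
        # Landscape or square: width is the long side
        width, height = long_side, long_side * h_part / w_part
    else:
        # Portrait: height is the long side
        width, height = long_side * w_part / h_part, long_side

    def snap(value):
        # Round to the nearest multiple, never below one multiple
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


for ratio in ("1:1", "16:9", "9:16", "4:3", "3:4"):
    print(ratio, dimensions_for_ratio(ratio))
```

With the default `long_side` of 1024, a 16:9 ratio gives 1024 by 576, the same dimensions used in the Pollinations examples elsewhere in this document.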
POST /api/edit-image

Image Editing

Edit and transform images using AI with natural language prompts.

Request Body:

Parameter      Type     Required   Description
image_data     string   Required   Base64 encoded image data
edit_prompt    string   Required   Description of desired edits
style          string   Optional   Target artistic style
aspect_ratio   string   Optional   Target aspect ratio

🐍 Python Library Usage

Importing the Library

# Import the library
from ai_image_platform import (
    GeminiChatClient,
    ImageAnalyzer,
    ImageGenerator,
    ImageEditor,
    PollinationsClient,
)

# Or import specific functions
from ai_image_platform.core.image_generation import generate_image_from_text
from ai_image_platform.core.image_analysis import analyze_base64_image

Chat Client Example

import base64

# Initialize chat client
chat_client = GeminiChatClient()

# Text-only chat
response = chat_client.ask_question(
    "Explain quantum computing in simple terms",
    selected_model="gemini-2.5-flash-lite"
)
print(response['answer'])

# Multimodal chat with image (the image is sent as a base64 string)
with open('image.jpg', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode()

response = chat_client.ask_question(
    "What's in this image?",
    selected_model="gemini-2.5-flash",
    image_data=image_data
)
print(response['answer'])

Image Generation Example

import base64

# Using the class
generator = ImageGenerator()
result = generator.generate_image(
    prompt_text="A majestic dragon in a mystical forest",
    style="digital_art",
    aspect_ratio="16:9"
)

if result['status'] == 'success':
    # Save the generated image
    image_data = base64.b64decode(result['image_base64'])
    with open(result['filename'], 'wb') as f:
        f.write(image_data)

# Using convenience function
result = generate_image_from_text(
    "A peaceful zen garden with cherry blossoms",
    style="watercolor",
    aspect_ratio="4:3"
)

Pollinations Client Example

# Initialize Pollinations client (no API key required)
pollinations = PollinationsClient()

# Generate image
result = pollinations.generate_image(
    prompt="A futuristic cityscape at night",
    width=1024,
    height=576,
    model="flux-realism",
    seed=12345
)

if result['status'] == 'success':
    print(f"Generated: {result['filename']}")
    # image_data is in result['image_base64']

Image Analysis Example

import base64

# Using the class
analyzer = ImageAnalyzer()

# Analyze from file
with open('photo.jpg', 'rb') as f:
    image_bytes = f.read()

analysis = analyzer.analyze_image_bytes(
    image_bytes,
    prompt="Describe this image in detail"
)
print(analysis)

# Using convenience function for base64 image
base64_image_data = base64.b64encode(image_bytes).decode()
result = analyze_base64_image(
    base64_image_data,
    extract_masks=True
)
print(result['analysis'])

Flask App Integration

from ai_image_platform.api import create_app

# Create Flask app with default settings
app = create_app()

# Or pass custom configuration (this replaces the default app above)
config = {
    'SECRET_KEY': 'your-secret-key',
    'MAX_CONTENT_LENGTH': 32 * 1024 * 1024  # 32MB
}
app = create_app(config)

# Run the app (debug=True is for local development only)
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000, debug=True)

💡 Usage Examples

JavaScript/Frontend Integration

// Chat with image upload
async function chatWithImage(question, imageFile, model = 'gemini-2.5-flash-lite') {
  // Convert image to a base64 data URL
  const reader = new FileReader();
  const imageData = await new Promise((resolve, reject) => {
    reader.onload = e => resolve(e.target.result);
    reader.onerror = () => reject(reader.error); // surface read failures
    reader.readAsDataURL(imageFile);
  });

  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      question: question,
      model: model,
      image_data: imageData
    })
  });

  return await response.json();
}

// Generate image
async function generateImage(prompt, style = 'photorealistic') {
  const response = await fetch('/api/generate-image', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      prompt: prompt,
      style: style,
      aspect_ratio: '16:9'
    })
  });

  const result = await response.json();

  if (result.status === 'success') {
    // Display image
    const img = document.createElement('img');
    img.src = 'data:image/png;base64,' + result.image_base64;
    document.body.appendChild(img);
  }

  return result;
}

Python Script Examples

#!/usr/bin/env python3
"""
Example: Batch Image Analysis
"""
import os
import glob
from ai_image_platform import ImageAnalyzer

def analyze_image_folder(folder_path):
    analyzer = ImageAnalyzer()

    # Supported formats
    image_extensions = ['*.jpg', '*.jpeg', '*.png', '*.bmp', '*.gif']

    for ext in image_extensions:
        for image_path in glob.glob(os.path.join(folder_path, ext)):
            print(f"Analyzing: {image_path}")

            with open(image_path, 'rb') as f:
                image_bytes = f.read()

            try:
                analysis = analyzer.analyze_image_bytes(image_bytes)
                print(f"Analysis: {analysis[:200]}...\n")
            except Exception as e:
                print(f"Error: {e}\n")

if __name__ == '__main__':
    analyze_image_folder('./images')

cURL Examples

# Health check
curl -X GET http://localhost:5000/api/health

# Chat request
curl -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the capital of France?",
    "model": "gemini-2.5-flash-lite"
  }'

# Generate image with Pollinations.ai
curl -X POST http://localhost:5000/api/pollinations/generate-image \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A beautiful sunset over the ocean",
    "width": 1024,
    "height": 576,
    "model": "flux-realism"
  }'

# Analyze image from URL
curl -X POST http://localhost:5000/api/analyze-image \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/image.jpg"
  }'

🚀 Serverless Deployment

Replit Deployment

Autoscale Mode: Perfect for stateless image processing APIs that scale automatically based on demand.

{
  "deployment_target": "autoscale",
  "run": ["gunicorn", "--bind=0.0.0.0:5000", "--reuse-port", "main:app"],
  "build": null
}

Environment Variables

Variable         Required   Description
GEMINI_API_KEY   Required   Google Gemini API key for AI features
SECRET_KEY       Optional   Flask secret key (auto-generated if not set)
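
A startup check can fail fast when the required variable is missing instead of surfacing the problem on the first API call. This is a minimal sketch of the behavior the table describes, not the platform's own loader. `load_settings` is a hypothetical helper, and generating the fallback key with `secrets.token_hex` is one assumed way to implement "auto-generated if not set".

```python
import os
import secrets


def load_settings(env=None):
    """Read the documented environment variables into a settings dict.

    env defaults to os.environ; passing a plain dict makes the function
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env

    api_key = (env.get("GEMINI_API_KEY") or "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set")

    # Assumed fallback: a fresh random key for this process only
    secret_key = env.get("SECRET_KEY") or secrets.token_hex(32)

    return {"GEMINI_API_KEY": api_key, "SECRET_KEY": secret_key}


settings = load_settings({"GEMINI_API_KEY": "example-key"})
print(sorted(settings))
```

A key generated this way differs between processes and restarts, so a deployment that runs several workers or relies on signed sessions would normally set SECRET_KEY explicitly.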

Production Configuration

# production_config.py
import os

class ProductionConfig:
    # Placeholder fallback; set SECRET_KEY in the environment for real deployments
    SECRET_KEY = os.environ.get('SECRET_KEY') or 'production-secret-key'
    GEMINI_API_KEY = os.environ.get('GEMINI_API_KEY')
    MAX_CONTENT_LENGTH = 16 * 1024 * 1024  # 16MB

    # Logging configuration
    LOGGING = {
        'version': 1,
        'handlers': {
            'default': {
                'class': 'logging.StreamHandler',
                'formatter': 'default'
            }
        },
        'formatters': {
            'default': {
                'format': '[%(asctime)s] %(levelname)s in %(module)s: %(message)s',
            }
        },
        'root': {
            'level': 'INFO',
            'handlers': ['default']
        }
    }

# main.py
from ai_image_platform.api import create_app
from production_config import ProductionConfig

# Pass only the upper-case settings: ProductionConfig.__dict__ also contains
# entries such as __module__ and __doc__ that are not configuration values
config = {key: value for key, value in vars(ProductionConfig).items() if key.isupper()}
app = create_app(config)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

⚠️ API Rate Limits: Be mindful of Google Gemini API rate limits and quotas in production. Implement appropriate error handling and retry logic.
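
One common form of retry logic is exponential backoff with jitter. The sketch below is illustrative, not part of the platform: `TransientAPIError` is a hypothetical stand-in for whatever exception a rate-limited call raises in your code, and the attempt count and delays are arbitrary starting values.

```python
import random
import time


class TransientAPIError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. rate limiting)."""


def call_with_retry(func, *, attempts=4, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep):
    """Call func(), retrying on TransientAPIError with exponential backoff.

    The wait doubles after each failure (capped at max_delay) and gets up
    to 50% random jitter so parallel workers do not retry in lockstep.
    The sleep parameter exists so the waiting can be replaced in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except TransientAPIError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle it
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

A call such as `call_with_retry(lambda: chat_client.ask_question("Hello"))` would apply the policy to a library call, provided that call is wrapped to raise `TransientAPIError` on rate-limit failures.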
✅ Serverless Benefits: The platform is designed for serverless deployment with no file system dependencies, using base64 for all image I/O operations.