Metadata-Version: 2.4
Name: synthetic-image-caption-generator
Version: 0.0.4
Summary: Tool to generate captions to use as input for pre-trained image generation models, aligned with some pre-existing captions.
Author: Alex Senden
Maintainer: Alex Senden
License: MIT
Keywords: captioning,image-generation,machine-learning,transformers
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Requires-Dist: accelerate>=0.20.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.30.0
Description-Content-Type: text/markdown

# Synthetic Image Caption Generator

A tool that uses Qwen to generate image captions similar to those in a given dataset. The model is shown example prompts from the dataset as few-shot examples and generates new captions that match their style and structure.

It is intended for dataset augmentation with deep generative models: the generated captions serve as inputs to pre-trained image generation models.

## Features

- Uses Qwen from Hugging Face 🤗 Transformers
- Supports few-shot prompting with a configurable number of examples
- Reads prompts from .txt files in a dataset directory
- Flexible CLI with multiple configuration options
- Can generate multiple captions in one run
- Output to file or stdout

## Installation

To install from PyPI:

```bash
pip install synthetic-image-caption-generator
```

To install locally:

```bash
pip install -e .
```

Or install dependencies manually:

```bash
pip install "transformers>=4.30.0" "torch>=2.0.0" "accelerate>=0.20.0"
```

## Usage

### Downloading Models

Before generating captions, you can pre-download a model to the Hugging Face cache for offline use:

```bash
# Download the default model (qwen2.5-32b)
download-caption-model

# Download a specific model
download-caption-model --model qwen2.5-7b

# Download a larger model
download-caption-model --model qwen2.5-72b
```
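
For readers who want to do the same from Python, pre-downloading amounts to fetching a model snapshot into the local Hugging Face cache. The following is a minimal sketch, not this package's implementation: the alias table and the helper names `resolve_repo_id` and `download_model` are hypothetical, and it assumes the short names map to the `-Instruct` repositories on the Hub.

```python
# Hypothetical alias table; the package's real mapping may differ.
MODEL_ALIASES = {
    "qwen2.5-7b": "Qwen/Qwen2.5-7B-Instruct",
    "qwen2.5-32b": "Qwen/Qwen2.5-32B-Instruct",
    "qwen2.5-72b": "Qwen/Qwen2.5-72B-Instruct",
}


def resolve_repo_id(alias: str) -> str:
    """Translate a short CLI-style name into a Hub repository id."""
    try:
        return MODEL_ALIASES[alias]
    except KeyError:
        raise ValueError(f"Unknown model alias: {alias!r}") from None


def download_model(alias: str = "qwen2.5-32b") -> str:
    """Fetch every file of the model into the local Hugging Face cache.

    Returns the local path of the downloaded snapshot.
    """
    # Imported lazily so the alias lookup works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=resolve_repo_id(alias))
```

Once a model is cached, setting the environment variable `HF_HUB_OFFLINE=1` tells the Hugging Face libraries to use only the local cache.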

### Basic Usage

Generate one caption using the default of 5 example prompts from the dataset:

```bash
generate-captions /path/to/dataset
```

### Specify Number of Examples

Use a different number of example prompts (e.g., 10):

```bash
generate-captions /path/to/dataset --num-examples 10
```

### Choose a Model

Select a different Qwen model (e.g., smaller or larger):

```bash
# Use the smaller 7B model
generate-captions /path/to/dataset --model qwen2.5-7b

# Use the larger 72B model
generate-captions /path/to/dataset --model qwen2.5-72b

# Use Qwen3 14B
generate-captions /path/to/dataset --model qwen3-14b
```

### Generate Multiple Captions

Generate 5 captions:

```bash
generate-captions /path/to/dataset --num-generate 5
```

### Specify Object Information

Include information about what's in the image:

```bash
generate-captions /path/to/dataset --object-info "the image contains 3 elephants"
```

```bash
generate-captions /path/to/dataset --object-info "the main crop in the field is soybean"
```

### Save to File

Save generated captions to a file:

```bash
generate-captions /path/to/dataset --num-generate 10 --output generated_captions.txt
```

### Advanced Options

```bash
generate-captions /path/to/dataset \
  --model qwen2.5-14b \
  --num-examples 8 \
  --num-generate 5 \
  --temperature 0.8 \
  --max-length 300 \
  --object-info "a cityscape with tall buildings" \
  --output captions.txt
```
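
When augmenting a dataset, it can be convenient to call the CLI once per object description. The sketch below drives `generate-captions` from Python using only the flags documented in this README; the helper names `build_command` and `run_batch` and the `captions_<i>.txt` output naming are illustrative assumptions.

```python
import subprocess
from typing import List, Optional


def build_command(dataset_dir: str, object_info: str, output: str,
                  num_generate: int = 5, model: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one generate-captions invocation."""
    cmd = [
        "generate-captions", dataset_dir,
        "--num-generate", str(num_generate),
        "--object-info", object_info,
        "--output", output,
    ]
    if model is not None:
        cmd += ["--model", model]
    return cmd


def run_batch(dataset_dir: str, object_infos: List[str]) -> None:
    """Run the CLI once per object description, one output file each."""
    for i, info in enumerate(object_infos):
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_command(dataset_dir, info, f"captions_{i}.txt"), check=True)
```

For example, `run_batch("/path/to/dataset", ["the image contains 3 elephants", "the main crop in the field is soybean"])` would write `captions_0.txt` and `captions_1.txt`. Each invocation is a separate process, so the model is presumably loaded again on every call.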

## Command-Line Arguments

### `generate-captions`

- `dataset_dir` (required): Path to directory containing .txt files with caption prompts
- `--model`: Qwen model to use (default: qwen2.5-32b). Options: qwen2.5-0.5b, qwen2.5-1.5b, qwen2.5-3b, qwen2.5-7b, qwen2.5-14b, qwen2.5-32b, qwen2.5-72b, qwen3-14b, qwen3-32b
- `--num-examples`: Number of example prompts to provide to the model (default: 5)
- `--num-generate`: Number of captions to generate (default: 1)
- `--output`: Output file to save generated captions (optional, prints to stdout if not specified)
- `--temperature`: Temperature for text generation (default: 0.7, higher = more creative)
- `--max-length`: Maximum length of generated caption (default: 256)
- `--object-info`: Information about objects/content in the image (e.g., "the image contains 3 elephants")
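
The argument list above can be expressed as an `argparse` parser. This is a reconstruction from the documented names and defaults, not the package's actual source; `build_parser` is a hypothetical name, and restricting `--model` with `choices` is an assumption.

```python
import argparse

MODEL_CHOICES = [
    "qwen2.5-0.5b", "qwen2.5-1.5b", "qwen2.5-3b", "qwen2.5-7b", "qwen2.5-14b",
    "qwen2.5-32b", "qwen2.5-72b", "qwen3-14b", "qwen3-32b",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented generate-captions arguments."""
    parser = argparse.ArgumentParser(prog="generate-captions")
    parser.add_argument("dataset_dir", help="Directory containing .txt prompt files")
    parser.add_argument("--model", choices=MODEL_CHOICES, default="qwen2.5-32b")
    parser.add_argument("--num-examples", type=int, default=5)
    parser.add_argument("--num-generate", type=int, default=1)
    # None means "print to stdout", matching the documented behavior.
    parser.add_argument("--output", default=None)
    parser.add_argument("--temperature", type=float, default=0.7)
    parser.add_argument("--max-length", type=int, default=256)
    parser.add_argument("--object-info", default=None)
    return parser
```

Parsing `["/path/to/dataset", "--num-examples", "10"]` with this parser yields `num_examples == 10` and leaves every other option at its documented default.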

### `download-caption-model`

- `--model`: Qwen model to download (default: qwen2.5-32b). Options: qwen2.5-0.5b, qwen2.5-1.5b, qwen2.5-3b, qwen2.5-7b, qwen2.5-14b, qwen2.5-32b, qwen2.5-72b, qwen3-14b, qwen3-32b

## Dataset Format

The dataset directory should contain `.txt` files, each with one or more prompts. Each prompt should be on its own line.

Example dataset structure:

```
dataset/
├── captions1.txt
├── captions2.txt
└── captions3.txt
```

Example content of `captions1.txt`:

```
A serene landscape with mountains in the background and a lake in the foreground
An urban street scene with people walking and cars passing by
A close-up portrait of a person smiling at the camera
```
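
Reading such a directory takes only a few lines of Python. The sketch below is an illustration of the format, not the package's loader: `load_prompts` is a hypothetical name, and it assumes UTF-8 files, a non-recursive search, and that blank lines are skipped.

```python
from pathlib import Path
from typing import List


def load_prompts(dataset_dir: str) -> List[str]:
    """Collect one prompt per non-empty line from every .txt file."""
    prompts: List[str] = []
    # Sorted for a deterministic order; only top-level .txt files are read.
    for txt_file in sorted(Path(dataset_dir).glob("*.txt")):
        for line in txt_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # skip blank lines
                prompts.append(line)
    return prompts
```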

## Requirements

- Python >= 3.8
- CUDA-capable GPU recommended
- Sufficient GPU memory depending on chosen model:
    - Small models (0.5B-3B): 4-8GB VRAM
    - Medium models (7B-14B): 12-24GB VRAM
    - Large models (32B): 24-40GB VRAM
    - Extra large models (72B): 40GB+ VRAM or quantization required
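
A rough way to sanity-check these figures is weights-only arithmetic: parameter count times bytes per parameter. The helper below is a rule-of-thumb sketch, not part of this package. It ignores activations, the KV cache, and framework overhead, so real usage is higher, and it uses 1 GB = 10^9 bytes.

```python
def weights_memory_gb(num_params_billion: float, bits_per_param: int = 16) -> float:
    """Estimate the memory needed to hold the model weights alone, in GB."""
    bytes_total = num_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


# 7B at 16-bit precision: about 14 GB of weights.
print(weights_memory_gb(7))
# 32B at 8-bit precision: about 32 GB of weights.
print(weights_memory_gb(32, bits_per_param=8))
# 72B at 4-bit precision: about 36 GB of weights.
print(weights_memory_gb(72, bits_per_param=4))
```

By this estimate, a 32B model fits the 24-40GB range listed above only at reduced precision (about 8 bits per parameter or fewer), since 16-bit weights alone come to roughly 64 GB.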

## How It Works

1. The script reads all prompts from .txt files in the specified directory
2. It randomly samples a specified number of prompts as examples
3. These examples are formatted into a few-shot prompt for the selected Qwen model (by default, Qwen2.5-32B-Instruct)
4. The model generates a new caption that matches the style and structure of the examples
5. The generated caption(s) are output to stdout or saved to a file
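
Steps 2 and 3 above can be sketched as follows. The prompt wording, the function name `build_few_shot_prompt`, and the `seed` parameter are illustrative assumptions; the package's actual prompt template is not shown in this README.

```python
import random
from typing import List, Optional


def build_few_shot_prompt(prompts: List[str], num_examples: int = 5,
                          object_info: Optional[str] = None,
                          seed: Optional[int] = None) -> str:
    """Sample example captions and format them into a few-shot prompt."""
    rng = random.Random(seed)
    # Never ask for more examples than the dataset provides.
    k = min(num_examples, len(prompts))
    examples = rng.sample(prompts, k)

    lines = ["Here are example image captions:"]
    lines += [f"{i}. {ex}" for i, ex in enumerate(examples, start=1)]
    if object_info:
        lines.append(f"Additional information: {object_info}")
    lines.append("Write one new caption in the same style and structure.")
    return "\n".join(lines)
```

With Transformers, a prompt like this would typically be wrapped with the tokenizer's chat template and passed to `model.generate` with sampling enabled, which is where `--temperature` and `--max-length` would plausibly apply.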

## License

MIT
