Metadata-Version: 2.4
Name: dqn-ale-spaceinvaders
Version: 0.1.2
Summary: A Deep Q-Network (DQN) implementation for Atari Space Invaders using Gymnasium and PyTorch.
License: MIT
License-File: LICENSE
Keywords: reinforcement-learning,deep-reinforcement-learning,deep-q-learning-network,dqn,atari,gymnasium,space-invaders,arcade-learning-environment,ale,pytorch
Author: Giansimone Perrino
Author-email: hello@giansimone.dev
Requires-Python: >=3.13,<3.14
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: gymnasium[accept-rom-license,atari] (>=1.2.1,<2.0.0)
Requires-Dist: huggingface-hub (>=0.35.3,<0.36.0)
Requires-Dist: imageio[ffmpeg] (>=2.37.0,<3.0.0)
Requires-Dist: opencv-python (>=4.12.0.88,<5.0.0.0)
Requires-Dist: tensorboard (>=2.20.0,<3.0.0)
Requires-Dist: torch (>=2.8.0,<3.0.0)
Requires-Dist: tqdm (>=4.67.1,<5.0.0)
Project-URL: Repository, https://github.com/giansimone/dqn-ale-spaceinvaders
Description-Content-Type: text/markdown

[![Python](https://img.shields.io/pypi/pyversions/dqn-ale-spaceinvaders.svg)](https://badge.fury.io/py/dqn-ale-spaceinvaders)
[![PyPI](https://badge.fury.io/py/dqn-ale-spaceinvaders.svg)](https://badge.fury.io/py/dqn-ale-spaceinvaders)
[![License](https://img.shields.io/github/license/giansimone/dqn-ale-spaceinvaders)](https://github.com/giansimone/dqn-ale-spaceinvaders/blob/main/LICENSE)

# Deep Q-Network (DQN) for Atari Space Invaders

<p align="center">
    <img src="https://raw.githubusercontent.com/giansimone/dqn-ale-spaceinvaders/main/docs/demo.gif" alt="Agent Playing Space Invaders" width="300"/>
</p>

A PyTorch implementation of a Deep Q-Network (DQN) trained to play Atari Space Invaders using the Arcade Learning Environment (ALE).

## Features

- **Vanilla DQN** and **Dueling DQN** architectures.
- **Double DQN** support for improved stability.
- **Replay Buffer** for experience replay.
- **Epsilon-greedy exploration** with linear annealing.
- **TensorBoard integration** for training visualisation.
- **Hugging Face Hub integration** for model sharing.
- **Video recording** of agent gameplay.
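Most of these pieces are conceptually small. For instance, a uniform experience replay buffer (the idea behind `buffer.py`; this is an illustrative sketch, not the package's actual implementation) can be written in a few lines:

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition once capacity is reached
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        # Uniform random mini-batch, as in the original DQN algorithm
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push((t, 0, 1.0, t + 1, False))

print(len(buffer))  # capacity caps the buffer at 3 transitions
```

Sampling uniformly from a large buffer breaks the temporal correlation between consecutive frames, which is what makes mini-batch gradient updates stable.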

## Installation

You can install the package from PyPI, or clone the repository and install its dependencies using Poetry or pip. This project requires **Python 3.13**.

### PyPI

```bash
pip install dqn-ale-spaceinvaders
```

### Source

#### Using Poetry (Recommended)

```bash
# 1. Clone the repository
git clone https://github.com/giansimone/dqn-ale-spaceinvaders.git
cd dqn-ale-spaceinvaders

# 2. Initialize environment and install dependencies
poetry env use python3.13
poetry install

# 3. Activate the virtual environment
eval $(poetry env activate)
```

#### Using pip

```bash
# 1. Clone the repository
git clone https://github.com/giansimone/dqn-ale-spaceinvaders.git
cd dqn-ale-spaceinvaders

# 2. Create and activate a virtual environment
python3.13 -m venv venv
source venv/bin/activate

# 3. Install package in editable mode
pip install -e .
```

## Project Structure

```
dqn-ale-spaceinvaders/
├── dqn_ale_spaceinvaders/
│    ├── agent.py           # DQN agent implementation
│    ├── buffer.py          # Experience replay buffer
│    ├── config.yaml        # Agent configuration
│    ├── environment.py     # Environment setup and wrappers
│    ├── model.py           # Deep learning architectures
│    ├── train.py           # Training script
│    ├── enjoy.py           # Play with trained agent
│    ├── export.py          # Export model to Hugging Face Hub
│    └── utils.py           # Utility functions
├── .gitignore
├── LICENSE
├── README.md
└── pyproject.toml
```

## Usage

### Training

Train a DQN agent with the default configuration.

```bash
python -m dqn_ale_spaceinvaders.train
```

The training script will:
- Create a timestamped run directory in `runs/`.
- Save the configuration, checkpoints, and TensorBoard logs.
- Periodically evaluate the agent and save the best model.

### Configuration

Edit `config.yaml` to customize training parameters.

```yaml
# Environment
env_id: ALE/SpaceInvaders-v5
frame_skip: 5
frame_stack: 4
resized_frame: 84

# Training
training_steps: 10000000
n_eval_episodes: 10

# Exploration
warmup_steps: 100000
epsilon_start: 1.0
epsilon_end: 0.1
anneal_steps: 1000000

# Replay Buffer
buffer_size: 200000
batch_size: 32

# Learning
gamma: 0.99
lr: 0.00025
update_every: 25000
target_update_every: 10000

# DQN Variants
double_dqn: False    # Enable Double DQN
dueling: False       # Enable Dueling DQN
clip_rewards: False  # Clip rewards to [-1, 1]
```
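The exploration parameters above define a linear schedule: random actions for `warmup_steps`, then epsilon decays from `epsilon_start` to `epsilon_end` over `anneal_steps`. A sketch of such a schedule (illustrative; not necessarily the package's exact implementation):

```python
def epsilon_at(step, warmup_steps=100_000, epsilon_start=1.0,
               epsilon_end=0.1, anneal_steps=1_000_000):
    """Linear epsilon annealing after a random warmup phase."""
    if step < warmup_steps:
        return epsilon_start  # pure exploration during warmup
    # Fraction of the annealing window completed, capped at 1.0
    progress = min(1.0, (step - warmup_steps) / anneal_steps)
    return epsilon_start + progress * (epsilon_end - epsilon_start)


print(epsilon_at(50_000))    # 1.0  (still in warmup)
print(epsilon_at(600_000))   # 0.55 (halfway through annealing)
print(epsilon_at(2_000_000)) # ~0.1 (fully annealed to epsilon_end)
```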

### Monitoring Training

View training progress with TensorBoard:

```bash
tensorboard --logdir runs/dqn_YYYY-MM-DD_HHhMMmSSs/
```

### Testing a Trained Agent

Watch your trained agent play:

```bash
python -m dqn_ale_spaceinvaders.enjoy --artifact runs/dqn_YYYY-MM-DD_HHhMMmSSs/final_model.pt --num-episodes 5
```

### Exporting to Hugging Face Hub

Share your trained model:

```bash
python -m dqn_ale_spaceinvaders.export \
    --username YOUR_HF_USERNAME \
    --repo-name dqn-spaceinvaders \
    --artifact-path runs/dqn_YYYY-MM-DD_HHhMMmSSs/final_model.pt \
    --movie-fps 12
```

This will:
- Create a repository on Hugging Face Hub.
- Upload the model weights, configuration, and evaluation results.
- Generate and upload a replay movie.
- Create a model card with usage instructions.

## Algorithm Details

### DQN Architecture

The network consists of:
- 3 convolutional layers for feature extraction.
- 2 fully connected layers for Q-value estimation.
- Input: 4 stacked 84×84 grayscale frames.
- Output: Q-values for each action.
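As a quick sanity check on this layout, the spatial size after each valid (no-padding) convolution follows `(size - kernel) // stride + 1`. Assuming the classic DQN kernel/stride choices (8×8 stride 4, 4×4 stride 2, 3×3 stride 1; see `model.py` for the actual layers used here):

```python
def conv_out(size, kernel, stride):
    """Output spatial size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1


size = 84  # input frames are 84x84
# Assumed classic DQN conv stack: (kernel, stride) per layer
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
    size = conv_out(size, kernel, stride)

print(size)              # 7 -> final feature map is 7x7
print(size * size * 64)  # 3136 flattened features feed the fully connected layers
```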

### Dueling DQN Architecture

Separates state value and action advantages:
- Shared convolutional backbone.
- Value stream: estimates state value V(s).
- Advantage stream: estimates action advantages A(s,a).
- Q(s,a) = V(s) + (A(s,a) - mean(A(s,a))).
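The aggregation step can be illustrated with toy scalar values (the package does this on batched PyTorch tensors in `model.py`):

```python
def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a).

    Subtracting the mean advantage makes V and A identifiable:
    adding a constant to all advantages leaves Q unchanged.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


# Toy example: one state value, advantages for three actions
q_values = dueling_q(value=2.0, advantages=[1.0, -1.0, 0.0])
print(q_values)  # [3.0, 1.0, 2.0]
```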

### Training Process

1. **Warmup**: Random exploration for initial experiences.
2. **Epsilon Annealing**: Gradual reduction from exploration to exploitation.
3. **Experience Replay**: Sample random mini-batches from replay buffer.
4. **Target Network**: Separate network updated periodically for stability.
5. **Double DQN** (optional): Reduces overestimation by decoupling action selection and evaluation.
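The Double DQN target in step 5 can be sketched with toy Q-values (illustrative only; the real update runs on batched PyTorch tensors):

```python
def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN bootstrap target: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward  # no bootstrapping on terminal transitions
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Toy example: the online net prefers action 1, which the target net scores at 2.0
target = double_dqn_target(
    reward=1.0, gamma=0.99,
    q_online_next=[0.5, 3.0, 1.0],  # action selection
    q_target_next=[4.0, 2.0, 0.0],  # action evaluation
    done=False,
)
print(target)  # 1.0 + 0.99 * 2.0 = 2.98
```

Vanilla DQN would instead bootstrap from `max(q_target_next)` (4.0 here), which tends to overestimate; decoupling selection from evaluation damps that bias.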

## License

This project is available under the MIT License.

