Metadata-Version: 2.4
Name: hitori-gym
Version: 0.0.3
Summary: A Gymnasium environment for the Hitori puzzle game.
Project-URL: Homepage, https://github.com/vibhuagarwal/hitori-gym
Project-URL: Repository, https://github.com/vibhuagarwal/hitori-gym
Author: Vibhu Agarwal
License: MIT
License-File: LICENSE
Keywords: game,gymnasium,hitori,puzzle,reinforcement-learning,rl
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Requires-Dist: gymnasium
Requires-Dist: pygame
Description-Content-Type: text/markdown

# Hitori Gym 🧩

A [Gymnasium](https://gymnasium.farama.org/) environment for the Japanese puzzle game [Hitori](https://en.wikipedia.org/wiki/Hitori).

This environment is specifically designed to train **Maskable Reinforcement Learning** agents (like [MaskablePPO](https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html)), leveraging a dynamic action mask to prevent illegal moves and dramatically simplify the learning process.

## 🚀 Installation

```bash
pip install hitori-gym
```

## 🎮 Usage

Here is a simple example of how to use the Hitori environment with a random agent that respects the action mask.

```python
import gymnasium as gym
import hitori_env
import numpy as np

# Create the Hitori environment
env = gym.make("hitori_env/Hitori-v2", size=5, render_mode="human")

# Reset the environment to get the initial observation
# You can also pass a seed for reproducibility and log the solution for debugging
observation, info = env.reset(seed=42, options={"log_solution": True})

# Run the environment for a certain number of steps
for step in range(1000):
    # Render the environment
    env.render()
    
    # --- CRITICAL: Use the action_mask to find valid actions ---
    action_mask = env.unwrapped.action_masks()
    valid_actions = np.where(action_mask == 1)[0]
    
    # Check if the agent is stuck (no valid moves left)
    if len(valid_actions) == 0:
        print("Agent is stuck! Game Over (Fail).")
        # Set both flags so the reset check below never reads an undefined name
        terminated, truncated = True, False
    else:
        # Choose a random valid action
        action = np.random.choice(valid_actions)

        # Take a step in the environment
        observation, reward, terminated, truncated, info = env.step(action)
        
        if terminated and reward > 0:
            print(f"Game Solved in {step + 1} steps!")

    # If the episode is over, reset the environment
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```

## 🤔 The Hitori Puzzle

Hitori is a logic puzzle played on a grid of numbers. The goal is to shade cells according to three rules:

1.  **No Duplicates in Unshaded Cells**: In each row and column, every unshaded number must be unique.
2.  **No Adjacent Shaded Cells**: Shaded cells cannot be adjacent to each other (horizontally or vertically).
3.  **All Unshaded Cells Must Be Connected**: The unshaded cells must form a single, continuous area.

The puzzle is solved when all three conditions are met.
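
The three rules can be checked mechanically. The following is a minimal sketch in plain Python, not the package's own code: `is_solved`, `GRID`, and `SOLVED` are hypothetical names introduced for illustration, and the board is held in nested lists rather than the environment's arrays.

```python
from collections import deque


def is_solved(grid, shaded):
    """Return True if `shaded` (0/1 per cell) satisfies all three rules for `grid`."""
    n = len(grid)

    # Rule 1: no repeated unshaded number in any row or column.
    for i in range(n):
        row = [grid[i][j] for j in range(n) if not shaded[i][j]]
        col = [grid[j][i] for j in range(n) if not shaded[j][i]]
        if len(set(row)) != len(row) or len(set(col)) != len(col):
            return False

    # Rule 2: no two shaded cells share an edge (checking right and down is enough).
    for r in range(n):
        for c in range(n):
            if shaded[r][c]:
                if r + 1 < n and shaded[r + 1][c]:
                    return False
                if c + 1 < n and shaded[r][c + 1]:
                    return False

    # Rule 3: the unshaded cells form one connected region (breadth-first flood fill).
    unshaded = [(r, c) for r in range(n) for c in range(n) if not shaded[r][c]]
    if not unshaded:
        return False
    seen = {unshaded[0]}
    queue = deque(seen)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < n and 0 <= nc < n
                    and not shaded[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(unshaded)


# Illustrative 3x3 board: row 0 repeats 1 and row 2 repeats 3.
GRID = [[1, 1, 2],
        [2, 3, 1],
        [3, 2, 3]]
# Shading the top-left and bottom-right corners removes both duplicates.
SOLVED = [[1, 0, 0],
          [0, 0, 0],
          [0, 0, 1]]
```

With these values, `is_solved(GRID, SOLVED)` holds, while an all-zero shading fails rule 1 because row 0 still contains two 1s.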

## 💡 Why Maskable Reinforcement Learning?

Hitori is a perfect use case for maskable RL agents. At any given step, the vast majority of actions (shading a cell) are illegal.

-   **Massive Search Space**: A 5x5 grid has 25 possible actions per step (and up to 2^25 shading patterns overall), but often only a few actions are valid at any given step. Without a mask, an RL agent would spend much of its training time learning to avoid illegal moves.
-   **Complex Rules**: The rules for what makes a move illegal are complex and depend on the global state of the board.

This environment solves that problem by providing an **action mask** on every step. The agent can use this mask to "see" only the valid moves, pruning the decision tree and making learning dramatically more efficient.

### Action Masking Logic

The `action_mask` is a binary vector where a `1` indicates a valid move. An action (shading a cell) is considered **illegal** if it violates any of the following core Hitori rules:

1.  **Cell Already Shaded**: The cell is already shaded.
2.  **Creates Adjacent Shading**: Shading the cell would place it next to an already shaded cell.
3.  **Disconnects Unshaded Cells**: Shading the cell would split the group of unshaded cells into two or more separate regions (i.e., the cell is an [articulation point](https://en.wikipedia.org/wiki/Biconnected_component) of the graph formed by the unshaded cells).
4.  **Cannot Shade an Already-Unique Number**: A cell cannot be shaded if its number is already the only one of its kind in its row and the only one of its kind in its column. Such a number can never be a "duplicate," so there is no reason to shade it.

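By enforcing these rules, the environment ensures that every step the agent takes keeps the board legal. The mask does not guarantee progress toward a solution: an agent can still reach a dead end with no valid moves left, which the environment treats as a failure.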
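
One way the four checks above could be combined into a mask is sketched below. It is an illustrative reimplementation in plain Python, not the environment's `action_masks()` method. The function names are hypothetical, rule 4 is interpreted as counting only currently unshaded cells (an assumption), and connectivity is tested with a flood fill per candidate cell instead of a dedicated articulation-point algorithm.

```python
from collections import deque


def _connected_without(shaded, skip):
    """True if the unshaded cells, with `skip` also treated as shaded, form one region."""
    n = len(shaded)
    cells = [(r, c) for r in range(n) for c in range(n)
             if not shaded[r][c] and (r, c) != skip]
    if not cells:
        return False
    seen = {cells[0]}
    queue = deque(seen)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < n and 0 <= nc < n and not shaded[nr][nc]
                    and (nr, nc) != skip and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(cells)


def compute_action_mask(grid, shaded):
    """Return a flat 0/1 list in row-major order; 1 marks a cell that may be shaded."""
    n = len(grid)
    mask = [0] * (n * n)
    for r in range(n):
        for c in range(n):
            # Rule 1: the cell is already shaded.
            if shaded[r][c]:
                continue
            # Rule 2: shading would touch an already shaded neighbour.
            neighbours = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
            if any(0 <= nr < n and 0 <= nc < n and shaded[nr][nc]
                   for nr, nc in neighbours):
                continue
            # Rule 4 (assumed to count unshaded cells only): the number is
            # already unique in both its row and its column.
            value = grid[r][c]
            in_row = sum(1 for j in range(n)
                         if not shaded[r][j] and grid[r][j] == value)
            in_col = sum(1 for i in range(n)
                         if not shaded[i][c] and grid[i][c] == value)
            if in_row == 1 and in_col == 1:
                continue
            # Rule 3: shading would disconnect the unshaded region.
            if not _connected_without(shaded, (r, c)):
                continue
            mask[r * n + c] = 1
    return mask


# Illustrative 3x3 board with nothing shaded yet.
EXAMPLE_GRID = [[1, 1, 2],
                [2, 3, 1],
                [3, 2, 3]]
EMPTY = [[0] * 3 for _ in range(3)]
initial_mask = compute_action_mask(EXAMPLE_GRID, EMPTY)
```

On this board only the four cells holding a duplicated number (the two 1s in the top row and the two 3s in the bottom row) are selectable at the start.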

## 🕹️ Demo

Here is a demonstration of a Hitori game in this environment:

![Hitori Gym Demo](media/hitori_gym_gif_video.gif)

The repository includes a `playground.py` script that allows you to manually play the Hitori game. This script is not part of the packaged library but is useful for testing and understanding the game mechanics.

To use it, run the following command:

```bash
python playground.py
```

## 🔍 Environment Details

### Observation Space

The observation space is a dictionary containing the puzzle state:

-   `game_grid`: An `NxN` grid representing the puzzle board, with each cell containing a number from 1 to `N`.
-   `shaded`: An `NxN` binary grid indicating which cells are currently shaded (1 for shaded, 0 for unshaded).

```python
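import numpy as np
from gymnasium import spaces

size = 5  # N, the side length of the grid
observation_space = spaces.Dict({
    "game_grid": spaces.Box(low=1, high=size, shape=(size, size), dtype=np.uint32),
    "shaded": spaces.MultiBinary((size, size)),
})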
```

### Action Space

The action space is a `Discrete` space of size `N*N`, where each action corresponds to shading a cell in row-major order. **The agent should only select actions where the `action_mask` is 1.**

```python
spaces.Discrete(size * size)
```
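
Row-major order means action `a` on an `N x N` board refers to row `a // N` and column `a % N`. The helpers below are a small sketch of that mapping; `action_to_cell` and `cell_to_action` are hypothetical names, not functions exported by the package.

```python
def action_to_cell(action, size):
    """Convert a flat action index into (row, col) for a size x size board."""
    return divmod(action, size)


def cell_to_action(row, col, size):
    """Convert (row, col) back into the flat row-major action index."""
    return row * size + col
```

For example, on a 5x5 board action `7` maps to row 1, column 2, and converting that cell back yields `7` again.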

### Rewards

The reward structure is designed to be simple and effective, especially since illegal moves are prevented by the mask.

| Outcome                   | Reward | Description                                                 | Termination |
| ------------------------- | ------ | ----------------------------------------------------------- | ----------- |
| **Win (Puzzle Solved)**   | `+1.0` | The current state is a complete and valid solution.         | `True`      |
| **Stuck (No valid moves)**| `-1.0` | The agent has no valid moves left and has not won.          | `True`      |
| **Valid Step Taken**      | `-0.01`| A small penalty to encourage finding the shortest solution. | `False`     |

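The table can be read as a small decision function. The sketch below is an assumption about how the three outcomes relate, not the environment's actual code; `compute_reward` and its two boolean parameters are hypothetical.

```python
def compute_reward(solved, has_valid_moves):
    """Return (reward, terminated) following the reward table above."""
    if solved:
        # Win: the board is a complete and valid solution.
        return 1.0, True
    if not has_valid_moves:
        # Stuck: nothing left to shade and the puzzle is not solved.
        return -1.0, True
    # Ordinary valid step: small penalty, episode continues.
    return -0.01, False
```

The win is checked first because the table defines "stuck" as having no valid moves and not having won, so a solved board should never be scored as a failure.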

## ⚙️ How It Works

The environment is built with a few key components:

-   **`hitori.py`**: The main `gym.Env` implementation. It handles the game logic, state transitions, rendering, and, most importantly, the dynamic generation of the `action_mask` on every step.
-   **`hitori_generator.py`**: A utility that generates valid, solvable Hitori puzzles of a given size.
-   **`hitori_solution.py`**: A backtracking solver that can find a valid solution for a given Hitori puzzle. This is used internally for debugging and can be enabled via an option in `env.reset()`.
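
To give a feel for what a backtracking solver does, here is a minimal sketch in plain Python. It is not the code in `hitori_solution.py`; the `solve` function and its search order are assumptions made for illustration. It decides cells in row-major order, prunes on duplicates and adjacency as it goes, and checks connectivity once every cell is decided.

```python
from collections import deque


def solve(grid):
    """Return a 0/1 shading that solves `grid`, or None if no shading works."""
    n = len(grid)
    shaded = [[0] * n for _ in range(n)]

    def no_duplicate(r, c):
        # An unshaded cell must not repeat an earlier unshaded value in its row or column.
        v = grid[r][c]
        if any(not shaded[r][j] and grid[r][j] == v for j in range(c)):
            return False
        return not any(not shaded[i][c] and grid[i][c] == v for i in range(r))

    def connected():
        # Flood fill from the first unshaded cell and compare the counts.
        cells = [(r, c) for r in range(n) for c in range(n) if not shaded[r][c]]
        if not cells:
            return False
        seen = {cells[0]}
        queue = deque(seen)
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < n and 0 <= nc < n
                        and not shaded[nr][nc] and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
        return len(seen) == len(cells)

    def place(index):
        if index == n * n:
            return connected()
        r, c = divmod(index, n)
        # Option 1: leave the cell unshaded.
        if no_duplicate(r, c) and place(index + 1):
            return True
        # Option 2: shade it, unless the cell above or to the left is already shaded.
        if not (r > 0 and shaded[r - 1][c]) and not (c > 0 and shaded[r][c - 1]):
            shaded[r][c] = 1
            if place(index + 1):
                return True
            shaded[r][c] = 0  # undo and backtrack
        return False

    return shaded if place(0) else None


# Illustrative 3x3 board: row 0 repeats 1 and row 2 repeats 3.
solution = solve([[1, 1, 2],
                  [2, 3, 1],
                  [3, 2, 3]])
```

The recursion depth equals the number of cells, so this form is only suitable for small boards; a production solver would likely add stronger pruning.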

## 💻 Development

To set up the project for development, clone the repository and install it in editable mode:

```bash
git clone https://github.com/vibhuagarwal/hitori-gym.git
cd hitori-gym
pip install -e .
```

## 📄 License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
