Metadata-Version: 2.1
Name: thants
Version: 0.1.0
Summary: Ants! On your GPU! With JAX!
License: MIT
Keywords: agent-based modelling,simulation,reinforcement-learning
Author: zombie-einstein
Author-email: zombie-einstein@proton.me
Requires-Python: >=3.10,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: GPU
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Artificial Life
Requires-Dist: jumanji (>=1.1.1,<2.0.0)
Description-Content-Type: text/markdown

<div align="center">
  <img src="https://github.com/zombie-einstein/thants/raw/main/.github/images/thants.gif" />
  <br>
  <em>Thanks ants!</em>
</div>
<br>

# Thants

*Multi-agent and multi-team reinforcement-learning environment modelling ant foraging*

## Introduction

Thants is a rl environment library based on models of ant colonies foraging for food, also supporting
environments with multiple competing colonies.

Thants is implemented using JAX, allowing the environments to be run on GPU enabling large scale performant
simulation, and the ability to run environments alongside JAX and Pytorch ML tools.

The environment is implemented using the [Jumanji](https://github.com/instadeepai/jumanji) RL environment API, with some modification
for the multi-colony case.

## Usage

### Installation

Thants can be installed from pypi using

```commandline
pip install thants
```

### Examples

#### Single Colony

```python
from thants.mono import ThantsMonoColony
import jax

env = ThantsMonoColony(dims=(50, 50))
key = jax.random.PRNGKey(101)
state, obs = env.reset(key)
state_history = [state]

for _ in range(50):
    key, action_key = jax.random.split(key, 2)
    actions = jax.random.choice(
        action_key, env.num_actions, (env.num_agents,)
    )
    state, obs = env.step(state, actions)
    state_history.append(state)

env.animate(state_history, 100, "mono_colony.gif")
```

#### Multi-Colony

In the multi-colony case each colony is treated independently (and can be
different sizes), so actions and observations are list of arrays/structs.

```python
from thants.multi import ThantsMultiColony
import jax
import jax.numpy as jnp


env = ThantsMultiColony((50, 100))
key = jax.random.PRNGKey(101)
state, obs = env.reset(key)
state_history = [state]

for _ in range(50):
    key, k1, k2 = jax.random.split(key, 3)
    # List of action arrays per colony
    actions = [
        jax.random.choice(k1, env.num_actions, (env.num_agents[0],)),
        jax.random.choice(k2, env.num_actions, (env.num_agents[1],)),
    ]
    state, obs = env.step(state, actions)
    state_history.append(state)

env.animate(state_history, 100, "multi_colony.gif")
```

## Environment

The environment is modelled as a grid, wrapped at the boundaries. Ants (the agents)
occupy individual cells on the grid (and cannot overlap). Ants can pick up, carry,
and deposit food, or deposit persistent signals that can be observed by other ants
in the same colony.

### State

#### Colonies

The state of each individual colony has the following components:

- *Ants*: Individual ants in the colony have several components:
    - *Positions*: 2d indices of ant positions on the environment grid.
    - *Carrying*: The amount of food being carried by each ant.
    - *Health*: Ant health (currently unused).
- *Nest*: 2d array indicating if a cell is designated as a nest for this colony.
- *Signals*: 3d array of signal deposits at each cell. Signals have multiple channels
  to facilitate communication between ants (i.e. the first dimension of the array
  is the signal channel).

#### Environment

The state of the environment then consists of the colony/colonies and state shared
by all the colonies

- *Colony/Colonies*: Colony or list of colony states
- *Food*: 2d array representing the amount of food deposited at each cell
- *Terrain*: Array of flags indicating if a cell can be occupied by an ant. This
  allows obstacles to be placed on the environment.

#### Updates

Each step of the environment performs the following update to the state:

- Convert integer action choices into state updates
- Apply ant position updates
- Apply food pick-up and deposit actions
- Drop any new food deposits
- Dissipate and propagate signals
- Apply signal deposit actions
- Clear any food that has been deposited on a nest (i.e. the food is consumed
  by the colony)

The behaviour of the dynamics of signals can be customised by implementing the
`thants.common.signals.SignalPropagator` base class and passing it when initialising
the environment.

#### Generators

The initialisation of the environment can be customised by implementing the
relevant base class and passing them to the environment.

### Actions

Ants can select from several discrete actions, indicated by an integer value:

- `0`: Null action (i.e. no change to the environment)
- `1 - 4`: Move in one of the four ordinal directions (if possible)
- `5`: Take a fixed amount of food from the ants location (if possible)
- `6`: Deposit a fixed amount of food from the ants location (if possible)
- `7+`: Deposit a fixed amount of signal at the ants location

Note that actions can be selected, but may not be possible e.g. attempting
to move to an occupied cell, or taking food from an empty cell. In this
case there will be no change in state due to the chosen action.

### Observations

Individual agent observations also consist of several components. Observations are
individually made for the local neighbourhood of each ant, i.e. the 8 surrounding
cells on the environment grid, and their own cell:

- `ants`: Flag indicating if a cell in the neighbourhood is occupied by an ant,
  for the single colony environment this has shape `[n-ants, 9]`, but in the multi-colony
  case this is `[n-ants, n-colonies, 9]` where the second dimensions indicates the
  individual colonies.
- `signals`: Signal deposits in the neighbourhood (across all channels), signals are
  observed individually for each colony.
- `food`: Food deposits within the neighbourhood.
- `nest`: Flag indicating if a neighbouring cell is designated as a nest.
- `terrain`: Flag indicating if a neighbouring cell can be occupied.
- `carrying`: Amount of food currently being carried by each ant.

### Rewards

By default, rewards are granted to ants when they deposit food on their colonies nest.
Reward signals customised by implementing the respective base classes
`thants.mono.rewards.RewardFn` or `thants.multi.rewards.RewardFn`.

