πŸ”₯ Firefighting Graph

The Fire Fighting Graph environment is a stylized Decentralized Partially Observable Markov Decision Process (Dec-POMDP) originally introduced by Oliehoek and Amato [Oliehoek et al.], [Oliehoek et al.]. It serves as a classical benchmark for studying decision-making over Dynamic Bayesian Networks (DBNs) in cooperative multi-agent settings. This environment can model real-world domains such as epidemic control [Hu et al.], communication networks [Ooi and Wornell], [Mahajan and Mannan], and traffic systems [Wu et al.].

In the base version of the problem, a line of \(N+1\) houses is protected by \(N\) agents (firefighters), each responsible for extinguishing fires at two neighboring locations. The underlying structure can be represented as a bipartite graph, with agent nodes and house nodes.
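This bipartite structure can be sketched as a small helper; `firefighting_bipartite_edges` is a hypothetical name used only for illustration, connecting each agent node to the two house nodes it can serve.

```python
# Hypothetical sketch: the 1D problem as a bipartite graph, with agent i
# connected to the two houses it can serve (houses i and i + 1).
def firefighting_bipartite_edges(n_agents: int) -> list[tuple[str, str]]:
    edges = []
    for i in range(n_agents):
        edges.append((f"agent_{i}", f"house_{i}"))
        edges.append((f"agent_{i}", f"house_{i + 1}"))
    return edges
```

With `n_agents = N`, the resulting graph touches houses `0` through `N`, i.e. the \(N+1\) houses of the base problem.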

This package implements two variants:

  • The original 1D version with \(N\) agents and \(N+1\) houses.

  • A novel 2D grid extension with \(N \times M\) agents and \((N+1) \times (M+1)\) houses.

The following description focuses on the 1D version. The 2D extension is described in the supplementary material.

Environment Description

Illustration for the Firefighting Graph problem (in 1D and 2D).

State and Observation

The system state at time \(t\) is the fire level at each house:

\[S(t) = \{ s_h(t) \}_{h=1}^{N+1}, \quad \text{where} \quad s_h(t) \in [0, \theta)\]

Agents do not have access to the full system state. Each agent only receives a stochastic observation of the fire level at the house they are currently monitoring. The observation accuracy depends on the actual fire level, introducing uncertainty and partial observability into the problem.

By default (following [Oliehoek et al.]):

  • If a house is not burning (fire level 0), fire is observed with probability 0.2.

  • If the fire level is low (1), fire is observed with probability 0.5.

  • At higher fire levels, fire is observed with probability 0.8.
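This default observation model can be sketched as follows; `obs_flame_prob` and `sample_observation` are hypothetical helper names, but the probabilities are the defaults listed above.

```python
import random

def obs_flame_prob(fire_level: int) -> float:
    """Probability of observing flames at the monitored house,
    given its true fire level (defaults from Oliehoek et al.)."""
    if fire_level == 0:
        return 0.2   # false alarm at an unburning house
    if fire_level == 1:
        return 0.5   # low fire: noisy detection
    return 0.8       # higher fire levels: reliable detection

def sample_observation(fire_level: int, rng: random.Random) -> int:
    """Return 1 if flames are observed, else 0."""
    return int(rng.random() < obs_flame_prob(fire_level))
```

Note that an agent can observe flames at an unburning house (with probability 0.2), which is precisely what makes the problem partially observable.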

The environment also maintains internal information about agent locations, which influence observations and rewards.

Action Space

Each agent may move to either of its two designated houses, house \(i\) or \(i+1\), and attempt to extinguish the fire.

In the 1D setting:

  • Two agents at the same house extinguish all fire with high probability (default = 1.0).

  • A single agent can reduce the fire level by one unit: with high probability if the adjacent houses are not burning, and with reduced probability (default = 0.6) if fire is present in a neighboring house.

In the 2D version, agents can move to any of the four adjacent houses (up, down, left, right).
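The 1D extinguishing rules above can be summarized by a probability function. This is a sketch: `extinguish_prob` is a hypothetical name, the values 1.0 (two agents) and 0.6 (lone agent with burning neighbors) are the defaults quoted above, and the value for a lone agent with calm neighbors is an assumed default.

```python
def extinguish_prob(n_agents_here: int,
                    neighbor_burning: bool,
                    two_agents_prob: float = 1.0,
                    lone_agent_prob: float = 1.0,
                    lone_agent_penalized_prob: float = 0.6) -> float:
    """Probability that the fire at a house is reduced this step.

    Two agents extinguish all fire at the house; a lone agent reduces
    the fire level by one unit, with a penalty if neighbors are burning.
    """
    if n_agents_here >= 2:
        return two_agents_prob
    if n_agents_here == 1:
        return lone_agent_penalized_prob if neighbor_burning else lone_agent_prob
    return 0.0  # unattended houses are not extinguished
```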

Reward and Objective

Episodes end when all houses are extinguished (i.e., fire level is 0 everywhere), or after reaching a maximum time horizon \(T\).

Each agent receives a negative reward based on the fire level of the house it is currently assigned to. Formally:

\[\begin{split}r_i(t) = \begin{cases} -s_i(t) & \text{if agent $i$ is at house $i$} \\ -s_{i+1}(t) & \text{if agent $i$ is at house $i+1$} \end{cases}\end{split}\]

The agents are fully cooperative and must learn to coordinate their actions and positions to extinguish fires as quickly and efficiently as possible.
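The reward and termination conditions above can be sketched in a few lines; `individual_rewards` and `episode_done` are hypothetical names, with the action encoding taken from the documented step dynamics (0 = stay at house \(i\), 1 = move to house \(i+1\)).

```python
import numpy as np

def individual_rewards(state: np.ndarray, actions: dict[int, int]) -> dict[int, float]:
    """Each agent is penalized by the fire level of the house it occupies:
    house i if it stayed (action 0), house i + 1 if it moved (action 1)."""
    return {i: -float(state[i + a]) for i, a in actions.items()}

def episode_done(state: np.ndarray, t: int, horizon: int) -> bool:
    """Episodes end when every house is extinguished or the horizon is reached."""
    return bool(np.all(state == 0)) or t >= horizon
```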

Summary

This environment provides a simple yet expressive testbed for decentralized cooperation under uncertainty. Its modular structure and 2D extension make it a versatile benchmark for multi-agent reinforcement learning research, especially in domains requiring coordination on dynamic spatial networks.

Environment

class cognac.env.FirefightingGraph.env.GridFireFightingGraphEnvironment(n_width: int = 5, n_height: int = 5, max_steps: int = 100, max_fire_level: int = 100, reward_class: type = <class 'cognac.env.FirefightingGraph.rewards.DefaultFFGReward'>, is_global_reward: bool = False)

Bases: ParallelEnv

A grid-based multi-agent fire fighting environment.

Each agent controls a cell in a grid and must cooperate with others to suppress fires in a shared neighborhood. Fire spread and extinguishing dynamics depend on spatial proximity and the actions of neighboring agents.

Parameters

n_widthint

Grid width.

n_heightint

Grid height.

max_stepsint

Maximum number of environment steps.

max_fire_levelint

Maximum fire level a house can reach.

reward_classtype, optional

Reward function class to compute rewards.

is_global_rewardbool, optional

Whether to use a shared global reward or individual rewards.

Attributes

n_widthint

Grid width.

n_heightint

Grid height.

n_agentsint

Total number of agents.

n_housesint

Total number of houses.

max_stepsint

Maximum number of environment steps.

max_fire_levelint

Maximum fire level a house can reach.

rewardDefaultFFGReward

Reward function instance.

is_global_rewardbool

Whether to use a shared global reward.

_act_id_to_house_pos(agent: tuple, act_id: int) tuple

Map an action ID to the house position relative to an agent.

Warning

Internal method used for the step dynamics.

Parameters

agenttuple

Agent's grid position.

act_idint

Action identifier.

Returns

tuple

The corresponding house position in the grid.

_get_visit_map(joint_act: dict) ndarray

Compute the number of visits each house receives from all agents.

Parameters

joint_actdict

Mapping from agent positions to their chosen action.

Returns

np.ndarray

A 2D array indicating the number of visits per house.

_is_burning() ndarray

Identify which houses are currently burning.

Warning

Internal method used for the step dynamics.

Returns

np.ndarray

Boolean array where True indicates a burning house.

_is_burning_neighbors(state: ndarray) ndarray

Determine which cells in a 2D grid have burning neighbors.

Parameters

statenp.ndarray

2D array of fire levels.

Returns

np.ndarray

Boolean array indicating where at least one neighbor is burning.

_is_burning_neighbors_1d() ndarray

Determine which houses in a 1D row have burning neighbors.

Warning

Internal method used for the step dynamics.

Returns

np.ndarray

Boolean array indicating where a neighbor is burning.

action_space(agent: tuple) Discrete

Define the action space for a given agent.

Parameters

agenttuple

Agent identifier.

Returns

gymnasium.spaces.Discrete

Action space of the agent (4 discrete actions).

get_obs() dict

Computes the observable flame status at each agent's monitored house.

Returns

dict

Observations per agent as binary flame detection (0 or 1).

metadata: dict[str, Any] = {'name': 'grid_firefighting_environment_v0'}
observation_space(agent: tuple) Discrete

Define the observation space for a given agent.

Parameters

agenttuple

Agent identifier.

Returns

gymnasium.spaces.Discrete

Observation space of the agent (binary outcomes).

render(save_frame: bool = False, fig=None, ax=None) None

Render the current state of the environment.

Parameters

save_framebool, optional

Whether to save the current frame as an image. Default is False.

figmatplotlib.figure.Figure, optional

Figure object to use for rendering. If None, a new figure is created.

axmatplotlib.axes.Axes, optional

Axes object to render on. If None, rendering occurs on the current axes.

reset(seed: int = None, options: dict = None) tuple

Resets the grid environment to an initial state.

Parameters

seedint, optional

Random seed.

optionsdict, optional

Initialization options, including optional initial fire states under "init_vect".

Returns

tuple

Observations and additional information for each agent.

step(actions: dict[tuple, int]) tuple

Executes a step in the grid-based environment.

Parameters

actionsdict of tuple to int

Mapping from (row, col) agent indices to actions.

Returns

tuple

Updated observations, rewards, terminations, truncations, and info dictionaries.

class cognac.env.FirefightingGraph.env.RowFireFightingGraphEnvironment(n: int = 10, max_steps: int = 100, max_fire_level: int = 100, reward_class: type = <class 'cognac.env.FirefightingGraph.rewards.DefaultFFGReward'>, is_global_reward: bool = False)

Bases: ParallelEnv

A row-based multi-agent fire fighting environment.

Each agent is responsible for extinguishing fires in adjacent houses. Fire levels increase probabilistically based on neighboring fires and the presence of firefighters.

Parameters

nint

Number of agents.

max_stepsint

Maximum number of environment steps.

max_fire_levelint

Maximum fire level a house can reach.

reward_classtype, optional

Reward function class to compute rewards.

is_global_rewardbool, optional

Whether to use a shared global reward or individual rewards.

Attributes

n_agentsint

Number of agents.

n_housesint

Number of houses (n_agents + 1).

max_stepsint

Maximum number of environment steps.

max_fire_levelint

Maximum fire level a house can reach.

rewardDefaultFFGReward

Reward function instance.

is_global_rewardbool

Whether to use a shared global reward.

_is_burning_neighbors(house: int, state: ndarray) bool

Check whether either neighbor of the given house is currently burning.
action_space(agent: int) Discrete

Returns the action space of a given agent.

Parameters

agentint

Index of the agent.

Returns

Discrete

Action space with 2 actions (0 or 1).

get_obs() dict

Computes the observable flame status for each agent's current location.

Returns

dict

Observations per agent as binary flame detection (0 or 1).

metadata: dict[str, Any] = {'name': 'row_firefighting_environment_v0'}
observation_space(agent: int) Discrete

Returns the observation space of a given agent.

Parameters

agentint

Index of the agent.

Returns

Discrete

Observation space with binary outcome (0 or 1).

render(save_frame: bool = False, fig=None, ax=None) None

Renders the current fire level and agent positions on the graph.

Parameters

save_framebool, optional

If True, saves the rendered frame.

figFigure, optional

Matplotlib figure object.

axAxes, optional

Matplotlib axes object.

reset(seed: int = None, options: dict = None) tuple

Resets the environment to an initial state.

Parameters

seedint, optional

Random seed for reproducibility.

optionsdict, optional

Initialization options, including an optional "init_vect" to specify initial fire levels.

Returns

tuple

A dictionary of agent observations and a dictionary of additional info for each agent.

step(actions: dict) tuple

Executes a single step of environment dynamics.

Parameters

actionsdict

A mapping from agent identifiers to actions (0 to stay, 1 to move right).

Returns

tuple

A 5-tuple containing:

  • observations (dict): Updated observations.

  • rewards (dict): Rewards for each agent.

  • terminations (dict): Episode termination status.

  • truncations (dict): Episode truncation status.

  • infos (dict): Additional per-agent information.

Rewards

class cognac.env.FirefightingGraph.rewards.DefaultFFGReward(max_reward: float = 0.0, min_reward: float = 0.0)

Bases: BaseReward

Default reward function for the Fire Fighting Graph environment.

This reward penalizes agents based on the fire level at the house they last visited. The reward is negative proportional to the fire intensity, encouraging agents to extinguish fires efficiently.

Parameters

max_rewardfloat, optional

Maximum reward value, by default 0.0. (Currently unused in calculation but reserved for potential future use.)

min_rewardfloat, optional

Minimum reward value, by default 0.0. (Currently unused in calculation but reserved for potential future use.)

Attributes

max_rewardfloat

The maximum reward possible.

min_rewardfloat

The minimum reward possible.
