🔥 Firefighting Graph
The Fire Fighting Graph environment is a stylized Decentralized Partially Observable Markov Decision Process (Dec-POMDP) originally introduced by Oliehoek and Amato [Oliehoek et al.], [Oliehoek et al.]. It serves as a classical benchmark for studying decision-making over Dynamic Bayesian Networks (DBNs) in cooperative multi-agent settings. This environment can model real-world domains such as epidemic control [Hu et al.], communication networks [Ooi and Wornell], [Mahajan and Mannan], and traffic systems [Wu et al.].
In the base version of the problem, a line of \(N+1\) houses is protected by \(N\) agents (firefighters), each responsible for extinguishing fires at two neighboring locations. The underlying structure can be represented as a bipartite graph, with agent nodes and house nodes.
This package implements two variants:
The original 1D version with \(N\) agents and \(N+1\) houses.
A novel 2D grid extension with \(N \times M\) agents and \((N+1) \times (M+1)\) houses.
The following description focuses on the 1D version. The 2D extension is described in the supplementary material.
Environment Description
Figure: Illustration of the Firefighting Graph problem (in 1D and 2D).
State and Observation
The system state at time \(t\) is the fire level at each house: \(s_t = \left(f_t^1, \ldots, f_t^{N+1}\right)\), where \(f_t^i \in \{0, 1, \ldots, F_{\max}\}\) is the fire level of house \(i\) and \(F_{\max}\) is the maximum fire level.
Agents do not have access to the full system state. Each agent only receives a stochastic observation of the fire level at the house they are currently monitoring. The observation accuracy depends on the actual fire level, introducing uncertainty and partial observability into the problem.
By default (following [Oliehoek et al.]; see the code sketch after this list):
If a house is not burning (fire level 0), fire is observed with probability 0.2.
If the fire level is low (1), fire is observed with probability 0.5.
At higher fire levels, fire is observed with probability 0.8.
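The observation model is simple enough to state in a few lines. The following sketch reproduces the default probabilities listed above; the function name and signature are illustrative, not part of the package API.

```python
import numpy as np

def sample_flame_observation(fire_level: int, rng: np.random.Generator) -> int:
    """Sample a binary flame observation from the default model (illustrative)."""
    if fire_level == 0:
        p_flames = 0.2  # false alarm at an unburnt house
    elif fire_level == 1:
        p_flames = 0.5  # a low fire is hard to assess
    else:
        p_flames = 0.8  # higher fire levels are easier to detect
    return int(rng.random() < p_flames)

rng = np.random.default_rng(0)
print([sample_flame_observation(level, rng) for level in (0, 1, 2, 3)])
```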
The environment also maintains internal information about agent locations, which influence observations and rewards.
Action Space
Each agent may move to either of its two designated houses (house \(i\) or \(i+1\)) and attempt to extinguish the fire.
In the 1D setting:
Two agents at the same house extinguish all fire with high probability (default = 1.0).
A single agent can reduce the fire level by one unit: with high probability if the adjacent houses are not burning, and with reduced probability (default = 0.6) if fire is present in a neighboring house (see the sketch below).
In the 2D version, agents can move to any of the four adjacent houses (up, down, left, right).
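To make the 1D extinguishing rule concrete, here is a minimal sketch of a single house update under the defaults above. The success probability of a lone agent next to non-burning houses is not specified numerically in the text, so `p_solo = 1.0` is an assumption; all names are illustrative rather than the package API.

```python
import numpy as np

def extinguish_house(fire, house, n_visitors, has_burning_neighbor, rng,
                     p_pair=1.0, p_solo=1.0, p_solo_neighbors_burning=0.6):
    """Apply the 1D extinguishing rule to one house (illustrative sketch)."""
    fire = fire.copy()
    if n_visitors >= 2:
        if rng.random() < p_pair:  # two agents clear the house entirely
            fire[house] = 0
    elif n_visitors == 1:
        p = p_solo_neighbors_burning if has_burning_neighbor else p_solo
        if rng.random() < p:       # one agent lowers the fire level by one unit
            fire[house] = max(fire[house] - 1, 0)
    return fire
```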
Reward and Objective
Episodes end when all houses are extinguished (i.e., fire level is 0 everywhere), or after reaching a maximum time horizon \(T\).
Each agent receives a negative reward based on the fire level of the house it is currently assigned to. Formally, agent \(i\) attending house \(h_i\) receives \(r_t^i = -f_t^{h_i}\).
The agents are fully cooperative and must learn to coordinate their actions and positions to extinguish fires as quickly and efficiently as possible.
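As a worked illustration of this reward, the snippet below computes per-agent penalties from a fire-level vector and the houses the agents currently attend; the names are hypothetical and do not mirror the package internals.

```python
import numpy as np

def per_agent_rewards(fire, agent_houses):
    """Minus the fire level at each agent's current house (illustrative)."""
    return {i: -float(fire[h]) for i, h in enumerate(agent_houses)}

fire = np.array([0, 2, 1, 0])              # fire levels for N+1 = 4 houses
print(per_agent_rewards(fire, [1, 2, 3]))  # {0: -2.0, 1: -1.0, 2: -0.0}
```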
Summary
This environment provides a simple yet expressive testbed for decentralized cooperation under uncertainty. Its modular structure and 2D extension make it a versatile benchmark for multi-agent reinforcement learning research, especially in domains requiring coordination on dynamic spatial networks.
Environment
- class cognac.env.FirefightingGraph.env.GridFireFightingGraphEnvironment(n_width: int = 5, n_height: int = 5, max_steps: int = 100, max_fire_level: int = 100, reward_class: type = <class 'cognac.env.FirefightingGraph.rewards.DefaultFFGReward'>, is_global_reward: bool = False)
Bases: ParallelEnv
A grid-based multi-agent fire fighting environment.
Each agent controls a cell in a grid and must cooperate with others to suppress fires in a shared neighborhood. Fire spread and extinguishing dynamics depend on spatial proximity and the actions of neighboring agents.
Parameters
- n_width : int
Grid width.
- n_height : int
Grid height.
- max_steps : int
Maximum number of environment steps.
- max_fire_level : int
Maximum fire level a house can reach.
- reward_class : type, optional
Reward function class to compute rewards.
- is_global_reward : bool, optional
Whether to use a shared global reward or individual rewards.
Attributes
- n_width : int
Grid width.
- n_height : int
Grid height.
- n_agents : int
Total number of agents.
- n_houses : int
Total number of houses.
- max_steps : int
Maximum number of environment steps.
- max_fire_level : int
Maximum fire level a house can reach.
- reward : DefaultFFGReward
Reward function instance.
- is_global_reward : bool
Whether to use a shared global reward.
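Before the method reference, a brief usage sketch may help. It assumes the PettingZoo ParallelEnv conventions implied by the reset and step signatures documented in this section (a step method mirroring the row environment's 5-tuple return), and that agent identifiers can be enumerated from the observation dictionary.

```python
from cognac.env.FirefightingGraph.env import GridFireFightingGraphEnvironment

env = GridFireFightingGraphEnvironment(n_width=3, n_height=3, max_steps=50)
observations, infos = env.reset(seed=0)
for _ in range(10):
    # sample a random action for every agent from its Discrete action space
    actions = {agent: env.action_space(agent).sample() for agent in observations}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    if all(terminations.values()) or all(truncations.values()):
        break
```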
- _act_id_to_house_pos(agent: tuple, act_id: int) → tuple
Map an action ID to the house position relative to an agent.
Warning
Internal method used by the step dynamics.
Parameters
- agent : tuple
Agent's grid position.
- act_id : int
Action identifier.
Returns
- tuple
The corresponding house position in the grid.
- _get_visit_map(joint_act: dict) → ndarray
Compute the number of visits each house receives from all agents.
Parameters
- joint_act : dict
Mapping from agent positions to their chosen action.
Returns
- np.ndarray
A 2D array indicating the number of visits per house.
- _is_burning() → ndarray
Identify which houses are currently burning.
Warning
Internal method used by the step dynamics.
Returns
- np.ndarray
Boolean array where True indicates a burning house.
- _is_burning_neighbors(state: ndarray) → ndarray
Determine which cells in a 2D grid have burning neighbors.
Parameters
- state : np.ndarray
2D array of fire levels.
Returns
- np.ndarray
Boolean array indicating where at least one neighbor is burning.
- _is_burning_neighbors_1d() → ndarray
Determine which houses in a 1D row have burning neighbors.
Warning
Internal method used by the step dynamics.
Returns
- np.ndarray
Boolean array indicating where a neighbor is burning.
- action_space(agent: tuple) → Discrete
Define the action space for a given agent.
Parameters
- agent : tuple
Agent identifier.
Returns
- gymnasium.spaces.Discrete
Action space of the agent (4 discrete actions).
- get_obs() → dict
Compute the observable flame status for each agent's current location.
Returns
- dict
Observations per agent as binary flame detection (0 or 1).
- metadata: dict[str, Any] = {'name': 'grid_firefighting_environment_v0'}
- observation_space(agent: tuple) → Discrete
Define the observation space for a given agent.
Parameters
- agent : tuple
Agent identifier.
Returns
- gymnasium.spaces.Discrete
Observation space of the agent (binary outcomes).
- render(save_frame: bool = False, fig=None, ax=None) → None
Render the current state of the environment.
Parameters
- save_frame : bool, optional
Whether to save the current frame as an image. Default is False.
- fig : matplotlib.figure.Figure, optional
Figure object to use for rendering. If None, a new figure is created.
- ax : matplotlib.axes.Axes, optional
Axes object to render on. If None, rendering occurs on the current axes.
- reset(seed: int = None, options: dict = None) → tuple
Resets the grid environment to an initial state.
Parameters
- seed : int, optional
Random seed.
- options : dict, optional
Initialization options, including optional initial fire states under 'init_vect'.
Returns
- tuple
Observations and additional information for each agent.
- class cognac.env.FirefightingGraph.env.RowFireFightingGraphEnvironment(n: int = 10, max_steps: int = 100, max_fire_level: int = 100, reward_class: type = <class 'cognac.env.FirefightingGraph.rewards.DefaultFFGReward'>, is_global_reward: bool = False)
Bases: ParallelEnv
A row-based multi-agent fire fighting environment.
Each agent is responsible for extinguishing fires in adjacent houses. Fire levels evolve probabilistically based on neighboring fires and the presence of firefighters.
Parameters
- n : int
Number of agents.
- max_steps : int
Maximum number of environment steps.
- max_fire_level : int
Maximum fire level a house can reach.
- reward_class : type, optional
Reward function class to compute rewards.
- is_global_reward : bool, optional
Whether to use a shared global reward or individual rewards.
Attributes
- n_agents : int
Number of agents.
- n_houses : int
Number of houses (n_agents + 1).
- max_steps : int
Maximum number of environment steps.
- max_fire_level : int
Maximum fire level a house can reach.
- reward : DefaultFFGReward
Reward function instance.
- is_global_reward : bool
Whether to use a shared global reward.
- _is_burning_neighbors(house: int, state: ndarray) → bool
- action_space(agent: int) → Discrete
Returns the action space of a given agent.
Parameters
- agent : int
Index of the agent.
Returns
- Discrete
Action space with 2 actions (0 or 1).
- get_obs() → dict
Computes the observable flame status for each agent's current location.
Returns
- dict
Observations per agent as binary flame detection (0 or 1).
- metadata: dict[str, Any] = {'name': 'row_firefighting_environment_v0'}
- observation_space(agent: int) → Discrete
Returns the observation space of a given agent.
Parameters
- agent : int
Index of the agent.
Returns
- Discrete
Observation space with binary outcome (0 or 1).
- render(save_frame: bool = False, fig=None, ax=None) → None
Renders the current fire level and agent positions on the graph.
Parameters
- save_frame : bool, optional
If True, saves the rendered frame.
- fig : Figure, optional
Matplotlib figure object.
- ax : Axes, optional
Matplotlib axes object.
- reset(seed: int = None, options: dict = None) → tuple
Resets the environment to an initial state.
Parameters
- seed : int, optional
Random seed for reproducibility.
- options : dict, optional
Initialization options, including an optional 'init_vect' to specify initial fire levels.
Returns
- tuple
A dictionary of agent observations and a dictionary of additional info for each agent.
- step(actions: dict) → tuple
Executes a single step of environment dynamics.
Parameters
- actions : dict
A mapping from agent identifiers to actions (0 to stay, 1 to move right).
Returns
- tuple
A 5-tuple containing:
- observations (dict): Updated observations.
- rewards (dict): Rewards for each agent.
- terminations (dict): Episode termination status.
- truncations (dict): Episode truncation status.
- infos (dict): Additional per-agent information.
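Putting reset and step together, here is a minimal rollout sketch for the row environment, assuming the 5-tuple step return documented above. The 'init_vect' option follows the reset documentation; the specific vector used here (fire level 1 at every other house, for n=4 agents and 5 houses) is only an example.

```python
import numpy as np
from cognac.env.FirefightingGraph.env import RowFireFightingGraphEnvironment

env = RowFireFightingGraphEnvironment(n=4, max_steps=20)
# start with a small fire at every other house via the documented 'init_vect' option
observations, infos = env.reset(seed=0, options={"init_vect": np.array([1, 0, 1, 0, 1])})
done = False
while not done:
    # random joint action; a learned policy would go here instead
    actions = {agent: env.action_space(agent).sample() for agent in observations}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    done = all(terminations.values()) or all(truncations.values())
```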
Rewards
- class cognac.env.FirefightingGraph.rewards.DefaultFFGReward(max_reward: float = 0.0, min_reward: float = 0.0)
Bases: BaseReward
Default reward function for the Fire Fighting Graph environment.
This reward penalizes agents based on the fire level at the house they last visited. The reward is negatively proportional to the fire intensity, encouraging agents to extinguish fires efficiently.
Parameters
- max_reward : float, optional
Maximum reward value, by default 0.0. (Currently unused in the calculation but reserved for potential future use.)
- min_reward : float, optional
Minimum reward value, by default 0.0. (Currently unused in the calculation but reserved for potential future use.)
Attributes
- max_reward : float
The maximum reward possible.
- min_reward : float
The minimum reward possible.