πŸ’» SysAdmin Network

The Multi-agent SysAdmin problem is a widely used benchmark in the study of decision-making over networked systems. Originally introduced by Guestrin et al. in 2002 as a single-agent factored MDP benchmark [Guestrin et al.], it was later extended into a multi-agent formulation to evaluate coordinated reinforcement learning algorithms [Guestrin et al.]. Over the years, it has remained a standard reference in the field, with recent works reaffirming its relevance [Bargiacchi et al.], [Bianchi et al.].

This environment provides a modern and open-source implementation of the multi-agent version of the SysAdmin problem, specifically designed for multi-agent reinforcement learning (MARL). It maintains the original intent of testing structure-aware planning and coordination under uncertainty, while ensuring compatibility with modern MARL libraries.

Environment Description

The environment models a network of computers performing tasks. Each computer (agent) can be in one of several health states and may also be processing a task. Over time, machines may become faulty, slowing down task completion, or dead, making the task impossible to complete. Faults can spread probabilistically to neighboring computers. At each timestep, agents may choose to reboot their machine, which resets it to a working state (good) with high probability, but also discards progress on any active task.
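
The reboot dynamics described above can be sketched in a few lines. Everything here is a hypothetical illustration, not the library's actual code: the state codes mirror the observation-space documentation further down (0=good, 1=faulty, 2=dead; 0=idle, 1=loaded, 2=successful), and the reboot-success probability of 0.95 is an assumed placeholder for the source's "with high probability".

```python
import random

# Hypothetical state codes, matching the observation-space documentation below:
# health: 0=good, 1=faulty, 2=dead; task: 0=idle, 1=loaded, 2=successful.
GOOD, FAULTY, DEAD = 0, 1, 2
IDLE, LOADED, SUCCESSFUL = 0, 1, 2

def reboot_machine(health, task, rng=random, p_reboot_ok=0.95):
    """Sketch of the reboot effect: any task progress is discarded,
    and health resets to GOOD with high probability."""
    task = IDLE  # progress on any active task is always lost
    if rng.random() < p_reboot_ok:
        health = GOOD
    return health, task
```

Note the trade-off this creates: rebooting a faulty machine sacrifices a task in progress, but leaving it faulty risks spreading the fault to neighbors.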

This decentralized setting models a partially observable Markov decision process (POMDP), where coordination between agents is critical for maintaining system-wide performance and limiting fault propagation.

Illustration for one state update in the SysAdmin Network problem.

Graph Topology

As in the Binary Consensus environment, the underlying graph defines the network topology. It determines which agents (computers) are neighbors and hence how faults can spread between them.
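
For instance, a ring topology can be built as a plain adjacency matrix. The helper below is hypothetical (not part of the library); it produces a matrix in the format the environment constructor expects, with zeros on the diagonal:

```python
import numpy as np

def ring_adjacency(n: int) -> np.ndarray:
    """Hypothetical helper: adjacency matrix for a ring of n machines,
    where each machine is influenced by the machines on either side."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i - 1) % n] = 1.0  # left neighbor influences machine i
        A[i, (i + 1) % n] = 1.0  # right neighbor influences machine i
    return A
```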

State Space

Each agent’s state is defined by two categorical variables:

  • Health status: one of good, faulty, or dead

  • Task status: one of idle, loaded, or successful

Formally, the global joint state at time \(t\) is an element of:

\[S(t) \in \{\text{good}, \text{faulty}, \text{dead}\}^N \times \{\text{idle}, \text{loaded}, \text{successful}\}^N\]

This results in a total state space size of \(9^N\).
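
The count follows directly from the two local variables: each agent has 3 health states times 3 task states, i.e. 9 local states, and the joint space is the product over agents.

```python
def joint_state_space_size(n_agents: int) -> int:
    # Each agent has 3 health states x 3 task states = 9 local states,
    # so the joint state space has 9**n_agents elements.
    return 9 ** n_agents
```

Even a modest network is far beyond enumeration: `joint_state_space_size(10)` is already about 3.5 billion, which is why structure-aware (factored) methods matter here.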

Action Space

At each timestep, every agent selects one of two discrete actions:

  • Do nothing (continue current operation)

  • Reboot the machine (resets health to good with high probability but loses current task progress)

Objective

The overall goal is to maximize the number of successfully completed tasks over time. This objective can be framed as either a finite-horizon or infinite-horizon cumulative reward problem. Performance is closely tied to how effectively agents collaborate and leverage the graph structure to mitigate cascading faults.
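
One conventional way to write the infinite-horizon form (a standard framing, not spelled out in the source) is the expected discounted sum of per-step rewards:

\[\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_t\right], \qquad \gamma \in [0, 1),\]

where \(r_t\) counts the tasks completed at step \(t\); the finite-horizon variant truncates the sum at the horizon and may set \(\gamma = 1\).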

By modeling local and global trade-offs in a structured environment, the Multi-agent SysAdmin problem serves as an effective testbed for evaluating decentralized and semi-centralized MARL strategies under uncertainty and partial observability.

Environment

class cognac.env.SysAdmin.env.SysAdminNetworkEnvironment(adjacency_matrix: numpy.ndarray, max_steps: int = 100, show_neighborhood_state: bool = True, reward_class: cognac.core.BaseReward.BaseReward = SysAdminDefaultReward, is_global_reward: bool = False, base_arrival_rate: float = 0.5, base_fail_rate: float = 0.1, dead_rate_multiplier: float = 0.2, base_success_rate: float = 0.3, faulty_success_rate: float = 0.1)

Bases: ParallelEnv

Multi-agent environment simulating a network of machines managed by agents, based on the β€œSysAdmin” problem setting.

Each agent controls one machine, which can be in states representing its operational condition (good, faulty, dead) and job state (idle, loaded, successful). Machines can influence each other’s failure rates based on a given network adjacency matrix.

The environment follows a discrete timestep progression with reboot actions, task completions, fault propagation, and rewards computed per step.

Parameters

adjacency_matrix : np.ndarray

Square matrix of shape (n_agents, n_agents) representing network connections. Entry (i, j) indicates the influence of agent j on agent i. Must have zeros on the diagonal (no self-influence).

max_steps : int, default=100

Maximum number of timesteps before the environment is terminated.

show_neighborhood_state : bool, default=True

If True, agents observe not only their own state but also their neighbors’ states.

reward_class : BaseReward subclass, default=SysAdminDefaultReward

Class used to compute rewards. Must be derived from BaseReward.

is_global_reward : bool, default=False

If True, a single global reward is returned to all agents.

base_arrival_rate : float, default=0.5

Probability that a new job arrives at an available machine at each step.

base_fail_rate : float, default=0.1

Base failure rate for machines without external influence.

dead_rate_multiplier : float, default=0.2

Multiplier for the probability that a machine becomes dead under the influence of faulty neighbors.

base_success_rate : float, default=0.3

Probability that a working, loaded machine completes its task successfully.

faulty_success_rate : float, default=0.1

Probability that a faulty, loaded machine completes its task successfully.

Attributes

adjacency_matrix : np.ndarray

Original adjacency matrix representing the network structure.

adjacency_matrix_prob : np.ndarray

Processed stochastic matrix of influence probabilities with self-failures included.

n_agents : int

Number of agents (machines) in the environment.

possible_agents : list of int

List of all agent IDs.

state : np.ndarray

Current state array of shape (n_agents, 2) with machine status and job state.

timestep : int

Current timestep counter.

max_steps : int

Maximum allowed timesteps before termination.

reward : BaseReward

Instance of the reward class used to calculate rewards.

is_global_reward : bool

Flag indicating whether rewards are global or individual.

neighboring_masks : np.ndarray of bool

Boolean mask matrix indicating which agents observe each other.

Methods

reset(seed=None, options=None)

Reset environment state and return initial observations.

step(actions)

Perform one timestep given agents’ actions; update state and return results.

get_obs()

Get the current observations dictionary for all agents.

render()

Print the current state of the environment (to be implemented).

observation_space(agent)

Return the observation space object for the specified agent.

action_space(agent)

Return the action space object for the specified agent.

Warning

Methods prefixed with an underscore (_) are for internal use only and should not be called directly by users.

_available_mask()

Warning

Internal use.

Boolean mask for machines that are available to receive new jobs (working and currently idle).

Returns

np.ndarray

Boolean array where True indicates available machines.

_check_adjacency_matrix()

Warning

Internal use.

Validates the adjacency matrix to ensure:

  • Diagonal elements are zero (no self influence initially).

  • All entries are probabilities in [0, 1].
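
These checks could be sketched as follows. This is a hypothetical re-implementation for illustration, not the library's actual code:

```python
import numpy as np

def check_adjacency_matrix(A: np.ndarray) -> None:
    """Hypothetical re-implementation of the validation described above."""
    assert A.ndim == 2 and A.shape[0] == A.shape[1], "matrix must be square"
    assert (np.diag(A) == 0).all(), "diagonal must be zero (no self influence)"
    assert ((A >= 0) & (A <= 1)).all(), "entries must be probabilities in [0, 1]"
```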

Raises

AssertionError

If any of the validation checks fail.

_done_mask()

Warning

Internal use.

Boolean mask for machines that have completed their jobs.

Returns

np.ndarray

Boolean array where True indicates machines done with their tasks.

_faulty_loaded_mask()

Warning

Internal use.

Boolean mask for machines that are faulty and currently loaded with a job.

Returns

np.ndarray

Boolean array where True indicates faulty and loaded machines.

_faulty_working_mask()

Warning

Internal use.

Boolean mask for machines currently in faulty or dead state.

Returns

np.ndarray

Boolean array where True indicates machine is faulty or dead (state code != 0).

_scale_adjacency_matrix()

Warning

Internal use.

Scales the adjacency_matrix_prob so that the maximum row or column sum is at most 1. This ensures the resulting matrix can be interpreted as a stochastic influence matrix.
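
One way to implement this scaling is to divide by the largest row or column sum; the sketch below is a hypothetical illustration of that idea, not the library's actual code:

```python
import numpy as np

def scale_adjacency_matrix(A: np.ndarray) -> np.ndarray:
    """Sketch: divide by the largest row/column sum so every row and
    column of the result sums to at most 1."""
    denom = max(A.sum(axis=0).max(), A.sum(axis=1).max(), 1.0)
    return A / denom
```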

_working_loaded_mask()

Warning

Internal use.

Boolean mask for machines that are working and currently loaded with a job.

Returns

np.ndarray

Boolean array where True indicates working and loaded machines.

_working_mask()

Warning

Internal use.

Boolean mask for machines currently in working state.

Returns

np.ndarray

Boolean array where True indicates the machine is working (state code 0).

action_space(agent) Discrete

Return the action space for the given agent.

Actions are discrete:

  • 0: do nothing

  • 1: reboot machine

Parameters

agent : int

Agent identifier.

Returns

gymnasium.spaces.Discrete

Action space for the agent.

get_obs() dict

Retrieve the current observation for each agent.

Observations consist of each agent’s own machine state vector.

Returns

dict

Mapping from agent IDs to their observations (np.ndarray of shape (2,)).

metadata: dict[str, Any] = {'name': 'sysadmin_environment_v0'}

observation_space(agent) MultiDiscrete

Return the observation space for the given agent.

The observation space is a MultiDiscrete space describing machine status and job state with discrete values:

  • Machine status: 0=good, 1=faulty, 2=dead

  • Job state: 0=idle, 1=loaded, 2=successful

Parameters

agent : int

Agent identifier.

Returns

gymnasium.spaces.MultiDiscrete

Observation space for the agent.

render()

Render the current environment state.

Currently prints the raw state array.

Notes

This method is a placeholder and should be implemented to provide a graphical or structured visualization of the environment.

reset(seed=None, options=None)

Reset the environment to its initial state.

Parameters

seed : int or None, optional

Seed for the random number generator for reproducibility.

options : dict or None, optional

Additional options for environment reset (currently unused).

Returns

observations : dict

Dictionary mapping agent IDs to their initial observations.

infos : dict

Dictionary mapping agent IDs to info dictionaries (empty in this implementation).

state()

Returns the state.

The state is a global view of the environment, appropriate for centralized-training decentralized-execution (CTDE) methods such as QMIX.
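
As an illustration of why a centralized learner wants this view: the per-agent (n_agents, 2) state array can be flattened into a single global vector. The exact shape returned by state() is not specified here, so the flattening below is an assumption, not the documented behavior:

```python
import numpy as np

# Hypothetical illustration: the (n_agents, 2) state array flattened into
# one global vector, as a centralized mixer (e.g. QMIX) would consume it.
state = np.array([[0, 1],   # agent 0: good, loaded
                  [1, 0],   # agent 1: faulty, idle
                  [2, 0]])  # agent 2: dead, idle
global_view = state.flatten()
```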

step(actions)

Advance the environment by one timestep given agents’ actions.

Parameters

actions : dict

Dictionary mapping agent IDs to their actions. Each action is an integer: 0 for β€œdo nothing”, 1 for β€œreboot”.

Returns

observations : dict

Dictionary mapping agent IDs to their new observations.

rewards : dict

Dictionary mapping agent IDs to their rewards for this step.

terminations : dict

Dictionary mapping agent IDs to termination flags (bool).

truncations : dict

Dictionary mapping agent IDs to truncation flags (bool).

infos : dict

Dictionary mapping agent IDs to info dictionaries (empty in this implementation).

Notes

  • Rebooted machines are reset to working and idle state.

  • Machines working on tasks may complete successfully based on probabilities.

  • Faulty and dead states evolve probabilistically, influenced by network neighbors.

  • Rewards are computed via the configured reward class.

  • Environment terminates after max_steps.

Rewards

class cognac.env.SysAdmin.rewards.SysAdminDefaultReward

Bases: BaseReward
