SysAdmin Network
The Multi-agent SysAdmin problem is a widely used benchmark in the study of decision-making over networked systems. Originally introduced by Guestrin et al. in 2002 as a single-agent factored MDP benchmark [Guestrin et al.], it was later extended into a multi-agent formulation to evaluate coordinated reinforcement learning algorithms [Guestrin et al.]. Over the years, it has remained a standard reference in the field, with recent works reaffirming its relevance [Bargiacchi et al.], [Bianchi et al.].
This environment provides a modern and open-source implementation of the multi-agent version of the SysAdmin problem, specifically designed for multi-agent reinforcement learning (MARL). It maintains the original intent of testing structure-aware planning and coordination under uncertainty, while ensuring compatibility with modern MARL libraries.
Environment Description
The environment models a network of computers performing tasks. Each computer (agent) can be in one of several health states and may also be processing a task. Over time, a machine may become faulty, which slows task completion, or dead, which makes completion impossible. Faults spread probabilistically to neighboring computers. At each timestep, an agent may choose to reboot its machine, which resets it to a working state (good) with high probability but discards progress on any active task.
This decentralized setting is naturally modeled as a partially observable Markov decision process (POMDP), in which coordination between agents is critical for maintaining system-wide performance and limiting fault propagation.
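For intuition, the sketch below shows one plausible per-step health update in the spirit of this description. The function and the exact update rule are illustrative assumptions, not the library's implementation; the real transition probabilities are controlled by the class parameters documented below.

```python
import numpy as np

rng = np.random.default_rng(0)

def sketch_health_update(health, influence, base_fail_rate=0.1,
                         dead_rate_multiplier=0.2):
    # Illustrative only -- one possible health transition per timestep.
    # health:    (n,) int array, 0=good, 1=faulty, 2=dead
    # influence: (n, n) stochastic matrix, entry (i, j) = influence of j on i
    faulty_pressure = influence @ (health == 1).astype(float)
    p_faulty = np.clip(base_fail_rate + faulty_pressure, 0.0, 1.0)
    p_dead = np.clip(dead_rate_multiplier * faulty_pressure, 0.0, 1.0)
    u = rng.random(health.shape)
    new = health.copy()
    new[(health == 0) & (u < p_faulty)] = 1  # good machines may become faulty
    new[(health == 1) & (u < p_dead)] = 2    # faulty machines may die
    return new
```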
Figure: Illustration of one state update in the SysAdmin Network problem.
Graph Topology
As in the Binary Consensus environment, the underlying graph defines the network topology. It determines which agents (computers) are neighbors and hence how faults can spread between them.
State Space
Each agent's state is defined by two categorical variables:
Health status: one of good, faulty, or dead
Task status: one of idle, loaded, or successful
Formally, the global joint state at time \(t\) is an element of:

\[
s_t \in \big( \{\text{good}, \text{faulty}, \text{dead}\} \times \{\text{idle}, \text{loaded}, \text{successful}\} \big)^N,
\]

where \(N\) is the number of agents. This results in a total state space size of \(9^N\).
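For concreteness, the count follows directly from the two three-valued variables (a minimal sketch; names are illustrative):

```python
HEALTH = ("good", "faulty", "dead")
TASK = ("idle", "loaded", "successful")

n_agents = 4
per_agent = len(HEALTH) * len(TASK)  # 3 * 3 = 9 local states per machine
total = per_agent ** n_agents        # 9**4 = 6561 joint states
```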
Action Space
At each timestep, every agent selects one of two discrete actions:
Do nothing (continue current operation)
Reboot the machine (resets health to good with high probability but loses current task progress)
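Joint actions are passed to the environment as a dictionary keyed by agent ID (integer IDs, matching possible_agents documented below); the integer codes follow the action_space() convention described later:

```python
DO_NOTHING, REBOOT = 0, 1  # action codes per action_space()
actions = {0: DO_NOTHING, 1: REBOOT, 2: DO_NOTHING}  # one entry per agent
```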
Objective
The overall goal is to maximize the number of successfully completed tasks over time. This objective can be framed as either a finite-horizon or infinite-horizon cumulative reward problem. Performance is closely tied to how effectively agents collaborate and leverage the graph structure to mitigate cascading faults.
By modeling local and global trade-offs in a structured environment, the Multi-agent SysAdmin problem serves as an effective testbed for evaluating decentralized and semi-centralized MARL strategies under uncertainty and partial observability.
Environment
- class cognac.env.SysAdmin.env.SysAdminNetworkEnvironment(adjacency_matrix: numpy.ndarray, max_steps: int = 100, show_neighborhood_state: bool = True, reward_class: cognac.core.BaseReward.BaseReward = SysAdminDefaultReward, is_global_reward: bool = False, base_arrival_rate: float = 0.5, base_fail_rate: float = 0.1, dead_rate_multiplier: float = 0.2, base_success_rate: float = 0.3, faulty_success_rate: float = 0.1)
Bases: ParallelEnv
Multi-agent environment simulating a network of machines managed by agents, based on the "SysAdmin" problem setting.
Each agent controls one machine, which can be in states representing its operational condition (good, faulty, dead) and job state (idle, loaded, successful). Machines can influence each other's failure rates according to a given network adjacency matrix.
The environment follows a discrete-timestep progression with reboot actions, task completions, fault propagation, and rewards computed each step.
Parameters
- adjacency_matrix : np.ndarray
Square matrix of shape (n_agents, n_agents) representing network connections. Entry (i, j) indicates the influence of agent j on agent i. Must initially have zeros on the diagonal (no self-influence).
- max_steps : int, default=100
Maximum number of timesteps before the environment is terminated.
- show_neighborhood_state : bool, default=True
If True, agents observe not only their own state but also their neighbors' states.
- reward_class : BaseReward subclass, default=SysAdminDefaultReward
Class used to compute rewards. Must be derived from BaseReward.
- is_global_reward : bool, default=False
If True, a single global reward is returned to all agents.
- base_arrival_rate : float, default=0.5
Probability of a new job arriving at an available machine at each step.
- base_fail_rate : float, default=0.1
Base failure rate for machines without external influence.
- dead_rate_multiplier : float, default=0.2
Multiplier for the probability of a machine becoming dead when influenced by faulty neighbors.
- base_success_rate : float, default=0.3
Probability that a working, loaded machine completes its task successfully.
- faulty_success_rate : float, default=0.1
Probability that a faulty, loaded machine completes its task successfully.
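A minimal construction sketch, assuming only the import path shown in the class signature above (the ring topology and the 0.3 influence weights are arbitrary illustrative choices):

```python
import numpy as np
from cognac.env.SysAdmin.env import SysAdminNetworkEnvironment

# 4-machine ring: each machine is influenced by its two ring neighbors.
# Zero diagonal and entries in [0, 1], as required by the checks below.
A = np.array([
    [0.0, 0.3, 0.0, 0.3],
    [0.3, 0.0, 0.3, 0.0],
    [0.0, 0.3, 0.0, 0.3],
    [0.3, 0.0, 0.3, 0.0],
])

env = SysAdminNetworkEnvironment(adjacency_matrix=A, max_steps=100)
```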
Attributes
- adjacency_matrix : np.ndarray
Original adjacency matrix representing the network structure.
- adjacency_matrix_prob : np.ndarray
Processed stochastic matrix of influence probabilities, with self-failures included.
- n_agents : int
Number of agents (machines) in the environment.
- possible_agents : list of int
List of all agent IDs.
- state : np.ndarray
Current state array of shape (n_agents, 2) holding machine status and job state.
- timestep : int
Current timestep counter.
- max_steps : int
Maximum allowed timesteps before termination.
- reward : BaseReward
Instance of the reward class used to compute rewards.
- is_global_reward : bool
Flag indicating whether rewards are global or individual.
- neighboring_masks : np.ndarray of bool
Boolean mask matrix indicating which agents observe each other.
Methods
- reset(seed=None, options=None)
Reset environment state and return initial observations.
- step(actions)
Perform one timestep given agents' actions; update state and return results.
- get_obs()
Get the current observations dictionary for all agents.
- render()
Print the current state of the environment (to be implemented).
- observation_space(agent)
Return the observation space object for the specified agent.
- action_space(agent)
Return the action space object for the specified agent.
Warning
Methods prefixed with an underscore (_) are for internal use only and should not be called directly by users.
- _available_mask()
Warning
Internal use.
Boolean mask for machines that are available to receive new jobs (working and currently idle).
Returns
- np.ndarray
Boolean array where True indicates available machines.
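Given the documented state layout (shape (n_agents, 2), machine status first, job state second), the mask is presumably equivalent to the following sketch:

```python
import numpy as np

# Toy state: rows are (status, job); status 0=good, job 0=idle (assumed codes).
state = np.array([[0, 0], [1, 1], [0, 2], [2, 0]])
available = (state[:, 0] == 0) & (state[:, 1] == 0)  # [True, False, False, False]
```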
- _check_adjacency_matrix()
Warning
Internal use.
Validates the adjacency matrix to ensure:
- Diagonal elements are zero (no self-influence initially).
- All entries are probabilities in [0, 1].
Raises
- AssertionError
If any of the validation checks fail.
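The documented checks amount to two assertions; a standalone sketch (not the library's actual code):

```python
import numpy as np

def check_adjacency(A: np.ndarray) -> None:
    # Mirrors the validation rules stated above.
    assert np.all(np.diag(A) == 0), "diagonal must be zero (no self-influence)"
    assert np.all((A >= 0) & (A <= 1)), "entries must be probabilities in [0, 1]"
```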
- _done_mask()
Warning
Internal use.
Boolean mask for machines that have completed their jobs.
Returns
- np.ndarray
Boolean array where True indicates machines done with their tasks.
- _faulty_loaded_mask()
Warning
Internal use.
Boolean mask for machines that are faulty and currently loaded with a job.
Returns
- np.ndarray
Boolean array where True indicates faulty and loaded machines.
- _faulty_working_mask()
Warning
Internal use.
Boolean mask for machines currently in faulty or dead state.
Returns
- np.ndarray
Boolean array where True indicates machine is faulty or dead (state code != 0).
- _scale_adjacency_matrix()
Warning
Internal use.
Scales the adjacency_matrix_prob so that the maximum row or column sum is at most 1. This ensures the resulting matrix can be interpreted as a stochastic influence matrix.
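A sketch of this normalization (illustrative, not the library's code): divide every entry by the largest row or column sum whenever that sum exceeds 1.

```python
import numpy as np

def scale_influence(A_prob: np.ndarray) -> np.ndarray:
    # After scaling, every row and column of A_prob sums to at most 1.
    m = max(A_prob.sum(axis=0).max(), A_prob.sum(axis=1).max())
    return A_prob / m if m > 1.0 else A_prob
```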
- _working_loaded_mask()
Warning
Internal use.
Boolean mask for machines that are working and currently loaded with a job.
Returns
- np.ndarray
Boolean array where True indicates working and loaded machines.
- _working_mask()
Warning
Internal use.
Boolean mask for machines currently in working state.
Returns
- np.ndarray
Boolean array where True indicates the machine is working (state code 0).
- action_space(agent) → Discrete
Return the action space for the given agent.
Actions are discrete:
0: do nothing
1: reboot machine
Parameters
- agent : int
Agent identifier.
Returns
- gymnasium.spaces.Discrete
Action space for the agent.
- get_obs() → dict
Retrieve the current observation for each agent.
Observations consist of each agent's own machine state vector (and, when show_neighborhood_state is enabled, its neighbors' states as well).
Returns
- dict
Mapping from agent IDs to their observations (np.ndarray; shape (2,) for an agent's own state).
- metadata: dict[str, Any] = {'name': 'sysadmin_environment_v0'}
- observation_space(agent) → MultiDiscrete
Return the observation space for the given agent.
The observation space is a MultiDiscrete space describing machine status and job state with discrete values:
Machine status: 0=good, 1=faulty, 2=dead
Job state: 0=idle, 1=loaded, 2=successful
Parameters
- agent : int
Agent identifier.
Returns
- gymnasium.spaces.MultiDiscrete
Observation space for the agent.
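For intuition, an agent's own machine contributes a MultiDiscrete([3, 3]) block; with show_neighborhood_state=True, the observed vector plausibly stacks one (status, job) pair per observed machine. The exact layout is an assumption here; inspect get_obs() in practice:

```python
from gymnasium.spaces import MultiDiscrete

own_only = MultiDiscrete([3, 3])           # (status, job) for one machine
self_plus_two = MultiDiscrete([3, 3] * 3)  # self + 2 neighbors (assumed layout)
```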
- render()
Render the current environment state.
Currently prints the raw state array.
Notes
This method is a placeholder and should be implemented to provide a graphical or structured visualization of the environment.
- reset(seed=None, options=None)
Reset the environment to its initial state.
Parameters
- seed : int or None, optional
Seed for the random number generator, for reproducibility.
- options : dict or None, optional
Additional options for environment reset (currently unused).
Returns
- observations : dict
Dictionary mapping agent IDs to their initial observations.
- infos : dict
Dictionary mapping agent IDs to info dictionaries (empty in this implementation).
- state()
Return the global state.
This provides a global view of the environment, appropriate for centralized-training decentralized-execution (CTDE) methods such as QMIX.
- step(actions)
Advance the environment by one timestep given agents' actions.
Parameters
- actions : dict
Dictionary mapping agent IDs to their actions. Each action is an integer: 0 for "do nothing", 1 for "reboot".
Returns
- observations : dict
Dictionary mapping agent IDs to their new observations.
- rewards : dict
Dictionary mapping agent IDs to their rewards for this step.
- terminations : dict
Dictionary mapping agent IDs to termination flags (bool).
- truncations : dict
Dictionary mapping agent IDs to truncation flags (bool).
- infos : dict
Dictionary mapping agent IDs to info dictionaries (empty in this implementation).
Notes
- Rebooted machines are reset to the working and idle state.
- Machines working on tasks may complete them successfully, according to the configured probabilities.
- Faulty and dead states evolve probabilistically, influenced by network neighbors.
- Rewards are computed via the configured reward class.
- The environment terminates after max_steps.
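Putting reset() and step() together, a random-policy rollout follows the standard ParallelEnv loop (reusing the env constructed in the earlier sketch):

```python
observations, infos = env.reset(seed=0)
done = False
while not done:
    # Sample an independent random action for every agent.
    actions = {a: env.action_space(a).sample() for a in env.possible_agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    # Stop once every agent is terminated or the episode is truncated.
    done = all(terminations.values()) or all(truncations.values())
```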
Rewards
- class cognac.env.SysAdmin.rewards.SysAdminDefaultReward
Bases: BaseReward
Default reward class used by SysAdminNetworkEnvironment (see the reward_class parameter above).