🥃 Welcome to COGNAC’s Documentation
COGNAC (COoperative Graph-based Networked Agent Challenges) is a benchmark suite for evaluating and developing decentralized multi-agent reinforcement learning (MARL) algorithms on cooperative tasks with graph-structured environments.
Real-world systems such as power grids, traffic networks, and computer systems often exhibit complex interdependencies that can naturally be modeled as graphs. Yet, controlling such systems remains challenging due to their scale, partial observability, and combinatorial complexity. Standard single-agent RL struggles to scale in these settings.
COGNAC bridges the gap between theoretical models of network control and empirical reinforcement learning by providing:
- A flexible and modular suite of environments with network topology
- Support for fully cooperative MARL tasks on arbitrary graph structures
- Scalable problems designed to highlight the limitations of centralized control
- Tools for testing decentralized, distributed, and frugal AI methods
Motivation
Despite recent advances in MARL, there is a lack of standardized, open-source benchmarks specifically focused on graph-structured control problems. Many existing environments either rely on centralized settings or do not fully exploit the graph structure of the domain.
COGNAC was built to fill this gap by offering:

- Minimal yet challenging environments tailored for graph-based cooperation
- Compatibility with modern RL libraries and tooling
- A platform to test scalability, generalization, and communication protocols
Key Features
The package implements four different environments as self-contained PettingZoo environments. Each environment is highly customizable in terms of size, interaction structure, and dynamics parameters. In addition, the package includes utility tools to generate adjacency matrices and graph structures for instantiating environments, as well as rendering and visualization tools.
- Graph-native API: define any graph topology for your cooperative problem
- Simple, extensible environments: start small, scale big
- Baseline integrations: compatible with standard MARL algorithms
- Realistic use-cases: inspired by domains like traffic, power systems, and logistics
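As an illustration of the graph-native idea, the sketch below builds an adjacency matrix for a ring topology in plain Python. The function name `ring_adjacency` is illustrative and not part of the COGNAC API; the package ships its own graph-generation utilities.

```python
# Illustrative sketch of a graph-topology helper (not the actual COGNAC API):
# build the adjacency matrix of an undirected ring of n nodes.

def ring_adjacency(n: int) -> list[list[int]]:
    """Adjacency matrix of an undirected ring with n nodes (n >= 3)."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Connect node i to its clockwise neighbour, symmetrically.
        adj[i][(i + 1) % n] = 1
        adj[(i + 1) % n][i] = 1
    return adj

adj = ring_adjacency(4)
# In a 4-ring, every node has exactly two neighbours.
assert all(sum(row) == 2 for row in adj)
```

Such a matrix (or any other graph structure) would then define which agents interact in a given environment instance.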
Contributions
- A Python-based library offering graph-structured multi-agent environments
- The first standardized open-source implementations of theoretical graph-based MARL problems
- A collection of benchmark results using independent and centralized learning algorithms
Quick Links
- 📦 GitHub repository: COGNAC
- 📊 Benchmark examples: cognac-benchmark-example
COGNAC is a Python-based benchmark suite offering flexible, graph-structured, cooperative multi-agent environments for MARL research. The package offers standardized, minimal implementations of several well-known theoretical graph-based MARL problems from the literature, such as the SysAdmin network [Guestrin et al.] or the Firefighting Graph [Oliehoek et al.], adapted for empirical benchmarking with modern RL tooling.
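To give a flavour of one such problem, the sketch below implements SysAdmin-style dynamics in the spirit of Guestrin et al.: machines degrade from good to faulty to dead, failing neighbours raise the degradation probability, and a reboot restores a machine. The statuses, probabilities, and update rule are illustrative assumptions, not COGNAC's actual implementation.

```python
import random

# Sketch of SysAdmin-style dynamics (after Guestrin et al.). The constants
# and probabilities below are illustrative, not COGNAC's implementation.
GOOD, FAULTY, DEAD = 0, 1, 2

def step(status, adj, reboot, rng, p_fail=0.1, p_spread=0.3):
    """One transition: rebooted machines become GOOD; the others may
    degrade one level, with faulty/dead neighbours raising the odds."""
    new = list(status)
    for i, s in enumerate(status):
        if reboot[i]:
            new[i] = GOOD  # reboot always restores the machine
            continue
        bad = sum(1 for j, a in enumerate(adj[i]) if a and status[j] != GOOD)
        p = min(1.0, p_fail + p_spread * bad)
        if s != DEAD and rng.random() < p:
            new[i] = s + 1  # GOOD -> FAULTY -> DEAD
    return new

rng = random.Random(0)
adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # three machines, fully connected
status = [GOOD, FAULTY, GOOD]
status = step(status, adj, reboot=[0, 1, 0], rng=rng)
assert status[1] == GOOD  # machine 1 was rebooted
```

Each agent controls the reboot action of one machine, so cooperation means trading the cost of a reboot against the risk of failures spreading along the graph.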
List of Environments
| Environment | Modular Size | Graph Agnostic | Joint State Space | Joint Act. Space |
|---|---|---|---|---|
| Firefighting Graph (1D) | ✔️ | ❌ | \(\theta^N\) | \(2^N\) |
| Firefighting Graph (2D) | ✔️ | ❌ | \(\theta^{N \times M}\) | \(4^N\) |
| Binary Consensus | ✔️ | ✔️ | \(2^N\) | \(2^N\) |
| SysAdmin | ✔️ | ✔️ | \(9^N\) | \(2^N\) |
| Multi-commodity Flow | ✔️ | ❌ | \(\rho_{\text{max}}^{k \times E}\) | \(\rho_{\text{max}}^{k \times E}\) |
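The exponents in the table make the combinatorial blow-up concrete. Taking the SysAdmin row as an example, each machine has 9 local states and 2 local actions, so a centralized controller over N agents faces \(9^N\) joint states and \(2^N\) joint actions:

```python
# Joint space sizes for a SysAdmin instance with N agents, per the table
# above: 9 local states and 2 local actions (reboot or not) per machine.
N = 10
joint_states = 9 ** N   # grows exponentially with the number of agents
joint_actions = 2 ** N
print(joint_states, joint_actions)  # 3486784401 1024
```

This exponential growth in the joint spaces is exactly what motivates the decentralized and factored methods the suite is designed to test.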
Technical Documentation
References
Ravindra K. Ahuja, Thomas L. Magnanti, James B. Orlin, and others. Network flows: theory, algorithms, and applications. Volume 1. Prentice Hall, Englewood Cliffs, NJ, 1993.
Eugenio Bargiacchi, Timothy Verstraeten, and Diederik M. Roijers. Cooperative prioritized sweeping. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 160–168. 2021.
Federico Bianchi, Alberto Castellini, Alessandro Farinelli, Luca Marzari, Daniele Meli, Francesco Trotti, Celeste Veronese, and others. Developing safe and explainable autonomous agents: from simulation to the real world. In CEUR Workshop Proceedings, 89–94. 2024.
Carlos Guestrin, Daphne Koller, and Ronald Parr. Max-norm projections for factored MDPs. In IJCAI, volume 1, 673–682. 2001.
Carlos Guestrin, Michail Lagoudakis, and Ronald Parr. Coordinated reinforcement learning. In ICML, volume 2, 227–234. Citeseer, 2002.
Richard A. Holley and Thomas M. Liggett. Ergodic theorems for weakly interacting infinite systems and the voter model. The Annals of Probability, pages 643–663, 1975.
Jueming Hu, Zhe Xu, Weichang Wang, Guannan Qu, Yutian Pang, and Yongming Liu. Decentralized graph-based multi-agent reinforcement learning using reward machines. Neurocomputing, 564:126974, 2024.
Aditya Mahajan and Mehnaz Mannan. Decentralized stochastic control. Annals of Operations Research, 241(1):109–126, 2016.
Frans A. Oliehoek, Christopher Amato, and others. A concise introduction to decentralized POMDPs. Volume 1. Springer, 2016.
Frans A. Oliehoek, Matthijs T. J. Spaan, Nikos Vlassis, and Shimon Whiteson. Exploiting locality of interaction in factored Dec-POMDPs. In Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems. 2008.
James M. Ooi and Gregory W. Wornell. Decentralized control of a multiple access broadcast channel: performance bounds. In Proceedings of 35th IEEE Conference on Decision and Control, volume 1, 293–298. IEEE, 1996.
Feng Wu, S. Zilberstein, and N. R. Jennings. Monte-Carlo expectation maximization for decentralized POMDPs. In Proceedings of the 23rd International Joint Conference on AI (IJCAI), 397–403. 2013. URL: https://eprints.soton.ac.uk/351021/.