Metadata-Version: 2.4
Name: etnn
Version: 0.1.1
Summary: E(n)-Equivariant Topological Neural Networks
Author: ETNN Authors
License: MIT License
        
        Copyright (c) 2023 NSAPH Projects
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/NSAPH-Projects/topological-equivariant-networks
Project-URL: Repository, https://github.com/NSAPH-Projects/topological-equivariant-networks
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0
Requires-Dist: torch_geometric>=2.5
Requires-Dist: numpy>=1.22
Requires-Dist: scipy>=1.8
Requires-Dist: numba>=0.55
Requires-Dist: toponetx>=0.2.0
Provides-Extra: geo
Requires-Dist: pandas>=1.4; extra == "geo"
Requires-Dist: networkx>=3.1; extra == "geo"
Requires-Dist: matplotlib>=3.7; extra == "geo"
Dynamic: license-file

# E(n) Equivariant Topological Neural Network

Code for *Claudio Battiloro, Ege Karaismailoğlu, Mauricio Tec, George Dasoulas, Michelle Audirac, and Francesca Dominici* "E(n) Equivariant Topological Neural Networks"
https://arxiv.org/abs/2405.15429

<img src="assets/etnn.png" width="1000">

**Abstract** Graph neural networks excel at modeling pairwise interactions, but they cannot flexibly accommodate higher-order interactions and features. Topological deep learning (TDL) has emerged recently as a promising tool for addressing this issue. TDL enables the principled modeling of arbitrary multi-way, hierarchical higher-order interactions by operating on combinatorial topological spaces, such as simplicial or cell complexes, instead of graphs. However, little is known about how to leverage geometric features such as positions and velocities for TDL. This paper introduces E(n)-Equivariant Topological Neural Networks (ETNNs), which are E(n)-equivariant message-passing networks operating on combinatorial complexes, formal objects unifying graphs, hypergraphs, simplicial, path, and cell complexes. ETNNs incorporate geometric node features while respecting rotation and translation equivariance. Moreover, ETNNs are natively ready for settings with heterogeneous interactions. We provide a theoretical analysis to show the improved expressiveness of ETNNs over architectures for geometric graphs. We also show how several E(n) equivariant variants of TDL models can be directly derived from our framework. The broad applicability of ETNNs is demonstrated through two tasks of vastly different nature: i) molecular property prediction on the QM9 benchmark and ii) land-use regression for hyper-local estimation of air pollution with multi-resolution irregular geospatial data. The experiment results indicate that ETNNs are an effective tool for learning from diverse types of richly structured data, highlighting the benefits of principled geometric inductive bias.

## Setup instructions
```
git clone <repo>
cd topological-equivariant-networks
conda env create -f environment.yaml
```

This will create and activate a new conda environment called "etnn" that contains all the packages necessary to run the code in this repository.

## Install as a library

With Python and the matching `torch`/`torch_geometric` wheels preinstalled, you can install the ETNN package directly:

```bash
pip install .
# or
pip install /path/to/topological-equivariant-networks
```

This exposes the `etnn` module (e.g., `from etnn import ETNN`) for downstream projects.

## Real-World Combinatorial Complexes

To showcase the modeling power of CCs and their versatility in conjunction with ETNNs, we introduced two Real-World Combinatorial Complexes useful to model molecules and irregular multi-resolution geospatial data:

**Molecular CC (QM9)** In a Molecular CC, the set of nodes is the set of atoms, while the set of cells and the rank function are such that: atoms are rank 0 cells, bonds and functional groups made of two atoms are rank 1 cells, and rings and functional groups made of more than two atoms are rank 2 cells. A Molecular CC can jointly retain all the properties of graphs, cell complexes, and hypergraphs, thus overcoming their mutual limitations. 

**Geopatial CC** In a Spatial CC, the set of cells is defined by hierarchical spatial objects. Two major examples are geometric descriptions (e.g. points, polylines, and polygons) or political partitions (e.g. zipcodes, counties, states) of a region. In the latter case, the listed objects can be modeled as cells of a CC and their rank correspond to their intrinsic dimensionality. Therefore, rank 0 cells are points, e.g. where air quality monitoring units are located, rank 1 cells are polylines, e.g. traffic roads, and rank 2 cells are polygons, e.g. census tracts. The creation of the dataset and reproducibility instructions are documented in [this notebook](https://github.com/NSAPH-Projects/topological-equivariant-networks/blob/main/notebooks/geospatial-cc.ipynb) in this repository. The resulting dataset is also part of the repository itself.

<img src="assets/benchmarks.png" width="800">


## Configuration system

The code uses the [Hydra](https://hydra.cc/) configuration system.  Each CC (Molecular and Geopatial) has its own configuration directory. The configs are in the `conf/<cc_name>/` directory. Both config directories have a similar structure (and will be merged in the future). 

## Training

To train a model, you can use the following command:

```bash
python main_qm9.py experiment=<...>
```
for the Molecular CC task and
```bash
python main_geospatial.py experiment=<...>
```
for the Geospatial CC task

The experiments are in the `conf/<cc_name>/<experiment>`. Each experiment consists of a configuration for the dataset, model, and learning parameters. All parameters of the config can be overridden from the command line. For example, to train a model with a different learning rate. See the Hydra documentation for more details.


## Pre-save datasets for faster loading during training

The scripts [create_qm9.py](create_qm9.py) and [create_geospatial.py](create_geospatial.py) use the `InMemoryDataset` [pytorch-geometric](https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.InMemoryDataset) class to create and save the combinatorial complex as a `.pt` file. To save a dataset, use the commands.

```bash
python create_qm9.py dataset=<dataset_name>
python create_geospatial.py dataset=<dataset_name>
```
where the dataset must be one of the datasets listed in the `conf/<cc_name>/datasets/` directory. This will save the dataset in the `data/<cc_name>_<dataset_name>` directory. This dataset is also cached when training a model. However, pre-caching is useful when running multiple experiments on the same dataset. In particular, lifting the QM9 dataset to a Molecular CC can take about 30 minutes. 

