Metadata-Version: 2.4
Name: GGILES
Version: 1.1.0
Summary: The generalized graph input line entry system (GGILES) package produces sequential, string representations of graphs in a NetworkX format via depth-first graph traversal, and reproduces NetworkX graphs from the string representation. GGILES is flexible and customizable, accommodating a large range of graph types and formats.
Project-URL: Homepage, https://www.pi-research.org/
Project-URL: Source, https://github.com/process-intelligence-research/Generalized-graph-line-entry-system
Author-email: "Artur M. Schweidtmann" <a.schweidtmann@tudelft.nl>
License: MIT License
        
        Copyright (c) 2025 Artur M. Schweidtmann
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: Graphs, Sequence, Graph sequentialization, String representation, NetworkX
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.11
Requires-Dist: networkx>=3.4.2
Requires-Dist: numpy>=2.2.4
Provides-Extra: dev
Requires-Dist: coverage>=7.0; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.9; extra == 'dev'
Description-Content-Type: text/markdown

<!-- omit from toc -->
# Generalized Graph Input Line Entry System (GGILES) by&nbsp;[<img src="./docs/logos/Process_Intelligence_Black_Horizontal.png" alt="Process Intelligence Research logo" height="40">](https://www.pi-research.org/)
## Overview
The generalized graph input line entry system (GGILES) package produces sequential, string representations of graphs in a NetworkX format via depth-first graph traversal, and reproduces NetworkX graphs from the string representation. GGILES is flexible and customizable, accommodating a large range of graph representations.

![Visual Overview](./docs/Overview.gif)

## Features
GGILES features graph sequentialization and subsequent graph reconstruction for:
- Graphs with node types and (optionally) edge types.
- Directed and undirected graphs.
- Graphs with node and edge attributes as float or string 

GGILES also features:
- Default edge types: Default edges are not included in the string sequentialization for token efficiency and inferred in graph reconstruction.
- Attribute serialization with or without attribute name (keyword attribute).
- Optional or enforced attributes.
- Generating unique sequences: Graph invariant calculation for sequence canonicalization.
- A tokenizer for the sequence based on a regular expression.

All code examples below can also be found and run in [the example notebook](./docs/readme_examples.ipynb).

## Table of contents
- [Overview](#overview)
- [Features](#features)
- [Table of contents](#table-of-contents)
- [Installation](#installation)
- [Quickstart: Generating a string representation from a graph](#quickstart-generating-a-string-representation-from-a-graph)
- [Including the node and edge attributes in the sequence](#including-the-node-and-edge-attributes-in-the-sequence)
  - [Optional attributes](#optional-attributes)
- [Unique sequences for graphs](#unique-sequences-for-graphs)
- [Configuring node- and edge types](#configuring-node--and-edge-types)
  - [Changing the token type attribute](#changing-the-token-type-attribute)
  - [Optional type for edges](#optional-type-for-edges)
  - [Default type for edges](#default-type-for-edges)
- [Tokenization](#tokenization)
- [References](#references)
- [Copyright and license](#copyright-and-license)
- [Contact](#contact)

## Installation

Install GGILES from pip:
```bash
pip install ggiles
```

or from GitHub:
```bash
pip install git+https://github.com/process-intelligence-research/Generalized-graph-line-entry-system
```

Alternatively, get the latest updates by cloning the repo and installing the editable version of the package with:

```bash
git clone https://github.com/process-intelligence-research/Generalized-graph-line-entry-system
cd Generalized-graph-line-entry-system
pip install .
```

## Quickstart: Generating a string representation from a graph
Here is an example graph. 
```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_node(0, type="a", id="1")
G.add_node(1, type="c", id="2")
G.add_node(2, type="b", id="3")
G.add_node(3, type="d", id="4")
G.add_node(4, type="b", id="5")
G.add_node(5, type="c", id="6")
G.add_node(6, type="a", id="7")
G.add_node(7, type="e", id="8")
G.add_node(8, type="a", id="9")
G.add_edge(0, 1, type="a-c", flt=0.1)
G.add_edge(1, 2, type="c-b", flt=1.2)
G.add_edge(2, 3, type="b-d", flt=2.3)
G.add_edge(3, 4, type="d-b", flt=3.4)
G.add_edge(4, 1, type="b-c", flt=4.1)
G.add_edge(3, 5, type="d-c", flt=3.5)
G.add_edge(6, 4, type="a-b", flt=4.6)
G.add_edge(4, 7, type="b-e", flt=5.7)

# Plot the graph
pos = nx.kamada_kawai_layout(G)  
edge_labels = nx.get_edge_attributes(G, "type")  
node_labels = nx.get_node_attributes(G, "type") 

plt.figure(figsize=(10, 8))
nx.draw(G, pos, with_labels=True, labels=node_labels, node_color='cyan', node_size=500, font_size=10, font_weight='bold')
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_color='green', font_weight='bold')

plt.title("Example graph")

plt.show()
```
We want to create a string representation of this graph, where the type attribute of nodes and edges is used as a token.
```python
from ggiles.graph_invariant import GraphInvariantProcessor, MorganAlgorithm, AlphabeticNodeOrdering
from ggiles import make_converter

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
graph_converter = make_converter(invariant=invariant)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)
```
Result
```
v(a)e(a-b)v(b)<1[e(b-e)v(e)]e(b-c)v(c)<&|v(a)e(a-c)&|e(c-b)v(b)e(b-d)v(d)e(d-b)1e(d-c)v(c)|nv(a)
```
Here, `v(.)` denotes a node (vertex), `e(.)` an edge, `[.]` a branch, `<&|.&.|` a converging branch, `1.<1` a cycle, and `|n` a new subgraph. 

Can we reconstruct the graph (with the information encoded in the sequence)?
```python
reconstructed_graph = graph_converter.sequence_to_graph(sequence)
print(nx.is_isomorphic(G, reconstructed_graph, 
        node_match=lambda x, y: x["type"] == y["type"], 
        edge_match=lambda x, y: x["type"] == y["type"])
     )
```

Using directed graphs works similarly by changing a flag in the converter constructor
```python
graph_converter = make_converter(invariant=invariant, is_directed=False)
```

## Including the node and edge attributes in the sequence
Attributes can be assigned to both nodes and edges. Attributes can be assinged without attribute key, or keyword-attributes with attribute key. To specify which attributes should be included in the sequence representation, add the attribute names to `node_attributes`, `node_kwarg_attributes`, `edge_attributes`, `edge_kwarg_attributes`. Any attribute in the graph can either be of type string (to represent categorical values) or of type float/int (to represent numerical values).

```python
from ggiles.attribute_encoding import AttributeConfig

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
attr_config = AttributeConfig(
    node_attributes=["id"],
    edge_kwarg_attributes=["flt"]
    )

graph_converter = make_converter(invariant=invariant, attribute_config=attr_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)
```
Result: `v(a){"7"}e(a-b){"flt":4.6}v(b){"5"}...`

Note how attributes are included after the respective token in `{.}`. If multiple attributes are included, they are separated by `,`.

### Optional attributes
If specified, attributes are required in nodes/edges. If attributes should be treated as optional, set the `treat_attributes_as_optional` in converter construction to True.
```python
from ggiles import AttributeConfig

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
attr_config = AttributeConfig(
    node_attributes=["id"],
    )

graph_converter = make_converter(invariant=invariant, 
                                  attribute_config=attr_config, 
                                  treat_attributes_as_optional=True)

G_opt = G.copy()

del G_opt.nodes[6]["id"]
sequence = graph_converter.graph_to_sequence(G_opt)
print(sequence)
```
Result: `v(a)e(a-b)v(b){"5"}...`

Notice how the node "b" (corresponding to the graph node with id 6) has no attribute. Not enabling `treat_attributes_as_optional` in this case raises an exception.

Keyword attributes are decoded back into the graph automatically. Attributes without keywords need to be assigned a keyword before being reassigned to a new graph. By default, the attribute keywords are assigned in the order specified in `node_attributes` and `edge_attributes` in `AttributeConfig`.

## Unique sequences for graphs
GGILES can produce unique graph sequences. For this, a function for calculating a graph invariant needs to be specified. The graph invariant assigns unique ranks to the nodes of a graph, so that branching decisions can be made with respect to these ranks. The invariant is calculated by the `GraphInvariantProcessor` class. There are many ways to assign ranks to nodes, and some of them cannot always produce unique ranks to all graphs. An effective choice for the calculation of invariants can depend on the properties of the graphs in a specific use-case. Therefore, the `GraphInvariantProcessor` can have several `GraphInvariantLayers`. Each layer calculates node ranks. If a layer does not produce unique ranks and nodes with tied ranks remain, the ranks of lower level layers are used to break the ties.

Lower ranks are chosen first as start nodes of branches, new subgraphs, or converging branches. This convention promotes nodes with lower ranks to lie on the main sequence, and nodes with higher ranks to be pushed to branches and converging branches. 

The following layers are inbuilt
- MorganAlgorithm: Ranks nodes using the Morgan Algorithm.
- CategoricalAttributePriorization: Ranks nodes by attribute value priority.
- DescendantSizePrioritization: Ranks nodes by descendant subgraph size.
- AlphabeticNodeOrdering: Ranks nodes by alphabetical attribute order.
- CustomAttributePrioritization: Ranks nodes using a custom attribute function.
- DescendantRankSumOrdering: Ranks nodes by the sum of ranks in their descendant subgraphs.
- DisregardEqualDescendants: Assigns 'random' ranks to nodes with isomorphic descendant graphs.
- FastDisregardEqualDescendants: Faster version for branched, non-connected directed graphs.

In the example below, the ranks are calculated with the morgan algorithm. If tied nodes remain, the
ties are broken by ordering nodes alphabetically on the value of the "type" attribute.
```python
from ggiles.graph_invariant import GraphInvariantProcessor, MorganAlgorithm, AlphabeticNodeOrdering

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
graph_converter = make_converter(invariant=invariant)
```
Graph invariants are complex and may be dependent on the specific structure of the graphs in question. Therefore, it makes sense to validate the graph invariant calculation against a set of example graphs to ensure that the ranking is consistent and unique across different graph structures. For this, use the processor's `validate_against_example_graphs` method:
```python
example_graphs: list[nx.Graph] = [] # Add example graphs here
print(invariant.validate_against_example_graphs(example_graphs))
```

## Configuring node- and edge types
### Changing the token type attribute
By default, GGILES uses the "type" attribte in nodes and edges for the selection of the strings for the tokens. This can be changed in a custom GraphTokenConfig:
```python
from ggiles import GraphTokenConfig

token_config = GraphTokenConfig(
    node_type_attribute="id"
)

graph_converter = make_converter(invariant=invariant, graph_token_config=token_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)
```
Result: `v(7)e(a-b)v(5)...`

### Optional type for edges
For graphs without explicit edge types, the edge type token does not have to be included in the sequence representation. In this case, set the `edge_type_attribute` in `GraphTokenConfig` to `None`:
```python
from ggiles import GraphTokenConfig

token_config = GraphTokenConfig(
    edge_type_attribute=None
)

graph_converter = make_converter(invariant=invariant, graph_token_config=token_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)
```
Result: `v(a)v(b)<1[v(e)]v(c)<&|v(a)&|v(b)v(d)1v(c)|nv(a)`

Notice how edge tokens are not included anymore.

### Default type for edges
It is also possible to omit the edge tokens of a single type. This is useful if a certain edge type occurs very frequently and can be treated as a default. This can be achieved by setting the `default_edge_type` in `GraphTokenConfig`:
```python
from ggiles import GraphTokenConfig

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
token_config = GraphTokenConfig(
    default_edge_type="a-b"
)
graph_converter = make_converter(invariant=invariant, graph_token_config=token_config)
sequence = graph_converter.graph_to_sequence(G)
print(sequence)
```
Result: `v(a)v(b)<1[e(b-e)v(e)]e(b-c)v(c)...`

Now, all edges with type `a-b` are omitted and inferred when parsing back to a graph.

## Tokenization
The graph string representations can be tokenized with the GGILES package with the default tokenization scheme. The `GGILESTokenizer` produces a Regex expression that identifies node tokens, edge tokens, string attributes and digits, and syntax markers. Make sure to apply the same settings to the tokenizer as for the converter.
```python
from ggiles import GGILESTokenizer

invariant = GraphInvariantProcessor(layers=[MorganAlgorithm(), AlphabeticNodeOrdering("type")])
graph_converter = make_converter(invariant=invariant)
sequence = graph_converter.graph_to_sequence(G)


tokenizer = GGILESTokenizer()
tokens = tokenizer.tokenize(sequence)
print(tokens[0:2])
```
Result: `["v(a)", "e(a-b)"]`

## References
- Vogel, G., Hirtreiter, E., Schulze Balhorn, L., & Schweidtmann, A. M. (2023). SFILES 2.0: An extended text-based flowsheet representation. *Optimization and Engineering, 24*(4), 2911–2933. https://doi.org/10.1007/s11081-023-09798-9
- d’Anterroches, L. (2006). *Process flow sheet generation and design through a group contribution approach* [Dissertation]. Technical University of Denmark.
- Weininger, D. (1988). SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. *Journal of Chemical Information and Computer Sciences, 28*(1), 31–36. https://doi.org/10.1021/ci00057a005

## Copyright and license

GGILES is published under MIT license (see [license file](LICENSE))

Copyright (C) 2025 Artur Schweidtmann Delft University of Technology. 

## Contact

📧 [Contact](mailto:a.schweidtmann@tudelft.nl)

🌐 [PI research](https://pi-research.org)

<p align="left">
<a href="https://twitter.com/ASchweidtmann" target="blank"><img align="center" src="https://img.shields.io/badge/X-000000?style=for-the-badge&logo=x&logoColor=white" alt="fernandezbap" /></a>
</p>

<p align="left">
<a href="https://www.linkedin.com/in/schweidtmann/" target="blank"><img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white"  /></a>
</p>
