Metadata-Version: 2.4
Name: asesurfacefinder
Version: 1.1.0
Summary: Machine learned location of chemical adsorbates on high-symmetry surface sites.
Author-email: Joe Gilkes <joe@joegilk.es>
License: MIT License
        
        Copyright (c) 2024 Joe Gilkes
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/joegilkes/ASESurfaceFinder
Keywords: ase,surface,atoms,molecule,chemistry,adsorbate,random forest,classification
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: MacOS
Classifier: Operating System :: Unix
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: dscribe
Requires-Dist: ase
Requires-Dist: matplotlib
Dynamic: license-file

# ASESurfaceFinder

![PyPI - Version](https://img.shields.io/pypi/v/asesurfacefinder) ![Conda Version](https://img.shields.io/conda/v/conda-forge/asesurfacefinder)


A utility for determining surface facets and adsorption points of ASE-based systems consisting of molecules on surfaces.

[ASE](https://wiki.fysik.dtu.dk/ase/) comes with an excellent selection of utilities for working with atomic surfaces, enabling the construction of many common surface facets, the definition of high-symmetry points across these surfaces, and the adsorption of arbitrary molecules to these surface sites. However, computationally determining which of these sites an adsorbed molecule is bound to without prior knowledge is a non-trivial task, due to the symmetric equivalence of many sites and high similarity of many surface facets.

ASESurfaceFinder implements automated tools for training and validating random forest classification models (implemented in [scikit-learn](https://scikit-learn.org/stable/index.html)) that can identify surface sites based on the local atomic environment of adsorbed atoms. Given unseen adsorbed systems, it then enables these models to be used for prediction of both surface facet and high-symmetry adsorption site, to be used when cataloguing externally-generated adsorbed systems.

## Installation

ASESurfaceFinder has been developed for Python 3.9 and newer. It is available on PyPI and conda-forge, using their respective package managers:

```bash
# Pip (PyPI)
pip install asesurfacefinder

# Conda (conda-forge)
conda install -c conda-forge asesurfacefinder
```

## Usage

Given a workflow that produces XYZ geometries of molecules adsorbed on periodic solid surfaces in unknown locations, ASESurfaceFinder can be used to quickly categorise the adsorption sites by the surface's available high-symmetry points. For example, assuming the workflow can produce surface/adsorbate systems where the surface is one of the following (a 3x3x3 unit cell is shown):

| Example Surface | ASE Construction | Visualisation (3x3x3) |
|-----------------|------------------|-----------------------|
| Platinum FCC{100} | `fcc100('Pt', (3,3,3))` | ![3x3x3 unit cell of FCC{100} platinum](/examples/Pt_fcc100.svg) |
| Silver FCC{110} | `fcc110('Ag', (3,3,3))` | ![3x3x3 unit cell of FCC{110} silver](/examples/Ag_fcc110.svg) |
| Gold FCC{111} | `fcc111('Au', (3,3,3))` | ![3x3x3 unit cell of FCC{111} gold](/examples/Au_fcc111.svg) |

ASE defines the symmetrically equivalent adsorption sites for each of these surface facets (see [ASE's documentation on surface construction](https://wiki.fysik.dtu.dk/ase/ase/build/surface.html)). These are used as classification targets within ASESurfaceFinder.

To begin, the package's main class is initialised using these surfaces:

```python
from asesurfacefinder import SurfaceFinder, SampleBounds
from ase.build import fcc100, fcc110, fcc111

surfaces = [fcc100('Pt', (1,1,3)), fcc110('Ag', (1,1,3)), fcc111('Au', (1,1,3))]
surface_labels = ['Pt_fcc100', 'Ag_fcc110', 'Au_fcc111']
bounds = [
    {
        'ontop': SampleBounds(0.6, 1.5, 2.4),
        'bridge': SampleBounds(0.45, z_min=0.75, z_mid=1.4, z_max=2.0),
        'hollow': SampleBounds(0.5, z_min=0.4, z_mid=0.8, z_max=1.75)
    },
    {
        'ontop': SampleBounds(0.6, 1.5, 2.4),
        'hollow': SampleBounds(0.65, z_min=0.1, z_mid=0.65, z_max=1.3),
        'longbridge': SampleBounds(0.5, z_min=-0.6, z_mid=1.0, z_max=1.5),
        'shortbridge': SampleBounds(0.5, z_min=0.6, z_mid=1.75, z_max=2.1)
    },
    {
        'ontop': SampleBounds(0.6, 1.5, 2.4),
        'bridge': SampleBounds(0.35, z_min=0.7, z_mid=1.6, z_max=2.2),
        'fcc': SampleBounds(0.35, z_min=0.4, z_mid=1.4, z_max=2.0),
        'hcp': SampleBounds(0.35, z_min=0.4, z_mid=1.4, z_max=2.0)
    }
]

sf = SurfaceFinder(surfaces, labels=surface_labels, sample_bounds=bounds)
```

The `labels` argument is optional, and can be used for more precisely naming the surface facets; without it, they will be referred to by integers.

The `sample_bounds` argument is similarly optional, and allows for specification of the volume above each named surface site that should be sampled. Each `dict` corresponds to its respective surface, and each site can take its own bounds (see next section). In the event that some sites are not specified, this volume falls back to a default set of bounds, which can be controlled with the `sample_defaults` argument to the `SurfaceFinder` class (defaults to `SampleBounds(0.1, 1.0, 2.75)`).

> [!IMPORTANT]
> While surfaces outside of those that can be generated by ASE can be trained on, they must contain an `'adsorbate_info'` dictionary internally that specifies their minimal XY unit cell and their high-symmetry adsorption sites.

### Model Training

To train a random forest classification model to recognise the adsorption sites on each of these surfaces, the `train` method of this class can be called:

```python
sf.train(
    samples_per_site=10000,
    n_jobs=4
)
```

For each surface facet passed into `SurfaceFinder`, this samples positions above each adsorption site `samples_per_site` times for possible perturbations and adsorption heights. The volume that can be sampled is defined by a `SampleBounds` object, which is passed in on construction of the `SurfaceFinder` class:

```python
# SampleBounds can take 3 positional arguments: r_max, z_min and z_max.
# This defines a hemispheroidal sampling volume with radius 0.5 Ang, 
# a base plane 1.5 Ang above the highest surface atom and an apex 
# 2.4 Ang above the highest surface atom.
hemispheroid_bounds = SampleBounds(0.5, 1.5, 2.4)

# It can take these as keyword arguments too, with an optional 4th 
# argument: z_mid.
# This defines an ovoid sampling volume with maximum radius 0.3 Ang 
# at its middle plane, which sits 1.4 Ang above the highest surface
# atom. Top and bottom apexes of the volume lie at 2.3 Ang and 
# 1.0 Ang above the highest surface atom respectively.
ovoid_bounds = SampleBounds(r_max=0.3, z_min=1.0, z_mid=1.4, z_max=2.3)
```

> [!NOTE]
> See the next section for examples of how these sampling volumes look.

The local atomic environment around each sampled adsorbate position is then encoded by means of a descriptor from [DScribe](https://singroup.github.io/dscribe/latest/) - a SOAP descriptor with a cutoff of 10 Å is used by default, but this can be modified by passing a `descriptor` keyword argument when constructing the `SurfaceFinder`. Each descriptor is matched with a label representing the surface facet and adsorption site that it represents in the format `{label}_{site}`, where `{label}` is one of the labels passed into `SurfaceFinder` and `{site}` is the ASE high-symmetry site name. The `n_jobs` argument enables parallelism over the specified number of processes during descriptor creation and model training.

A random forest classifier is then trained on this data, such that any future points above a surface can be classified into a surface facet and adsorption site prediction. 

### Sample Visualisation

Possible realisations of the adsorption point sampling can be visualised with ASESurfaceFinder's `SamplePlotter`:

```python
from asesurfacefinder import SamplePlotter
surface = fcc111('Au', (2,2,3))
sp = SamplePlotter(surface, samples_per_site=2000, sample_bounds={
    'ontop': SampleBounds(0.6, 1.5, 2.4),
    'bridge': SampleBounds(0.35, z_min=0.7, z_mid=1.6, z_max=2.2),
    'fcc': SampleBounds(0.35, z_min=0.4, z_mid=1.4, z_max=2.0),
    'hcp': SampleBounds(0.35, z_min=0.4, z_mid=1.4, z_max=2.0)
})

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12,6), layout='constrained')
ax1.set_axis_off(); ax2.set_axis_off(); ax3.set_axis_off()
sp.plot(ax=ax1, rotation='0x,0y,0z')
sp.plot(ax=ax2, rotation='-75x,0y,0z')
sp.plot(ax=ax3, rotation='-90x,-155y,0z')
```

This acts as a wrapper around `ase.visualize.plot`'s `plot_atoms` function, sampling adsorption points within the given bounds and displaying them on the requested surface. `SamplePlotter.plot()` passes through keyword arguments to this function, allowing for properties such as the rotation of each plot to be modified:

![Adsorption points sampled above high-symmetry sites on a gold FCC{111} surface](/examples/sampled_points.svg)

Here we can see the difference between the types of sampling volume defined by `SampleBounds`: the red `ontop` site sits over a single atom and is defined by a hemispheroidal dome, while the other sites sit in between atoms and permit atoms to sit in cavities within the top of the surface, so they use an ovoid sampling volume that is widest at the most common absorption heights and tapers to a point above and below this.

When defining `SampleBounds` for unknown surface sites, `SamplePlotter` can be used to fine-tune the sampling volume. To choose sensible initial values for a site think about where the smallest and largest atoms you will be adsorbing might lie - in a simple hydrocarbon, hydrogen would be able to approach sites more closely than carbon due to its smaller atomic radius. Choosing the correct sampling bounds will vary depending on the chemical system and how the input geometries for prediction are being created and optimised, so this exercise is left up to the reader.

> [!IMPORTANT]
> Visualisation is an important step, as it allows you to ensure that sampling volumes are not overlapping! Overlapping volumes are likely to confuse the classifier during training, decreasing prediction accuracy. Watch out for overlap between symmetrically equivalent sites - `SamplePlotter` does not currently show these.

### Model Validation

The training can be validated before performing any predictions on real systems:

```python
sf.validate(samples_per_site=2000, surf_mults=[(1,1,1), (3,3,1), (5,5,1)], sample_bounds=[
    {'bridge': SampleBounds(0.5, z_min=0.65, z_mid=1.4, z_max=2.2)},
    {},
    {'fcc': SampleBounds(0.3, z_min=0.5, z_mid=1.4, z_max=1.9),
    'hcp': SampleBounds(0.3, z_min=0.5, z_mid=1.4, z_max=1.9)}
])
```

This performs the same adsorbate position sampling procedure as was used to generate local atomic environments in the training of the classifier, but different values of the sampling volume can be passed through the `sample_bounds` argument to test the limits of applicability of the current model. While a `dict` has to be provided for each surface (when using this argument), surface sites not given new validation bounds will default to using the bounds that were used in training. 

In this case, since we have expanded the sample volume of the `bridge` site on `Pt_fcc100` it is likely that there will be a few failed predictions, as the predictor will have to extrapolate to local environments outside the height and XY bounds that it was trained on. Conversely, we have reduced the volume for the `fcc` and `hcp` sites of `Au_fcc111`, so validation on these sites should be completely accurate. It can therefore be useful to set the training bounds to a slightly larger volume than you might expect, and decrease accordingly to realistic bounds when validating.

Additionally, the `surf_mults` argument can be used to scale up the unit cell of each surface facet before sampling to ensure that adsorbates on larger surface slabs are correctly predicted. In this example, since the provided facets were all 1x1x3 unit cells, this will produce sampled adsorbate positions on 1x1x3, 3x3x3 and 5x5x3 surface slabs.

ASESurfaceFinder will output a summary of these validation results once it has predicted labels for these newly generated positions and compared them with their true values. If the classifier has been trained correctly, this will simply be something like

```
66000/66000 sites classified correctly (accuracy = 1.0).
```

However, if this validation fails for any sites, ASESurfaceFinder prints them in a table where `h` is the sampled adsorbate height and `r` is the radial distance in Angstroms away from the ideal adsorption site:

```
63559/66000 sites classified correctly (accuracy = 0.9630151515151515).

True                                            | Predicted
-------------------------------------------------------------------
Au_fcc111_bridge (5, 5, 1) (h = 2.48, r = 0.44) | Au_fcc111_hcp
Au_fcc111_bridge (5, 5, 1) (h = 1.42, r = 0.67) | Au_fcc111_ontop
Au_fcc111_bridge (5, 5, 1) (h = 1.60, r = 0.42) | Au_fcc111_hcp
Au_fcc111_fcc (5, 5, 1) (h = 2.43, r = 0.52)    | Au_fcc111_bridge
Au_fcc111_fcc (5, 5, 1) (h = 2.40, r = 0.63)    | Au_fcc111_bridge
Au_fcc111_fcc (5, 5, 1) (h = 1.98, r = 0.54)    | Au_fcc111_bridge
...
```

This indicates that either
1. the training samples were not varied enough to account for the sample space covered by the validation samples,
2. the validation samples covered too wide an area and overlapped between neighbouring sites, causing samples to be mislabelled,
3. the training samples accounted for too wide a sample space, causing overlap between neighbouring sites and misclassification.

In a production use case, each site's `SampleBounds` should be tweaked such that validation returns no errors while sampling as broad and realistic a volume above each adsorption site as possible.

### Model Prediction

To predict the high-symmetry adsorption sites and surface facets of real surface/adsorbate systems, these can simply be fed into a trained `SurfaceFinder`.

Taking an example system of 3 methanol molecules adsorbed to 'fcc' sites of gold FCC{111}, with a free methanol molecule above the surface:

![Four methanol molecules on gold FCC{111}](/examples/4MeO_Au_fcc111.svg)

The adsorbates and gas phase molecule can be separated from the surface and classified by their adsorption sites (or lack thereof) with the `SurfaceFinder.predict()` method:

```python
from ase.io import read

real_surface = read('examples/4MeO_Au_fcc111.xyz')
slab, molecules, labels = sf.predict(real_surface)
```

This returns three outputs:
1. `slab`: The clean surface geometry as an `ase.Atoms`, without any adsorbates.
2. `molecules`: A list of `ase.Atoms` representing the adsorbates and gas phase molecules found on and above the surface respectively.
3. `labels`: A list of dictionaries corresponding to the entries in `molecules` containing the atoms of the respective adsorbate that are adsorbed onto the surface, their predicted sites and the adsorption heights above those sites.

For example, given the above system:

```python
for molecule, label in zip(molecules, labels):
    print(molecule, label)
```

```
Atoms(symbols='OCH3', pbc=False, tags=...) {0: {'site': 'Au_fcc111_fcc', 'bonded_elem': 'O', 'coordination': 3, 'height': 1.7941429100000015}}
Atoms(symbols='OCH3', pbc=False, tags=...) {0: {'site': 'Au_fcc111_fcc', 'bonded_elem': 'O', 'coordination': 3, 'height': 1.79542644}}
Atoms(symbols='OCH3', pbc=False, tags=...) {0: {'site': 'Au_fcc111_fcc', 'bonded_elem': 'O', 'coordination': 3, 'height': 1.7944157000000018}}
Atoms(symbols='CHOH3', pbc=False, tags=...) {}
```

The first three entries are the adsorbed methanol molecules, correctly classified as being bound to the 'fcc' high-symmetry site on a gold FCC{111} surface. Each molecule's `label` dictionary is keyed by the index of the atom in the corresponding `molecule` which is bound; in the event that a molecule is bound to the surface by multiple atoms, it will contain en entry for each. In this case, all adsorbates are bound by their respective 0th atoms - their oxygens, indicated by the `'bonded_elem'` key.

Additionally, each `label` also provides the coordination of the adsorption site as detected by ASE's connectivity tools. Since the 'fcc' site sits in between three atoms, this is reflected in the `'coordination'` key.

The fourth molecule returned is the gas phase methanol, which is not adsorbed onto the surface. This is reflected in its `label` dictionary, which is empty to indicate no atoms are bound.

#### Adjusting connectivity cutoffs

`SurfaceFinder` identifies atoms to run for prediction by searching for connectivity to the surface with ASE's [`NeighborList`](https://wiki.fysik.dtu.dk/ase/ase/neighborlist.html#) tools. When using more approximate semiempirical or forcefield methods to construct adsorbed systems, atoms can be less tightly bound to surfaces and partially or entirely fail these connectivity checks, leading to adsorbed atoms being tagged incorrectly.

ASE therefore allows for passing different cutoff distances to these connectivity checks, such that the 'collision volume' of each atom can be expanded. This is usually done through the [`natural_cutoffs`](https://wiki.fysik.dtu.dk/ase/ase/neighborlist.html#ase.neighborlist.natural_cutoffs) function, which accepts a scaling factor `mult`. We allow a set of cutoffs (which can even be manually defined) to be passed through during site prediction.

The following represents a more complex example with different adsorbates bound on a range of sites, optimised with a semiempirical method that can underbind adsorbates:

![A selection of molecules on a range of gold FCC{111} adsorption sites](/examples/CO+H2O+NHCH3_Au_fcc111.svg)

Running the same prediction workflow with slightly expanded `NeighborList` cutoffs reveals carbon monoxide on an 'ontop' site, hydroxide on a 'fcc' site, and methylamine on a 'bridge' site:

```python
from ase.neighborlist import natural_cutoffs

underbound_surface = read('examples/CO+H2O+NHCH3_Au_fcc111.xyz')
cutoffs = natural_cutoffs(underbound_surface, mult=1.15)
slab, molecules, labels = sf.predict(underbound_surface, nl_cutoffs=cutoffs)

for molecule, label in zip(molecules, labels):
    print(molecule, label)
```

```
Atoms(symbols='CO', pbc=False) {0: {'site': 'Au_fcc111_ontop', 'bonded_elem': 'C', 'coordination': 1, 'height': 2.4249690500000014}}
Atoms(symbols='OH', pbc=False) {0: {'site': 'Au_fcc111_fcc', 'bonded_elem': 'O', 'coordination': 3, 'height': 1.7710107300000004}}
Atoms(symbols='NHCH3', pbc=False) {0: {'site': 'Au_fcc111_bridge', 'bonded_elem': 'N', 'coordination': 2, 'height': 1.9164665700000008}}
```

#### Rejecting near-surface atoms by coordination

In some cases, adsorbates can contain atoms which lie close enough to the surface to be picked up by ASE's connectivity detection, but should not be considered to be adsorbed. This usually occurs to hydrogen atoms that are directly bonded to adsorbed atoms. In such cases, it can be beneficial to classify these atoms as not adsorbed and instead rely on a single point of adsorption. Take this ethylamino anion on Pt FCC{100}, where a hydrogen atom sits above a `bridge` site:

![Ethylamino anion on Pt FCC{100}, with low-lying hydrogen atom potentially interacting with the surface.](/examples/CH3CH2NH_Pt_fcc100_sideview.svg)

Identifying such atoms can be tricky, although when sent for site prediction they usually come back undercoordinated. Depending on how `SurfaceFinder is trained`, this molecule can return the following labels:

```python
{
    2: {'site': 'Pt_fcc100_ontop', 'bonded_elem': 'N', 'coordination': 1, 'height': 1.9993630699999994}, 
    8: {'site': 'Pt_fcc100_bridge', 'bonded_elem': 'H', 'coordination': 1, 'height': 2.0844453000000005}
}
```

As FCC{100} `bridge` sites are supposed to be 2-coordinated, we know this atom is undercoordinated. However, this is not necessarily grounds for rejection - occasionally, multiple atoms can share the same surface site and have the wrong coordination, and rejecting all of these would lead to predictions of unadsorbed molecules. Rejection of an adsorbed atom therefore only occurs if it is directly bonded to an adsorbed atom with the correct coordination, such as this hydrogen bonded to the 1-coordinate `ontop` site here.

This behaviour has to be opted into if desired using the `reject_wrong_coordination` argument of `SurfaceFinder.predict()`:

```python
undercoordinated_h_surface = read('examples/CH3CH2NH_Pt_fcc100.xyz')
slab, molecules, labels = sf.predict(undercoordinated_h_surface, reject_wrong_coordination=True)

for molecule, label in zip(molecules, labels):
    print(molecule, label)
```

```
Atoms(symbols='C2NH6', pbc=False, tags=...) {2: {'site': 'Pt_fcc100_ontop', 'bonded_elem': 'N', 'coordination': 1, 'height': 1.9993630699999994}}
```

Sometimes ASESurfaceFinder will not find the correct coordination for some sites, as it calculates this by placing a hydrogen atom at the bottom of the site's sampling volume and checking its connectivity to the surface. In particularly wide hollows such as the `hollow` site in FCC{110}, the small scale of the hydrogen atom can lead to predicted undercoordination. This can be identified by the printout at `SurfaceFinder` initialisation, and corrected by passing a value to `SurfaceFinder`'s `site_coordinations` keyword argument:

```python
sf = SurfaceFinder([fcc110('Ag', (1,1,3))], labels=['Ag_fcc110'], site_coordinations=[{'hollow': 4}])
```

#### Rejecting near-surface hydrogens

Depending on how `SurfaceFinder` is trained, the above hydrogen may instead be predicted to be adsorbed to the same `ontop` site that the nitrogen is on! Rejecting the hydrogen through undercoordination to its surface site will fail, since it is on a 1-coordinate site:

```python
{
    2: {'bonded_elem': 'N', 'coordination': 1, 'height': 1.9993630699999994, 'site': 'Pt_fcc100_ontop'}, 
    8: {'bonded_elem': 'H', 'coordination': 1, 'height': 2.0844453000000005, 'site': 'Pt_fcc100_ontop'}
}
```

Instead, we can take the more aggressive option of rejecting all hydrogen atoms that are detected as adsorbed, provided they are directly bonded to a non-hydrogen adsorbed atom. This behaviour can be enabled with the `reject_bonded_hydrogens` argument of `SurfaceFinder.predict()` (defaults to `False`). The distinction of rejected hydrogens being bonded to *non-hydrogen* atoms is important, as otherwise hydrogen molecules on the surface could be rejected entirely if they are in a horizontal conformation.

## A note on adsorbate identification

ASESurfaceFinder does not currently contain a nice, clean way of separating *all* adsorbates from surfaces. In the first instance, it searches for a set of per-atom 'tags' in the underlying `ase.Atoms` object (accessed by `atoms.get_tags()`). These are generated by ASE when using its adsorbate-adding functionality, and represent the layer of the surface that each atom is part of - adsorbates are tagged at layer 0, with increasing layer tags indicating increasing depth.

However, it is unlikely that input surface/adsorbate systems will be tagged like this since if they were generated with ASE, you would already know the sites on which the adsorbates sit. Instead, ASESurfaceFinder currently relies on separation by element. Since it knows which elements the surfaces are made of, it will simply mask these elements off to separate the adsorbates from the surface. A warning will be displayed when this is performed.

Since the majority of surface/adsorbate systems are organic molecules on metal surfaces, this approach will work most of the time. However, when dealing with surfaces that contain the same elements as adsorbates (e.g. diamond, silica, etc.), this approach will fail. A more concrete approach for adsorbate separation is being developed, but will likely require analysis of the training surfaces as a whole.
