PFASGroups GUI

Desktop application for PFAS structure classification, prioritisation, and modelling


1 Classification Tab

Purpose

Load a molecular dataset and classify each molecule against the built-in PFAS group definitions. Matched groups are highlighted as colour-coded fragments in the structure view (Tab 2).

Data sources

Options

Results automatically propagate to all other tabs once classification completes.

2 Results Tab

Purpose

Browse the classified molecules as interactive compound cards. Each card shows:

Controls

3 Definition & SMARTS Tester Tab

Sub-tab A – PFAS Definitions

Enter a SMILES string and test it against:

Sub-tab B – Custom Group (SMARTS)

Define a custom HalogenGroup and test a molecule against it. Fields:

Diagnostics are printed as JSON in the right panel, showing which stage passed or failed (component detection, SMARTS matching, constraint checking).

4 Prioritise Tab

Purpose

Rank the classified molecules to identify the most structurally diverse or representative candidates.

Modes

Output

A sortable ranking table and a horizontal bar chart showing the top-30 priority scores. Higher scores indicate higher priority (more structural novelty or complexity relative to the reference).
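As an illustration of how a novelty-style priority score can be ranked and truncated to a top-30 list, the sketch below scores each candidate as one minus its highest Tanimoto similarity to a reference set. This is an assumption made for illustration: the scoring used by the GUI is not described in this section, and the names `tanimoto`, `novelty_scores`, and `top_n` are hypothetical helpers, not part of PFASGroups.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def novelty_scores(candidates, reference):
    """Score = 1 - highest similarity to any reference fingerprint (hypothetical scheme)."""
    return {
        name: 1.0 - max((tanimoto(fp, ref) for ref in reference), default=0.0)
        for name, fp in candidates.items()
    }


def top_n(scores, n=30):
    """Return the n highest-scoring (name, score) pairs, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


# Toy fingerprints: each set holds the indices of the bits that are switched on.
candidates = {"A": {1, 2, 3}, "B": {7, 8}, "C": {1, 2}}
reference = [{1, 2, 3}, {1, 2, 4}]

scores = novelty_scores(candidates, reference)
ranking = top_n(scores, n=30)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A candidate identical to a reference structure scores 0, and one sharing no bits with any reference scores 1, which matches the convention that higher scores mean higher priority.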

5 Chemical Space Tab

Purpose

Visualise the PFASGroups fingerprint space as an interactive 2-D scatter plot.

Methods

Fingerprint presets

Select a key from FINGERPRINT_PRESETS. The preset named "best" (binary + effective_graph_resistance) gives the strongest inter-group discrimination according to the MQG benchmark, but any preset can be used.

Colour by

By default, points are coloured by the dominant PFAS group. If a label column was loaded with the data, select it here to colour by an arbitrary property.

6 ML Modelling Tab

Purpose

Compare fingerprint descriptors for binary property prediction using HistGradientBoostingClassifier with repeated stratified k-fold CV.

Target column

Choose a column from your loaded data that contains binary labels (0/1 or True/False). Common examples: bioactive, toxic, active.
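If a column mixes the accepted encodings, it can be normalised to 0/1 before loading. The helper below is a hypothetical sketch (the name `to_binary_labels` is not part of PFASGroups, and the GUI may perform its own conversion); it accepts 0/1 and True/False in native or string form and rejects anything else.

```python
def to_binary_labels(values):
    """Convert 0/1 or True/False labels (native or string form) to a list of ints."""
    mapping = {"0": 0, "1": 1, "0.0": 0, "1.0": 1, "false": 0, "true": 1}
    labels = []
    for value in values:
        key = str(value).strip().lower()
        if key not in mapping:
            raise ValueError(f"not a binary label: {value!r}")
        labels.append(mapping[key])
    return labels


raw_column = [True, 0, "1", "False", 1.0]
print(to_binary_labels(raw_column))
```

Raising on unexpected values, rather than silently coercing them, makes it obvious when a column such as a multi-class or continuous property was selected by mistake.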

Fingerprint sets

Bayesian correlated t-test

Every pair of fingerprint sets is compared using the Bayesian correlated t-test (Benavoli et al. 2017) with a region of practical equivalence (ROPE) of 0.01 ROC-AUC. The table reports:

P(ROPE) > 0.9 indicates that the two fingerprint sets are practically equivalent on this dataset, i.e. their ROC-AUC difference most likely lies within the ROPE.
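To make the three posterior probabilities concrete, the sketch below computes them from per-fold ROC-AUC differences using only the Python standard library. It follows the usual formulation of the test: the posterior of the mean difference is a Student t distribution with n - 1 degrees of freedom whose variance is inflated by the factor 1/n + 1/(k - 1) for k folds, to account for overlapping training sets. The function names, the Simpson's-rule integration used in place of a library t-distribution, and the example numbers are assumptions for illustration; this is not the GUI's implementation.

```python
import math
from statistics import mean, variance


def student_t_cdf(z, df):
    """Standard Student-t CDF at z, integrating the density with Simpson's rule."""
    if z == 0:
        return 0.5
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

    upper = abs(z)
    steps = 2 * max(1000, math.ceil(upper * 20))  # even count, step width <= 0.05
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = min(total * h / 3, 0.5)  # probability mass between 0 and |z|
    return 0.5 + math.copysign(area, z)


def correlated_ttest(diffs, n_folds, rope=0.01):
    """Return (P(left), P(ROPE), P(right)) for per-fold score differences A - B."""
    n = len(diffs)
    m = mean(diffs)
    # Variance correction for correlated folds: 1/n + 1/(k - 1).
    var = variance(diffs) * (1 / n + 1 / (n_folds - 1))
    if var == 0:  # identical differences: the posterior collapses to a point
        return float(m < -rope), float(-rope <= m <= rope), float(m > rope)
    scale = math.sqrt(var)
    cdf_lo = student_t_cdf((-rope - m) / scale, n - 1)
    cdf_hi = student_t_cdf((rope - m) / scale, n - 1)
    return cdf_lo, cdf_hi - cdf_lo, 1.0 - cdf_hi


# Illustrative differences from 2 repeats of 5-fold CV (made-up numbers).
diffs = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.11, 0.10, 0.12]
p_left, p_rope, p_right = correlated_ttest(diffs, n_folds=5, rope=0.01)
print(f"P(left)={p_left:.4f}  P(ROPE)={p_rope:.4f}  P(right)={p_right:.4f}")
```

Here P(left) is the probability that set A is worse than set B by more than the ROPE, P(right) that it is better by more than the ROPE, and P(ROPE) that the difference is practically negligible; the three values sum to 1.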

Keyboard shortcuts

Installation

Install the GUI dependencies into your environment:

pip install "PFASGroups[gui]"

Or from the repository:

pip install -e ".[gui]"

Then launch with:

pfasgroups-gui

References