Metadata-Version: 2.4
Name: mlipts
Version: 0.1.0
Summary: Machine Learning Interatomic Potentials Training Suite
Author-email: William Davie <willdavie2002@gmail.com>
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ase
Requires-Dist: py4vasp
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: matplotlib
Dynamic: license-file



# Machine Learned Interatomic Potentials - Training Suite (MLIPTS).
<img src="https://img.shields.io/badge/version-0.1.0-blue">

MLIPTS is a Python package for training and fine-tuning machine-learned interatomic potentials (MLIPs).

The key idea is to perform the following active learning workflow [1] with as little user input as possible:

<p align="center">
  <img src="https://github.com/williamdavie/mlipts/blob/docs-edit/media/active_learning_flowchart.png" width="50%" height="auto">
</p>

> [!NOTE]
> The scope of a fully fledged Python package for seamless MLIP training is significant, given the many available MD and DFT/quantum chemistry codes and the many ways to quantify data quality.
> Contributors are welcome to help make this goal a reality.

## Version 0.1.0

### Installation

MLIPTS can be installed via pip

```
pip install mlipts
```

### Capability

Version ```0.1.0``` targets the creation of an initial dataset (Part 1 of the workflow above). Part of the current functionality also applies across the main workflow, so the workflow this release addresses reduces to the following:

<p align="center">
  <img src="https://github.com/williamdavie/mlipts/blob/docs-edit/media/workflow_%231.png" width="50%" height="auto">
</p>

The supported MD code is ```LAMMPS``` and the supported DFT code is ```VASP```. Configurations are filtered using the Earth Mover's Distance (EMD); details of the method can be found in [2] and in [average-minimum-distance](https://github.com/dwiddo/average-minimum-distance) (Copyright (C) 2025 Daniel Widdowson).
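
[2] compares structures through their pointwise distance distributions (PDDs) using the EMD. As a much simpler illustration of the same idea, the sketch below summarizes each configuration by the sorted list of all its interatomic distances (ignoring periodic images) and computes the one-dimensional EMD between two such lists. The helpers `distance_fingerprint` and `emd_1d` are hypothetical, not part of mlipts or average-minimum-distance, and this fingerprint is a stand-in for the PDD rather than a reimplementation of it.

```python
import math
from itertools import combinations


def distance_fingerprint(positions):
    """Sorted list of all interatomic distances (no periodic images)."""
    return sorted(math.dist(a, b) for a, b in combinations(positions, 2))


def emd_1d(u, v):
    """Earth mover's (Wasserstein-1) distance between two equally weighted
    1D samples of the same size: mean absolute difference of sorted values."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes samples of equal size")
    u, v = sorted(u), sorted(v)
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


# Two toy 3-atom configurations; in the second, one bond is stretched.
config_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
config_b = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.0, 0.0)]
d = emd_1d(distance_fingerprint(config_a), distance_fingerprint(config_b))
print(f"{d:.4f}")
```

For equally weighted samples of equal size, the 1D EMD reduces to the mean absolute difference between sorted values, so no optimization step is needed here; the PDD-based EMD in [2] instead solves a transport problem between weighted distributions.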

### Usage

It is highly recommended to follow the available example at (insert link to example).

The working directory is set up in the following way:
```
collect_data/
├─ MD_base/
├─ QM_base/
└─ workflow.ipynb
```

Here, ```MD_base``` and ```QM_base``` contain the input files for the molecular dynamics and quantum mechanical simulations, respectively. Since the current version only supports LAMMPS and VASP, these directories (named ```lammps_base``` and ```vasp_base``` below) will have the following format:

```
├─ lammps_base/
│  ├─ in.test
│  └─ test.dat
└─ vasp_base/
   ├─ INCAR
   ├─ KPOINTS
   └─ POTCAR
```

Note that ```POSCAR``` is intentionally missing, as it is generated by the workflow.
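
To show what generating a ```POSCAR``` involves, the hypothetical `write_poscar` below writes a VASP 5-style POSCAR in Cartesian coordinates from a lattice, a list of chemical symbols and atomic positions. It is a minimal sketch, not the mlipts implementation. Note that the species order on the species line must match the order of the potentials concatenated in ```POTCAR```.

```python
import os
import tempfile


def write_poscar(path, lattice, symbols, positions, comment="generated configuration"):
    """Write a VASP 5-style POSCAR in Cartesian coordinates.

    Atoms are grouped by species in order of first appearance, because
    POSCAR lists a single count per species.
    """
    species = list(dict.fromkeys(symbols))  # unique symbols, order preserved
    counts = [symbols.count(s) for s in species]
    lines = [comment, "1.0"]  # universal scaling factor
    lines += ["  " + " ".join(f"{x:.8f}" for x in vec) for vec in lattice]
    lines.append(" ".join(species))
    lines.append(" ".join(str(c) for c in counts))
    lines.append("Cartesian")
    for s in species:
        for sym, pos in zip(symbols, positions):
            if sym == s:
                lines.append("  " + " ".join(f"{x:.8f}" for x in pos))
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")


# Toy cell: two Si atoms and one O atom in a 4 Å cubic box.
out = os.path.join(tempfile.mkdtemp(), "POSCAR")
write_poscar(
    out,
    lattice=[[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]],
    symbols=["Si", "O", "Si"],
    positions=[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],
)
```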

> [!TIP]
> The key to a successful data collection workflow is ensuring all of the files above are formatted correctly, so it is recommended to test each _base_ directory before collecting the full dataset, which calls each calculation many times.
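
A quick pre-flight check can catch missing inputs before anything is submitted. The sketch below assumes the ```lammps_base```/```vasp_base``` layout shown above, with ```in.test``` and ```test.dat``` as placeholder file names; `check_base_dirs` and `REQUIRED` are hypothetical and not part of mlipts. It only checks that the files exist, so it complements, rather than replaces, running a short test calculation in each _base_ directory.

```python
import tempfile
from pathlib import Path

# Assumed required inputs, taken from the layout above (in.test and test.dat
# are placeholder names; adjust them to your own LAMMPS inputs).
REQUIRED = {
    "lammps_base": ["in.test", "test.dat"],
    "vasp_base": ["INCAR", "KPOINTS", "POTCAR"],
}


def check_base_dirs(root):
    """Return a list of problems found in the base directories under root."""
    root = Path(root)
    problems = []
    for dirname, files in REQUIRED.items():
        base = root / dirname
        if not base.is_dir():
            problems.append(f"missing directory: {dirname}")
            continue
        for name in files:
            if not (base / name).is_file():
                problems.append(f"missing file: {dirname}/{name}")
    # POSCAR is generated for each configuration, so a leftover copy is flagged.
    if (root / "vasp_base" / "POSCAR").exists():
        problems.append("unexpected file: vasp_base/POSCAR")
    return problems


# Build a throwaway layout with KPOINTS missing, then check it.
root = Path(tempfile.mkdtemp())
for dirname, files in REQUIRED.items():
    (root / dirname).mkdir()
    for name in files:
        (root / dirname / name).touch()
(root / "vasp_base" / "KPOINTS").unlink()
problems = check_base_dirs(root)
print(problems)
```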

With the directory set up, mlipts lets you follow the flowchart above step by step:

1. Run many MD calculations:
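```python
# Assumes `workflow`, `variables` and `MD_cmd_line` have been defined earlier in workflow.ipynb.
workflow.build_MD_calculations('./lammps_base', variables, outdir='./MD_calculations')
workflow.write_MD_submission_scripts(MD_cmd_line, submit=True)
```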
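
The structure of `variables` is not documented here. One plausible approach, sketched below under that assumption, treats it as a mapping from LAMMPS variable names to lists of values, expands every combination, and passes each value to LAMMPS through its `-var` command-line switch. `expand_md_variables` and `lammps_command` are hypothetical helpers, the executable name `lmp` is an assumption, and the input script would need to reference `${T}` and `${P}` without redefining them.

```python
from itertools import product


def expand_md_variables(variables):
    """Expand {name: [values, ...]} into one {name: value} dict per combination."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*(variables[n] for n in names))]


def lammps_command(values, executable="lmp", input_file="in.test"):
    """Build a LAMMPS command line that sets each variable via -var."""
    args = [executable, "-in", input_file]
    for name, value in values.items():
        args += ["-var", name, str(value)]
    return " ".join(args)


# Hypothetical sweep: two temperatures at a single pressure.
variables = {"T": [300, 600], "P": [1.0]}
for i, combo in enumerate(expand_md_variables(variables)):
    print(f"calc_{i:03d}: {lammps_command(combo)}")
```

In this sketch each combination would map to one calculation directory, so the number of MD runs grows multiplicatively with the number of values per variable.
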
2. Filter new configurations from MD:
```python
workflow.filter_active_MD(tol=0.1)
```
where ```tol``` is the tolerance used to keep or remove a configuration: if the EMD between two configurations is less than ```tol```, one of them is dropped.

> [!NOTE]
> Since the EMD is calculated between every pair of configurations, N configurations require N(N-1)/2 comparisons, so filtering _can be_ costly for large datasets.
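
The ```tol``` rule can be illustrated with a greedy pass: walk through the configurations in order and keep each one unless it lies within ```tol``` of a configuration already kept. This is one plausible strategy, not necessarily how `filter_active_MD` chooses which configuration of a close pair to drop; `filter_by_tolerance` is a hypothetical helper, and the toy example uses scalar fingerprints with an absolute-difference distance in place of the EMD.

```python
def filter_by_tolerance(fingerprints, distance, tol):
    """Greedy filter: keep configuration i unless distance(i, j) < tol for some
    already-kept j. Returns the indices of the kept configurations."""
    kept = []
    for i, fp in enumerate(fingerprints):
        if all(distance(fp, fingerprints[j]) >= tol for j in kept):
            kept.append(i)
    return kept


# Toy scalar "fingerprints" with |a - b| standing in for the EMD.
values = [0.0, 0.05, 0.3, 0.32, 1.0]
kept = filter_by_tolerance(values, lambda a, b: abs(a - b), tol=0.1)
print(kept)  # [0, 2, 4]
```

Comparing only against kept configurations means at most N(N-1)/2 distances are evaluated, and fewer when many configurations are dropped.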

3. Run DFT calculations on the new configurations:
```python
workflow.write_QM_submission_scripts(QM_cmd_line,save_and_remove=True,submit=True)
```
where ```save_and_remove``` is an option to save the data from each QM calculation while the calculations run. It defaults to ```True```.

> [!NOTE]
> ```save_and_remove``` requires a Python environment with mlipts installed _and_ uses py4vasp (requiring VASP version > 6.2).

The final working directory will then look like this:
```
collect_data/
├─ MD_base/
├─ MD_calculations/
├─ MD_scripts/
├─ QM_base/
├─ QM_calculations/
├─ QM_scripts/
├─ workflow.ipynb
└─ training_data.xyz
```

The resulting ```training_data.xyz``` can be passed to MACE [3] or reformatted for other MLIP architectures.
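
The extended XYZ format stores per-frame properties such as the energy in the comment line, and per-atom columns declared by `Properties`. The sketch below writes one such frame; `write_extxyz_frame` and the key names `energy` and `forces` are assumptions rather than the exact format mlipts produces. In practice, `ase.io.write(..., format="extxyz")` from ASE (already an mlipts dependency) writes this format, and MACE can be pointed at custom key names through its `--energy_key` and `--forces_key` options.

```python
import io


def write_extxyz_frame(fh, symbols, positions, forces, energy, lattice):
    """Append one extended XYZ frame: the energy goes in the comment line,
    positions and forces in per-atom columns."""
    lattice_str = " ".join(f"{x:.8f}" for vec in lattice for x in vec)
    fh.write(f"{len(symbols)}\n")
    fh.write(
        f'Lattice="{lattice_str}" '
        "Properties=species:S:1:pos:R:3:forces:R:3 "
        f'energy={energy:.8f} pbc="T T T"\n'
    )
    for s, pos, frc in zip(symbols, positions, forces):
        fh.write(s + " " + " ".join(f"{x:.8f}" for x in (*pos, *frc)) + "\n")


# Toy H2 frame in a 10 Å box (illustrative numbers, not a real calculation).
buf = io.StringIO()
write_extxyz_frame(
    buf,
    symbols=["H", "H"],
    positions=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]],
    forces=[[0.0, 0.0, -0.1], [0.0, 0.0, 0.1]],
    energy=-6.77,
    lattice=[[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]],
)
print(buf.getvalue())
```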

### References

[1] Jacobs, Ryan, et al. "A practical guide to machine learning interatomic potentials–Status and future." Current Opinion in Solid State and Materials Science 35 (2025): 101214.

[2] Widdowson, Daniel, and Vitaliy Kurlin. "Pointwise distance distributions for detecting near-duplicates in large materials databases." arXiv preprint arXiv:2108.04798 (2021).

[3] Batatia, Ilyes, et al. "MACE: Higher order equivariant message passing neural networks for fast and accurate force fields." Advances in neural information processing systems 35 (2022): 11423-11436.


### Contact

> William Davie, willdavie2002@gmail.com.
>
> Department of Material Science and Metallurgy, University of Cambridge.

