Metadata-Version: 2.1
Name: master_strange_mol_rep
Version: 0.0.1
Summary: Stiv Llenga (Master Strange) created this simple package to assist master students and other researchers who are working on generating molecular representations and manipulating arrays in various forms.
Author-email: Stiv Llenga <stiv.llenga@h-its.org>
License: MIT License
        
        Copyright (c) [2022] [Stiv Llenga]
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Description-Content-Type: text/markdown

# master_strange_mol_rep

Stiv Llenga (a.k.a. Master Strange) created this simple package to assist master students and other researchers who are working on generating molecular representations and manipulating arrays in various forms. For more information, please contact the author via email (stiv.llenga@h-its.org).

The sections below describe the purpose and usage of each function included with msc_strange_mol_rep:

## a) Generating SOAP representation :

   Please read this paper if you need to refresh your memory on what SOAP representation is : (https://doi.org/10.48550/arXiv.1209.3140).
   
   Command:
   
####   ARRAY = create_soap(path,rcut_i=8,processors=1)
   
   INPUT:
   
   * --path       -> (Str) The full path to your xyz files;
   
   * --rcut_i     -> (Float) Cutoff radius in Angstrom (Å);
   
   * --processors -> (Int) The total number of processors.
   
   OUTPUT:
   
   * ARRAY -> The SOAP ndarray. Every molecular SOAP array is padded to the length of the longest one.
  
   Comments:
   
   This command generates SOAP representations for all molecules in the path-specified directory.
   The xyz format is required for molecular structures. 
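
Since every representation above starts from xyz files, a minimal sketch of reading that format may help when checking inputs. It assumes the common single-molecule layout (atom count on line 1, a comment on line 2, then one `symbol x y z` record per atom). `parse_xyz` and the water coordinates are hypothetical illustrations, not part of this package.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]


def parse_xyz(text: str) -> List[Atom]:
    """Parse the contents of a single-molecule xyz file into (symbol, x, y, z) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])  # line 1: number of atoms
    atoms = []
    # Line 2 is a free-form comment, so atom records start on line 3.
    for record in lines[2:2 + n_atoms]:
        symbol, x, y, z = record.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("atom count on line 1 does not match the number of records")
    return atoms


# Illustrative geometry only (coordinates in Angstrom).
WATER_XYZ = """3
water molecule
O  0.000000  0.000000  0.117300
H  0.000000  0.757200 -0.469200
H  0.000000 -0.757200 -0.469200
"""

atoms = parse_xyz(WATER_XYZ)
print(atoms)
```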

## b) Generating Coulomb Matrix representation :

   Please read this paper if you need to refresh your memory on what CM representation is : (https://doi.org/10.1103/PhysRevLett.108.058301).

   Command:
   
####   ARRAY = create_cm(path, processors=1)
   
   INPUT:
   
   * --path       -> (Str) The full path to your xyz files;
   
   * --processors -> (Int) The total number of processors.
   
   OUTPUT:
   
   * ARRAY -> The CM ndarray. Each molecule's Coulomb matrix is flattened, and every flattened array is padded to the length of the longest one.
   
   Comments:
   
   This command generates CM representations for all molecules in the path-specified directory.
   The xyz format is required for molecular structures.
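
To make the "flatten, then pad" output concrete, here is a minimal sketch that builds Coulomb matrices from the standard definition (diagonal `0.5 * Z**2.4`, off-diagonal `Z_i * Z_j / |R_i - R_j|`) and pads the flattened results with zeros. The helper names are hypothetical, plain lists stand in for ndarrays, and the units, row ordering, and pad value actually used by `create_cm` are assumptions.

```python
import math
from typing import List, Sequence


def coulomb_matrix(charges: Sequence[float],
                   coords: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build the Coulomb matrix for one molecule."""
    n = len(charges)
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cm[i][j] = 0.5 * charges[i] ** 2.4
            else:
                dist = math.sqrt(sum((a - b) ** 2
                                     for a, b in zip(coords[i], coords[j])))
                cm[i][j] = charges[i] * charges[j] / dist
    return cm


def flatten_and_pad(matrices: Sequence[List[List[float]]]) -> List[List[float]]:
    """Flatten each matrix row by row, then zero-pad to the longest flattened length."""
    flat = [[value for row in m for value in row] for m in matrices]
    longest = max(len(f) for f in flat)
    return [f + [0.0] * (longest - len(f)) for f in flat]


# Illustrative inputs: H2 with unit separation, and a lone He atom.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
he = coulomb_matrix([2], [(0.0, 0.0, 0.0)])
padded = flatten_and_pad([h2, he])
print(padded)
```

Both rows of `padded` have length 4, so they can be stacked into a single 2D array.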


## c) Generating the Spectrum of London and Axilrod-Teller-Muto potential (SLATM) representation :

   Please read this paper if you need to refresh your memory on what SLATM representation is : (https://arxiv.org/pdf/1807.04259.pdf).

   Command:
   
####   ARRAY = create_slatm(path)
   
   INPUT:
   
   * --path -> (Str) The full path to your xyz files.
   
   OUTPUT:
   
   * ARRAY -> The SLATM ndarray. The length of the longest molecular SLATM array is used to pad all of the molecular arrays.
   
   Comments:
   
   This command generates SLATM representations for all molecules in the path-specified directory.
   The xyz format is required for molecular structures.
   
## d) Generating the MIBOC representation (Only for CCC) :

   Command:
   
####   ARRAY = create_MIBOC(path,basis_set='def2-TZVP', charge=0, spin=0)
   
   INPUT:
   
   * --path      -> (Str) The full path to your xyz files;
   
   * --basis_set -> (Str) The basis set you want to use;
   
   * --charge    -> (Int) The system's charge;
   
   * --spin      -> (Int) The system's spin (Nr_spin_alpha - Nr_spin_beta).
   
   OUTPUT:
   
   * ARRAY -> The MIBOC ndarray. The length of the longest molecular MIBOC array is used to pad all of the molecular arrays.
   
   Comments:
   
   This command generates MIBOC representations for all molecules in the path-specified directory.
   The xyz format is required for molecular structures. Please use the number of electrons with spin up minus electrons with spin down when calculating the spin (not 2S+1).
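
Because the `spin` argument expects the number of alpha electrons minus the number of beta electrons (which equals 2S) rather than the multiplicity 2S+1, a small conversion helper can prevent off-by-one mistakes. `spin_from_multiplicity` is a hypothetical helper for illustration, not part of this package.

```python
def spin_from_multiplicity(multiplicity: int) -> int:
    """Return N_alpha - N_beta (= 2S) for a given spin multiplicity 2S + 1."""
    if multiplicity < 1:
        raise ValueError("multiplicity must be a positive integer")
    return multiplicity - 1


singlet = spin_from_multiplicity(1)  # closed shell: equal alpha and beta counts
doublet = spin_from_multiplicity(2)  # one unpaired electron
triplet = spin_from_multiplicity(3)  # two unpaired electrons
print(singlet, doublet, triplet)
```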

## e) Generating the molecular QR representation (Only for CCC) :

   Command:
   
####   create_qr(array,path)
   
   INPUT:
   
   * --array -> (ndarray) An array of molecular representations;
   
   * --path  -> (Str) The location on your computer where the QR images will be saved. 
   
   OUTPUT:
   
   All QR images are saved locally. 
   
   Comments:
   
   This command generates a QR representation for every molecule in the given array (any of the representations above can be used) and saves the images in the specified path.
   
## f) Dimension reduction by Principal Component Analysis:

   Command:
   
####   pca = dim_red(data,nr_dim=2)
   
   INPUT:
   
   * --data   -> (ndarray) The array whose dimensions should be reduced (for example, an N*m array);
   * --nr_dim -> (Int) The total number of final dimensions (2 or 3).
   
   OUTPUT:
   
   * --pca -> The new low-dimensional arrays (for example, N*2 or N*3 array)
   
   Comments:
   
   This command can be used to reduce the dimensionality of an array.

## g) Making arrays of same size:

   i) Command:
  
#### new_array = zero_pad_inner(array)
      
INPUT:
      
   * --array -> (ndarray or list) The ndarray whose components are not all of the same size.
      
OUTPUT:
      
   * --new_array -> The ndarray whose components are all of the same size.
      
Comments:
   
   Zero-pads every element of the array to a common size so that the resulting ndarray has regular dimensions.
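
The idea behind this kind of padding can be sketched in a few lines. This is an illustration with plain Python lists, not the package's implementation: `pad_to_longest` is a hypothetical name, and appending the zeros at the end of each row is an assumption.

```python
from typing import List, Sequence


def pad_to_longest(rows: Sequence[Sequence[float]], fill: float = 0.0) -> List[List[float]]:
    """Right-pad every row with `fill` so that all rows share the longest row's length."""
    longest = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (longest - len(row)) for row in rows]


ragged = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded = pad_to_longest(ragged)
print(padded)
```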
   
  ii) Command:
   
#### new_array = zero_pad_two_ndarrays(big_array,small_array)
   
INPUT:
      
   * --big_array   -> The N*m-dimensional ndarray;
   
   * --small_array -> The ndarray with dimensions N*b, where b < m.
   
OUTPUT:
      
   * --new_array -> The modified small array with N*m dimensions. 
   
Comments:
   
   When two arrays must be the same size, this command zero-pads the smaller array so that its dimensions match those of the larger array.
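
As a rough illustration of the N*b to N*m step, the sketch below widens each row of the smaller array to the row width of the larger one. `pad_rows_to_width` is a hypothetical stand-in using nested lists, and the position of the padding (trailing zeros) is an assumption.

```python
from typing import List, Sequence


def pad_rows_to_width(big: Sequence[Sequence[float]],
                      small: Sequence[Sequence[float]],
                      fill: float = 0.0) -> List[List[float]]:
    """Return a copy of `small` whose rows are right-padded to the row width of `big`."""
    if len(big) != len(small):
        raise ValueError("both arrays must have the same number of rows (N)")
    width = len(big[0]) if big else 0
    if any(len(row) > width for row in small):
        raise ValueError("rows of `small` must not be wider than rows of `big`")
    return [list(row) + [fill] * (width - len(row)) for row in small]


big_array = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # N=2, m=4
small_array = [[9.0, 9.0], [8.0, 8.0]]                    # N=2, b=2
new_array = pad_rows_to_width(big_array, small_array)
print(new_array)
```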

## h) Cosine similarity analysis:

   Command:
   
#### similarity = cosine_similarity(array_one,array_two)
   
   INPUT:
   
   * --array_one -> (array) The first ndarray;
   * --array_two -> (array) The second ndarray.
   
   OUTPUT:
   
   * --similarity -> The similarity of array_one and array_two.
   
   Comments:
   
   Returns an ndarray containing the similarities of each array. 
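
For reference, cosine similarity between two vectors is their dot product divided by the product of their Euclidean norms. The sketch below computes it for every pair of rows taken from two arrays. Whether `cosine_similarity` in this package returns such a pairwise matrix is an assumption, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pairwise_cosine(array_one: Sequence[Sequence[float]],
                    array_two: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry [i][j] is the similarity between row i of array_one and row j of array_two."""
    return [[cosine(u, v) for v in array_two] for u in array_one]


similarity = pairwise_cosine([[1.0, 0.0], [1.0, 1.0]],
                             [[1.0, 0.0], [0.0, 1.0]])
print(similarity)
```

Identical directions give 1.0 and orthogonal vectors give 0.0, so the first row of `similarity` here is `[1.0, 0.0]`.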


## i) Set up the necessary dependencies:

   Command:
   
#### install_dependencies()


## j) Add in the necessary libraries:

   Command:
   
#### define_modules()


HOW TO USE IT:

#### a) Install the package (pip install msc-strange-mol-rep==0.0.1)

#### b) Import mol_rep (from msc_strange_mol_rep import mol_rep)

#### c) Use the functions inside mol_rep (for example, mol_rep.create_cm(path))
