glasspy.chemistry package
Submodules
glasspy.chemistry.convert module
Conversion and rescaling of objects containing chemical information.
This module offers helper functions to convert and rescale objects that hold chemical information (ChemArrays).
See the function to_array for an easy way to convert strings, dictionaries, and pandas DataFrames to ChemArray.
See the functions wt_to_mol and mol_to_wt to convert a ChemArray from wt% to mol% and vice versa.
- glasspy.chemistry.convert.mol_to_wt(x: ndarray | ChemArray, input_cols: List[str], rescale_to_sum: float | int | bool = False) ndarray | ChemArray
Convert an array from mol% to weight%.
- Parameters:
x – A 2D array. Each row is a chemical substance. See the docstring of the function to_array for more information.
input_cols – List of strings representing the chemical entities related to each column of x.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
- Returns:
A 2D array. Each row is a chemical substance.
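As a worked illustration of the arithmetic behind a mol% to weight% conversion, the sketch below uses plain Python rather than GlassPy itself. The helper name mol_to_wt_rows and the rounded, hard-coded molar masses are assumptions made for this example; the library's own implementation may differ.

```python
# Approximate molar masses in g/mol (rounded; hard-coded for illustration only).
MOLAR_MASS = {"SiO2": 60.08, "Li2O": 29.88}


def mol_to_wt_rows(rows, input_cols, rescale_to_sum=100):
    """Convert each row from molar amounts to weight amounts.

    Hypothetical helper, not part of GlassPy.
    """
    converted = []
    for row in rows:
        # Mass contributed by each column: molar amount times molar mass.
        masses = [amount * MOLAR_MASS[col] for amount, col in zip(row, input_cols)]
        total = sum(masses)
        # Normalize so that each row sums to rescale_to_sum.
        converted.append([m / total * rescale_to_sum for m in masses])
    return converted


# One substance (one row): 66.7 mol% SiO2 and 33.3 mol% Li2O.
wt = mol_to_wt_rows([[66.7, 33.3]], ["SiO2", "Li2O"])
print([round(v, 1) for v in wt[0]])
```

Because SiO2 has roughly twice the molar mass of Li2O, its weight share comes out noticeably larger than its molar share.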
- glasspy.chemistry.convert.rescale_array(x: ndarray | ChemArray, rescale_to_sum: float | int | bool = 100) ndarray | ChemArray
Rescale all rows of an array to have the same sum.
This function does nothing if rescale_to_sum is False.
- Parameters:
x – A 2D array. Each row is a chemical substance. See the docstring of the function to_array for more information.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
- Returns:
A 2D array. Each row is a chemical substance.
- Raises:
AssertionError – Raised when rescale_to_sum is negative.
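The rescaling described above can be sketched in plain Python as follows. The helper rescale_rows is hypothetical and only mirrors the documented behavior (no change when rescale_to_sum is False, an AssertionError for negative values); it is not the library's code.

```python
def rescale_rows(rows, rescale_to_sum=100):
    """Rescale every row so that its values add up to rescale_to_sum.

    Hypothetical helper, not part of GlassPy.
    """
    # Mirror the documented behavior: False means "leave the input untouched".
    if rescale_to_sum is False:
        return rows
    # This sketch also rejects zero, since the docs ask for a positive number.
    assert rescale_to_sum > 0, "rescale_to_sum must be positive"
    return [[value / sum(row) * rescale_to_sum for value in row] for row in rows]


rows = [[1, 2, 0], [0, 1, 2]]  # SiO2 and Li2O as (Si, O, Li) atom counts
scaled = rescale_rows(rows, rescale_to_sum=1)
print(scaled)
```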
- glasspy.chemistry.convert.to_array(x: str | List[float] | List[List[float]] | ndarray | Dict[str, float] | Dict[str, List[float]] | Dict[str, ndarray] | DataFrame | ChemArray, input_cols: List[str] = [], output_cols: str | List[str] = 'default', rescale_to_sum: float | int | bool = False) ChemArray
Convert the input object to an array.
Most of the operations in this module receive an array as the argument. This design choice was made because numpy arrays are fast. These arrays may be called “chemical arrays”, but they are still instances of numpy array. Chemical arrays must follow three rules:
i. each row of the array is one chemical substance;
ii. each column of the array represents a chemical element or chemical molecule;
iii. chemical arrays must be 2D arrays.
Say, for example, that you have a chemical array x. The value stored in x[i][j] is the amount of the element/molecule j that the substance i has. It is up to the user to define what this amount means. Perhaps it is the mole fraction of the element/molecule j; perhaps it is the weight percentage of the element/molecule j. It doesn’t matter what it means, as long as this definition is the same for all values stored in this chemical array x. Check the functions wt_to_mol and mol_to_wt to easily convert an array from wt% to mol% and vice versa. Note: even if your array holds only one substance, it must still meet condition iii, that is: it must still be a 2D array (in this case with only one row).
- Parameters:
x – Any composition-like object.
input_cols – List of strings representing the chemical entities related to each column of x. Necessary only when x is a list or array, ignored otherwise.
output_cols – List of strings of chemical compounds or chemical elements. The columns of the output array will be arranged in this order. If ‘default’ is passed to this argument, then the output array will have the same order as the input. Caution: this function does not convert compounds to chemical elements, see to_element_array if this is what you are looking for.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
- Returns:
The converted ChemArray.
- Raises:
ValueError – Raised when x is a list or an array that is not 1D or 2D.
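To make the row/column layout and the effect of a column order concrete, here is a minimal plain-Python sketch that turns a dictionary into 2D rows. The helper dict_to_rows is hypothetical, and its handling of columns missing from the input (filled with zeros) is an assumption of this sketch, not documented GlassPy behavior.

```python
def dict_to_rows(composition, output_cols="default"):
    """Turn {column: amount or [amounts...]} into (rows, columns).

    Hypothetical helper, not part of GlassPy.
    """
    # Wrap scalar values so a single substance still yields a 2D result.
    as_lists = {
        key: list(value) if isinstance(value, (list, tuple)) else [value]
        for key, value in composition.items()
    }
    # 'default' keeps the order of the input dictionary.
    cols = list(as_lists) if output_cols == "default" else list(output_cols)
    n_rows = len(next(iter(as_lists.values())))
    # Assumption of this sketch: requested columns absent from the input are zero.
    rows = [
        [as_lists.get(col, [0.0] * n_rows)[i] for col in cols]
        for i in range(n_rows)
    ]
    return rows, cols


rows, cols = dict_to_rows({"SiO2": [70, 75], "Na2O": [30, 25]},
                          output_cols=["Na2O", "SiO2", "CaO"])
single, single_cols = dict_to_rows({"Si": 1, "O": 2})
print(cols, rows)
print(single_cols, single)
```

Note that the single-substance case still produces a list with one row, in line with rule iii.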
- glasspy.chemistry.convert.to_element_array(x: str | List[float] | List[List[float]] | ndarray | Dict[str, float] | Dict[str, List[float]] | Dict[str, ndarray] | DataFrame | ChemArray, input_cols: List[str] = [], output_element_cols: str | List[str] = 'default', rescale_to_sum: float | int | bool = False) ChemArray
Convert x to an element array.
- Parameters:
x – Any composition-like object.
input_cols – List of strings representing the chemical entities related to each column of x. Necessary only when x is a list or array, ignored otherwise.
output_element_cols – List of strings of chemical element symbols. The columns of the output array will be arranged in this order. If ‘all’ is passed to this argument, then the output array will have one column per element from hydrogen to plutonium, ordered by atomic number. If ‘default’ is passed, then the output array will have only the chemical elements that are present in x, sorted alphabetically.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
- Returns:
The converted ChemArray.
- Raises:
AssertionError – Raised when input_cols has length zero and x is a list or an array.
AssertionError – Raised when the length of input_cols differs from the length of x along the column axis.
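The idea of expanding compound columns into element columns can be sketched in plain Python as shown below. Both helpers are hypothetical; the tiny regex parser only handles formulas without parentheses, whereas the types module documents chemparse as the parser for string inputs.

```python
import re
from collections import defaultdict


def parse_simple_formula(formula):
    """Parse formulas such as 'SiO2' or 'Li2O' (no parentheses) into element counts."""
    counts = defaultdict(float)
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] += float(number) if number else 1.0
    return dict(counts)


def to_element_rows(rows, input_cols):
    """Expand compound columns into element columns, sorted alphabetically.

    Hypothetical helper, not part of GlassPy.
    """
    parsed = [parse_simple_formula(col) for col in input_cols]
    elements = sorted({element for p in parsed for element in p})
    out = []
    for row in rows:
        totals = dict.fromkeys(elements, 0.0)
        for amount, p in zip(row, parsed):
            # Each compound contributes amount * stoichiometric count per element.
            for element, count in p.items():
                totals[element] += amount * count
        out.append([totals[element] for element in elements])
    return out, elements


# Two units of SiO2 and one unit of Li2O in a single substance.
element_rows, elements = to_element_rows([[2, 1]], ["SiO2", "Li2O"])
print(elements, element_rows)
```

The alphabetical column order matches the documented ‘default’ behavior of output_element_cols.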
- glasspy.chemistry.convert.wt_to_mol(x: ndarray | ChemArray, input_cols: List[str], rescale_to_sum: float | int | bool = False) ndarray | ChemArray
Convert an array from weight% to mol%.
- Parameters:
x – A 2D array. Each row is a chemical substance. See the docstring of the function to_array for more information.
input_cols – List of strings representing the chemical entities related to each column of x.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
- Returns:
A 2D array. Each row is a chemical substance.
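For the opposite direction, a weight% to mol% conversion divides each amount by its molar mass before normalizing. The plain-Python sketch below illustrates this; the helper wt_to_mol_rows and the rounded molar masses are assumptions for the example, not GlassPy internals.

```python
# Approximate molar masses in g/mol (rounded; hard-coded for illustration only).
MOLAR_MASS = {"SiO2": 60.08, "Na2O": 61.98, "CaO": 56.08}


def wt_to_mol_rows(rows, input_cols, rescale_to_sum=100):
    """Convert each row from weight amounts to molar amounts.

    Hypothetical helper, not part of GlassPy.
    """
    converted = []
    for row in rows:
        # Moles of each column: mass divided by molar mass.
        moles = [amount / MOLAR_MASS[col] for amount, col in zip(row, input_cols)]
        total = sum(moles)
        # Normalize so that each row sums to rescale_to_sum.
        converted.append([n / total * rescale_to_sum for n in moles])
    return converted


# A soda-lime-like composition given in wt%.
mol = wt_to_mol_rows([[72, 14, 14]], ["SiO2", "Na2O", "CaO"])
print([round(v, 2) for v in mol[0]])
```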
glasspy.chemistry.data module
glasspy.chemistry.featurizer module
- glasspy.chemistry.featurizer.physchem_featurizer(x: str | List[float] | List[List[float]] | ndarray | Dict[str, float] | Dict[str, List[float]] | Dict[str, ndarray] | DataFrame | ChemArray, input_cols: List[str] = [], elemental_features: List[str] = [], weighted_features: List[Tuple[str, str]] = [], absolute_features: List[Tuple[str, str]] = [], rescale_to_sum: float | int | bool = 1, sep: str = '|', check_invalid: bool = True, order: str = 'ewa') Tuple[ndarray, List[str]]
Extract features from a chemical object.
For a list of all possible features that can be extracted, check the variable all_features.
- Parameters:
x – Any composition-like object.
input_cols – List of strings representing the chemical entities related to each column of x. Necessary only when x is a list or array, ignored otherwise.
elemental_features – List of chemical elements that will be part of the features.
absolute_features – List of tuples containing the name of the feature to be extracted and the aggregator function. Features computed from this list are absolute.
weighted_features – List of tuples containing the name of the feature to be extracted and the aggregator function. Features computed from this list are weighted.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
sep – String used to separate the information of the name of each extracted feature.
check_invalid – Checks if there are invalid features that cannot be computed. Invalid features are those with missing values in the desired chemical domain. The function still works even with invalid features. However, it is not recommended to use it in this case. Only disable this check if you are sure that no invalid features exist in your chemical domain.
order – String containing the order of the features in the final array. Use the letter e for elemental features, w for weighted features, and a for absolute features.
- Returns:
features – A 2D array. Each row is a chemical substance.
feature_columns – A list of strings containing the name of each extracted chemical feature. Strings starting with “A” are absolute features and strings starting with “W” are weighted.
- Return type:
Tuple[ndarray, List[str]]
- Raises:
AssertionError – Raised when rescale_to_sum is negative.
ValueError – Raised when the input composition has chemical elements that cannot be used to extract features.
ValueError – Raised when invalid features are present and check_invalid is True.
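The difference between weighted and absolute features can be illustrated with a small plain-Python sketch. Everything here is an assumption made for illustration: the reading that a weighted feature scales each element's property by its fraction while an absolute feature ignores amounts, the feature name atomic_mass, the aggregator names sum and max, and the exact layout of the generated column names. Check the variable all_features and the library source for the real definitions.

```python
# Rounded atomic masses in g/mol, hard-coded for illustration only.
ATOMIC_MASS = {"Si": 28.086, "O": 15.999, "Li": 6.94}


def sketch_features(fractions, sep="|"):
    """Compute one weighted and one absolute feature for a single substance.

    Hypothetical helper, not part of GlassPy.
    """
    present = {element: f for element, f in fractions.items() if f > 0}
    total = sum(present.values())
    # Weighted (assumed meaning): scale each property by the element's fraction.
    weighted_sum = sum(ATOMIC_MASS[el] * f / total for el, f in present.items())
    # Absolute (assumed meaning): aggregate over the elements present, ignoring amounts.
    absolute_max = max(ATOMIC_MASS[el] for el in present)
    names = [
        sep.join(["W", "atomic_mass", "sum"]),
        sep.join(["A", "atomic_mass", "max"]),
    ]
    return [weighted_sum, absolute_max], names


values, names = sketch_features({"Si": 1, "O": 2, "Li": 0})
print(names, [round(v, 3) for v in values])
```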
glasspy.chemistry.types module
Provides CompositionLike to check if an object is a valid chemical object and the ChemArray class.
There are many ways to represent a chemical substance using Python objects. GlassPy accepts 8 different types:
String: Any string that can be parsed by the parse_formula of chemparse (https://pypi.org/project/chemparse/) is allowed. Examples: “SiO2”, “CaMgSi2O5”, “(Li2O)1(SiO2)2”, “C1.5O3”. See the chemparse documentation for more information.
List of floats: A simple list of floats can represent a chemical substance. To represent SiO2, you could write [1, 2, 0], where the first element represents the amount of silicon, the second the amount of oxygen, and the third the amount of lithium. Another way to represent this substance would be the list [1], where the only element present is silica itself. As you can see, it is up to the user to know which position in the list is associated with which chemical element or molecule.
List of lists of floats: similar to the above, here the user can store more than one chemical substance in the same variable. To represent both SiO2 and Li2O, you can write “substances = [[1, 2, 0], [0, 1, 2]]”. Note that the first index of substances is associated with a single chemical substance, so substances[0] contains information about SiO2 and substances[1] contains information about Li2O. The second index of substances is associated with chemical elements or molecules. In this case, substances[0][0] and substances[1][0] store the amount of silicon that SiO2 and Li2O have, respectively.
Numpy array: 1D numpy arrays follow the same logic as a list of floats, and 2D numpy arrays follow the same logic as a list of lists of floats.
Dictionary with string keys and float values: in this case, the keys are the chemical elements or molecules and the corresponding values are the amount of these elements or molecules. SiO2 can be written as {‘Si’: 1, ‘O’: 2} or {‘SiO2’: 1}.
Dictionary with string keys and list of float values: similar to the above, but the user can store more than one chemical substance in the same dictionary. To store SiO2 and Li2O in the same dictionary, you can write {‘Si’: [1, 0], ‘O’: [2, 1], ‘Li’: [0, 2]}. Note that the first element of each list stores information about one substance (SiO2) and the second element stores information about the other (Li2O).
Dictionary with string keys and numpy array values: behaves the same as the above.
Pandas DataFrame: each row of the DataFrame represents a chemical substance. The columns represent the chemical elements or molecules that make up the substance. Elements and molecules that are not present must be zero. Only information related to the chemical composition of the substances can be present in the DataFrame.
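The list-of-lists and dictionary-of-lists forms above carry the same information, only transposed. A short plain-Python sketch, using the SiO2 and Li2O example from the text (the variable names are chosen for this illustration):

```python
columns = ["Si", "O", "Li"]
substances = [[1, 2, 0], [0, 1, 2]]  # rows: SiO2, Li2O

# Transpose rows into per-column lists to obtain the dictionary form.
as_dict = {col: list(values) for col, values in zip(columns, zip(*substances))}

# Transpose back: one row per substance, columns in dictionary order.
rows_again = [list(row) for row in zip(*as_dict.values())]

print(as_dict)
print(rows_again)
```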
- class glasspy.chemistry.types.ChemArray(chem_composition: ndarray, chem_columns: List[str])
Bases:
ndarray
Numpy array for storing chemical composition data.
- ChemArrays must obey three rules:
i. each row of the array is a chemical substance;
ii. each column of the array represents a chemical element or molecule;
iii. ChemArrays must be 2D arrays.
For example, suppose you have a ChemArray x. The value stored in x[i][j] is the amount of element/molecule j that substance i has. It is up to the user to define what this amount means. Maybe it’s the mole fraction of the element/molecule j; maybe it’s the weight percentage of the element/molecule j. It doesn’t matter what it means, as long as this definition is the same for all values stored in this ChemArray.
Note that even if your array contains only one substance, it must still satisfy condition iii, that is, it must still be a 2D array (in this case, with only one row). The recommended way to create ChemArrays is to use the to_array or to_element_array function from GlassPy’s chemistry.convert submodule.
- Parameters:
chem_composition – A 2D array. Each row is a chemical substance. See the docstring of the to_array function for more information.
chem_columns – A list of strings containing the chemical element or molecule associated with each column in the array.
Notes
Code based on https://numpy.org/doc/stable/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array
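The three rules can be expressed as a small shape check. The sketch below works on nested Python lists rather than on a real ChemArray, and the function check_chem_rows is a hypothetical helper written for this illustration.

```python
def check_chem_rows(rows, columns):
    """Return True if rows is a 2D table with exactly one value per column.

    Hypothetical helper, not part of GlassPy.
    """
    # A flat list such as [1, 2, 0] is 1D, so it is rejected (rule iii).
    if not rows or not all(isinstance(row, (list, tuple)) for row in rows):
        return False
    # Every row (substance) needs one amount per column (element or molecule).
    return all(len(row) == len(columns) for row in rows)


columns = ["Si", "O", "Li"]
print(check_chem_rows([[1, 2, 0]], columns))  # one substance, still 2D
print(check_chem_rows([1, 2, 0], columns))    # 1D input
```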
Module contents
- class glasspy.chemistry.ChemArray(chem_composition: ndarray, chem_columns: List[str])
Bases:
ndarray
Numpy array for storing chemical composition data.
- ChemArrays must obey three rules:
i. each row of the array is a chemical substance;
ii. each column of the array represents a chemical element or molecule;
iii. ChemArrays must be 2D arrays.
For example, suppose you have a ChemArray x. The value stored in x[i][j] is the amount of element/molecule j that substance i has. It is up to the user to define what this amount means. Maybe it’s the mole fraction of the element/molecule j; maybe it’s the weight percentage of the element/molecule j. It doesn’t matter what it means, as long as this definition is the same for all values stored in this ChemArray.
Note that even if your array contains only one substance, it must still satisfy condition iii, that is, it must still be a 2D array (in this case, with only one row). The recommended way to create ChemArrays is to use the to_array or to_element_array function from GlassPy’s chemistry.convert submodule.
- Parameters:
chem_composition – A 2D array. Each row is a chemical substance. See the docstring of the to_array function for more information.
chem_columns – A list of strings containing the chemical element or molecule associated with each column in the array.
Notes
Code based on https://numpy.org/doc/stable/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array
- glasspy.chemistry.mol_to_wt(x: ndarray | ChemArray, input_cols: List[str], rescale_to_sum: float | int | bool = False) ndarray | ChemArray
Convert an array from mol% to weight%.
- Parameters:
x – A 2D array. Each row is a chemical substance. See the docstring of the function to_array for more information.
input_cols – List of strings representing the chemical entities related to each column of x.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
- Returns:
A 2D array. Each row is a chemical substance.
- glasspy.chemistry.physchem_featurizer(x: str | List[float] | List[List[float]] | ndarray | Dict[str, float] | Dict[str, List[float]] | Dict[str, ndarray] | DataFrame | ChemArray, input_cols: List[str] = [], elemental_features: List[str] = [], weighted_features: List[Tuple[str, str]] = [], absolute_features: List[Tuple[str, str]] = [], rescale_to_sum: float | int | bool = 1, sep: str = '|', check_invalid: bool = True, order: str = 'ewa') Tuple[ndarray, List[str]]
Extract features from a chemical object.
For a list of all possible features that can be extracted, check the variable all_features.
- Parameters:
x – Any composition-like object.
input_cols – List of strings representing the chemical entities related to each column of x. Necessary only when x is a list or array, ignored otherwise.
elemental_features – List of chemical elements that will be part of the features.
absolute_features – List of tuples containing the name of the feature to be extracted and the aggregator function. Features computed from this list are absolute.
weighted_features – List of tuples containing the name of the feature to be extracted and the aggregator function. Features computed from this list are weighted.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
sep – String used to separate the information of the name of each extracted feature.
check_invalid – Checks if there are invalid features that cannot be computed. Invalid features are those with missing values in the desired chemical domain. The function still works even with invalid features. However, it is not recommended to use it in this case. Only disable this check if you are sure that no invalid features exist in your chemical domain.
order – String containing the order of the features in the final array. Use the letter e for elemental features, w for weighted features, and a for absolute features.
- Returns:
features – A 2D array. Each row is a chemical substance.
feature_columns – A list of strings containing the name of each extracted chemical feature. Strings starting with “A” are absolute features and strings starting with “W” are weighted.
- Return type:
Tuple[ndarray, List[str]]
- Raises:
AssertionError – Raised when rescale_to_sum is negative.
ValueError – Raised when the input composition has chemical elements that cannot be used to extract features.
ValueError – Raised when invalid features are present and check_invalid is True.
- glasspy.chemistry.rescale_array(x: ndarray | ChemArray, rescale_to_sum: float | int | bool = 100) ndarray | ChemArray
Rescale all rows of an array to have the same sum.
This function does nothing if rescale_to_sum is False.
- Parameters:
x – A 2D array. Each row is a chemical substance. See the docstring of the function to_array for more information.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
- Returns:
A 2D array. Each row is a chemical substance.
- Raises:
AssertionError – Raised when rescale_to_sum is negative.
- glasspy.chemistry.to_array(x: str | List[float] | List[List[float]] | ndarray | Dict[str, float] | Dict[str, List[float]] | Dict[str, ndarray] | DataFrame | ChemArray, input_cols: List[str] = [], output_cols: str | List[str] = 'default', rescale_to_sum: float | int | bool = False) ChemArray
Convert the input object to an array.
Most of the operations in this module receive an array as the argument. This design choice was made because numpy arrays are fast. These arrays may be called “chemical arrays”, but they are still instances of numpy array. Chemical arrays must follow three rules:
i. each row of the array is one chemical substance;
ii. each column of the array represents a chemical element or chemical molecule;
iii. chemical arrays must be 2D arrays.
Say, for example, that you have a chemical array x. The value stored in x[i][j] is the amount of the element/molecule j that the substance i has. It is up to the user to define what this amount means. Perhaps it is the mole fraction of the element/molecule j; perhaps it is the weight percentage of the element/molecule j. It doesn’t matter what it means, as long as this definition is the same for all values stored in this chemical array x. Check the functions wt_to_mol and mol_to_wt to easily convert an array from wt% to mol% and vice versa. Note: even if your array holds only one substance, it must still meet condition iii, that is: it must still be a 2D array (in this case with only one row).
- Parameters:
x – Any composition-like object.
input_cols – List of strings representing the chemical entities related to each column of x. Necessary only when x is a list or array, ignored otherwise.
output_cols – List of strings of chemical compounds or chemical elements. The columns of the output array will be arranged in this order. If ‘default’ is passed to this argument, then the output array will have the same order as the input. Caution: this function does not convert compounds to chemical elements, see to_element_array if this is what you are looking for.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
- Returns:
The converted ChemArray.
- Raises:
ValueError – Raised when x is a list or an array that is not 1D or 2D.
- glasspy.chemistry.to_element_array(x: str | List[float] | List[List[float]] | ndarray | Dict[str, float] | Dict[str, List[float]] | Dict[str, ndarray] | DataFrame | ChemArray, input_cols: List[str] = [], output_element_cols: str | List[str] = 'default', rescale_to_sum: float | int | bool = False) ChemArray
Convert x to an element array.
- Parameters:
x – Any composition-like object.
input_cols – List of strings representing the chemical entities related to each column of x. Necessary only when x is a list or array, ignored otherwise.
output_element_cols – List of strings of chemical element symbols. The columns of the output array will be arranged in this order. If ‘all’ is passed to this argument, then the output array will have one column per element from hydrogen to plutonium, ordered by atomic number. If ‘default’ is passed, then the output array will have only the chemical elements that are present in x, sorted alphabetically.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
- Returns:
The converted ChemArray.
- Raises:
AssertionError – Raised when input_cols has length zero and x is a list or an array.
AssertionError – Raised when the length of input_cols differs from the length of x along the column axis.
- glasspy.chemistry.wt_to_mol(x: ndarray | ChemArray, input_cols: List[str], rescale_to_sum: float | int | bool = False) ndarray | ChemArray
Convert an array from weight% to mol%.
- Parameters:
x – A 2D array. Each row is a chemical substance. See the docstring of the function to_array for more information.
input_cols – List of strings representing the chemical entities related to each column of x.
rescale_to_sum – A positive number representing the total sum of each chemical substance. If False then the same input array is returned.
- Returns:
A 2D array. Each row is a chemical substance.