homelette.organization
The homelette.organization submodule contains classes for organizing workflows.
Task is an object orchestrating model generation and evaluation.
Model is an object used for storing information about generated models.
Tutorials
For an introduction to homelette’s workflow, Tutorial 1 is useful. Assembling custom pipelines is discussed in Tutorial 7.
Classes
The following classes are part of this submodule:
-
class homelette.organization.Task(task_name: str, target: str, alignment: Type[Alignment], task_directory: str = None, overwrite: bool = False)
Class for directing modelling and evaluation.
It is designed for the modelling of one target sequence from one or multiple templates.
If an already existing folder with models is specified, the Task object will load those models in automatically. In this case, it can also be used exclusively for evaluation purposes.
- Parameters
task_name (str) – The name of the task
target (str) – The identifier of the protein to model
alignment (Alignment) – The alignment object that will be used for modelling
task_directory (str, optional) – The directory that will be used for this modelling task (default is creating a new one based on the task_name)
overwrite (bool, optional) – Determines what happens if a directory already exists for the given task_name or task_directory: the directory and all its contents are overwritten (True), or the models it contains are imported (False) (default is False)
- Variables
task_name (str) – The name of the task
task_directory (str) – The directory that will be used for this modelling task (default is to use the task_name)
target (str) – The identifier of the protein to model
alignment (Alignment) – The alignment object that will be used for modelling
models (list) – List of models generated or imported by this task
routines (list) – List of modelling routines executed by this task
- Returns
- Return type
None
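The directory rules described for task_directory and overwrite can be illustrated with a minimal sketch. The helper prepare_task_directory below is hypothetical and is not part of homelette; it only mirrors the documented behaviour (fall back to task_name, wipe on overwrite=True, keep existing PDB files on overwrite=False) using the standard library.

```python
import os
import shutil
import tempfile


def prepare_task_directory(task_name, task_directory=None, overwrite=False):
    """Hypothetical helper mirroring the documented directory rules.

    Returns the directory path and the PDB files already found in it.
    """
    # Fall back to a directory named after the task, as documented.
    directory = task_directory if task_directory is not None else task_name
    if os.path.isdir(directory) and overwrite:
        # overwrite=True: discard the directory and everything in it.
        shutil.rmtree(directory)
    # Create the directory if it is missing (or was just removed).
    os.makedirs(directory, exist_ok=True)
    # overwrite=False: PDB files already present are the ones to import.
    existing = sorted(f for f in os.listdir(directory) if f.endswith(".pdb"))
    return directory, existing


# Demonstration in a throwaway location.
base = tempfile.mkdtemp()
demo_dir = os.path.join(base, "demo_task")
os.makedirs(demo_dir)
open(os.path.join(demo_dir, "model_1.pdb"), "w").close()

_, kept = prepare_task_directory("demo_task", demo_dir, overwrite=False)
_, wiped = prepare_task_directory("demo_task", demo_dir, overwrite=True)
shutil.rmtree(base)
```

With overwrite=False the existing model file is kept and reported; with overwrite=True the directory comes back empty.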
-
execute_routine(tag: str, routine: Type[routines.Routine], templates: Iterable, template_location: str = '.', **kwargs) → None
Generates homology models using a specified modelling routine.
- Parameters
tag (str) – The identifier associated with this combination of routine and template(s). Must be unique among all routines executed by the same Task object
routine (Routine) – The routine object used to generate the models
templates (Iterable) – The iterable containing the identifier(s) of the template(s) used for model generation
template_location (str, optional) – The location of the template PDB files. They should be named according to their identifiers in the alignment (e.g. for a sequence named “1WXN” to be used as a template, a PDB file named “1WXN.pdb” is expected in the specified template location) (default is current working directory)
**kwargs – Named parameters passed directly on to the Routine object when the modelling is performed. Check the documentation of the Routine object you intend to use to make sure it accepts the parameters passed on
- Returns
- Return type
None
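The naming convention for template files can be checked before a routine is run. The sketch below is an illustration only: template_paths and missing_templates are hypothetical helpers, not homelette functions, and they assume nothing beyond the documented rule that a template identifier maps to "<identifier>.pdb" inside template_location.

```python
import os


def template_paths(templates, template_location="."):
    """Hypothetical helper: map template identifiers to expected PDB paths."""
    return {t: os.path.join(template_location, f"{t}.pdb") for t in templates}


def missing_templates(templates, template_location="."):
    """Return identifiers whose PDB file is absent from template_location."""
    paths = template_paths(templates, template_location)
    return [t for t, p in paths.items() if not os.path.isfile(p)]


# The identifier "1WXN" is expected as "1WXN.pdb" in the chosen location.
paths = template_paths(["1WXN"], template_location="templates")
```

Running missing_templates on the same identifiers before modelling gives an early, readable list of files that still need to be placed in the template location.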
-
evaluate_models(*args: Type[evaluation.Evaluation], n_threads: int = 1) → None
Evaluates models using one or multiple evaluation metrics.
- Parameters
*args (Evaluation) – Evaluation objects that will be applied to the models
n_threads (int, optional) – Number of threads used for model evaluation (default is 1, which deactivates parallelization)
- Returns
- Return type
None
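How an n_threads parameter of this kind can switch between sequential and parallel evaluation is sketched below. This is not homelette's implementation, which is not described here; evaluate_all, count_letters and upper_name are hypothetical names, and plain dictionaries stand in for the info attribute of Model objects.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_all(models, evaluations, n_threads=1):
    """Apply every evaluation callable to every model (hypothetical sketch)."""
    jobs = [(model, evaluation) for model in models for evaluation in evaluations]
    if n_threads == 1:
        # Default: plain sequential loop, no parallelization.
        for model, evaluation in jobs:
            evaluation(model)
    else:
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            # list() waits for completion and re-raises worker exceptions.
            list(pool.map(lambda job: job[1](job[0]), jobs))


# Stand-in evaluations: each writes one entry into the model's dictionary.
def count_letters(info):
    info["n_letters"] = len(info["name"])


def upper_name(info):
    info["upper"] = info["name"].upper()


models = [{"name": "model_a"}, {"name": "model_bb"}]
evaluate_all(models, [count_letters, upper_name], n_threads=2)
```

Each evaluation writes to its own key, so the results do not depend on the order in which the threads finish.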
-
get_evaluation() → pandas.DataFrame
Returns the evaluation of all models as a pandas DataFrame.
- Returns
DataFrame containing all model evaluations
- Return type
pd.DataFrame
-
class homelette.organization.Model(model_file: str, tag: str, routine: str)
Interface used to interact with created protein structure models.
- Parameters
model_file (str) – The file location of the PDB file for this model
tag (str) – The tag that was used when generating this model (see Task.execute_routine for more details)
routine (str) – The name of the routine that was used to generate this model
- Variables
model_file (str) – The file location of the PDB file for this model
tag (str) – The tag that was used when generating this model (see Task.execute_routine for more details)
routine (str) – The name of the routine that was used to generate this model
info (dict) – Dictionary that can be used to store metadata about the model (e.g. for some evaluation metrics)
- Returns
- Return type
None
-
parse_pdb() → pandas.DataFrame
Parses the ATOM and HETATM records in the PDB file into a pandas DataFrame. Useful for giving some evaluation methods access to data from the PDB file.
- Returns
- Return type
pd.DataFrame
Notes
Information is extracted according to the PDB file specification (version 3.30) and columns are named accordingly. See https://www.wwpdb.org/documentation/file-format for more information.
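The fixed-column layout of ATOM and HETATM records can be sketched as follows. This is an illustration of the PDB format, not the code behind parse_pdb: parse_atom_records and FIELDS are hypothetical names, the column names follow the field names of the format specification, and the result is a list of dictionaries rather than a DataFrame so that the sketch needs only the standard library (such a list can be passed to pandas.DataFrame).

```python
# Field name, 0-based start, end (exclusive), and type for each column range
# of an ATOM/HETATM record in the PDB format.
FIELDS = [
    ("record", 0, 6, str),
    ("serial", 6, 11, int),
    ("name", 12, 16, str),
    ("altLoc", 16, 17, str),
    ("resName", 17, 20, str),
    ("chainID", 21, 22, str),
    ("resSeq", 22, 26, int),
    ("iCode", 26, 27, str),
    ("x", 30, 38, float),
    ("y", 38, 46, float),
    ("z", 46, 54, float),
    ("occupancy", 54, 60, float),
    ("tempFactor", 60, 66, float),
    ("element", 76, 78, str),
    ("charge", 78, 80, str),
]


def parse_atom_records(pdb_text):
    """Hypothetical sketch: return one dict per ATOM/HETATM line."""
    rows = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, TER, END and all other record types
        line = line.ljust(80)  # tolerate lines with trailing columns cut off
        row = {}
        for name, start, end, cast in FIELDS:
            raw = line[start:end].strip()
            row[name] = cast(raw) if raw else None  # blank fields become None
        rows.append(row)
    return rows


SAMPLE = """REMARK   example records for illustration only
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N
HETATM 1000  O   HOH A 201      10.000  11.000  12.000  1.00 20.00           O
END
"""

rows = parse_atom_records(SAMPLE)
```

Slicing by column position rather than splitting on whitespace matters because adjacent fields in a PDB file may touch without a separating space.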
-
get_sequence() → str
Retrieves the 1-letter amino acid sequence of the PDB file associated with the Model object.
- Returns
Amino acid sequence
- Return type
str
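Deriving a 1-letter sequence from atom records amounts to emitting one letter each time the residue identity changes. The sketch below shows that idea under stated assumptions: sequence_from_atoms is a hypothetical function, atoms are given as dictionaries with resName, chainID and resSeq keys, and residues outside the 20 standard amino acids are written as "X". How get_sequence itself treats non-standard residues or multiple chains is not stated here.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_atoms(atoms):
    """Hypothetical sketch: one letter per residue, in file order."""
    sequence = []
    previous = None
    for atom in atoms:
        # A residue is identified by chain, residue number and insertion code.
        residue_id = (atom["chainID"], atom["resSeq"], atom.get("iCode"))
        if residue_id != previous:
            # Assumption: unknown residue names are reported as "X".
            sequence.append(THREE_TO_ONE.get(atom["resName"], "X"))
            previous = residue_id
    return "".join(sequence)


atoms = [
    {"resName": "MET", "chainID": "A", "resSeq": 1},
    {"resName": "MET", "chainID": "A", "resSeq": 1},
    {"resName": "GLY", "chainID": "A", "resSeq": 2},
    {"resName": "TRP", "chainID": "A", "resSeq": 3},
    {"resName": "MSE", "chainID": "A", "resSeq": 4},
]
sequence = sequence_from_atoms(atoms)
```

The two MET atoms belong to the same residue and contribute a single letter, while the non-standard MSE residue falls back to "X".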
-
rename(new_name: str) → None
Renames the PDB file associated with the Model object.
- Parameters
new_name (str) – New name of PDB file
- Returns
- Return type
None