Tutorial 2: Modelling

[1]:
import os

import homelette as hm

Introduction

Welcome to the second tutorial for homelette. In this tutorial, we will further explore the methods already implemented for generating homology models.

Currently, the following software packages for generating homology models have been integrated in the homelette homology modelling interface:

  • modeller: A robust package for homology modelling with a long history which is widely used [1,2]

  • altmod: A modification to the standard modeller modelling procedure that has been reported to increase the quality of models [3]

  • ProMod3: The modelling engine behind the popular SwissModel web platform [4,5]

Specifically, the following routines are implemented in homelette. For more details on the individual routines, please check the documentation or their respective docstrings.

  • routines.Routine_automodel_default

  • routines.Routine_automodel_slow

  • routines.Routine_altmod_default

  • routines.Routine_altmod_slow

  • routines.Routine_promod3

In this example, we will generate models for the Ras-binding domain (RBD) of ARAF. ARAF is a RAF kinase important in MAPK signalling. As a template, we will choose a close relative of ARAF called BRAF, specifically the structure with the PDB code 3NY5.

All files necessary for running this tutorial are already prepared and deposited in the following directory: homelette/example/data/. If you execute this tutorial from homelette/example/, you don’t have to adapt any of the paths.

homelette comes with extensive documentation. You can check out our online documentation, compile a local version of the documentation in homelette/docs/ with sphinx, or use the help() function in Python.

Alignment

For this tutorial, we will use the same alignment and template as for Tutorial 1.

[2]:
# read in the alignment
aln = hm.Alignment('data/single/aln_1.fasta_aln')

# print to screen to check alignment
aln.print_clustal(line_wrap=70)
ARAF        ---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLIKGRKTVTAWDTAIAPLDGEE
3NY5        HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ---KKPIGWDTDISWLTGEE


ARAF        LIVEVL------
3NY5        LHVEVLENVPLT
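
Beyond inspecting the printed alignment by eye, a quick numerical check can be useful. The following is a minimal sketch in plain Python, not part of homelette: the helper name pairwise_identity is hypothetical, and the two strings are copied by hand from the output above. It computes the fraction of identical residues over the columns in which neither sequence has a gap.

```python
# Aligned sequences copied from the Clustal output above (gaps are '-').
araf = (
    "---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLIKGRKTVTAWDTAIAPLDGEE"
    "LIVEVL------"
)
tmpl = (
    "HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ---KKPIGWDTDISWLTGEE"
    "LHVEVLENVPLT"
)


def pairwise_identity(seq_a, seq_b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        identical += a == b
    return identical / compared if compared else 0.0


print(f"aligned columns: {len(araf)}")
print(f"sequence identity: {pairwise_identity(araf, tmpl):.1%}")
```

Other definitions of sequence identity exist (for example, normalizing by the length of the shorter sequence), so the value should be read as a rough indicator of how closely target and template are related.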


[3]:
# annotate the alignment
aln.get_sequence('ARAF').annotate(seq_type = 'sequence')
aln.get_sequence('3NY5').annotate(seq_type = 'structure',
                              pdb_code = '3NY5',
                              begin_res = '1',
                              begin_chain = 'A',
                              end_res = '81',
                              end_chain = 'A')

Model Generation using routines

The building blocks in homelette that take care of model generation are called Routines. There are a number of pre-defined routines, and it is also possible to construct custom routines (see Tutorial 4). Every routine in homelette expects the same set of required arguments, and some routines accept a few optional ones as well.

[4]:
?hm.routines.Routine_automodel_default
Init signature:
hm.routines.Routine_automodel_default(
    alignment: Type[ForwardRef('Alignment')],
    target: str,
    templates: Iterable,
    tag: str,
    n_threads: int = 1,
    n_models: int = 1,
) -> None
Docstring:
Class for performing homology modelling using the automodel class from
modeller with a default parameter set.

Parameters
----------
alignment : Alignment
    The alignment object that will be used for modelling
target : str
    The identifier of the protein to model
templates : Iterable
    The iterable containing the identifier(s) of the template(s) used
    for the modelling
tag : str
    The identifier associated with a specific execution of the routine
n_threads : int
    Number of threads used in model generation (default 1)
n_models : int
    Number of models generated (default 1)

Attributes
----------
alignment : Alignment
    The alignment object that will be used for modelling
target : str
    The identifier of the protein to model
templates : Iterable
    The iterable containing the identifier(s) of the template(s) used for
    the modelling
tag : str
    The identifier associated with a specific execution of the routine
n_threads : int
    Number of threads used for model generation
n_models : int
    Number of models generated
routine : str
    The identifier associated with a specific routine
models : list
    List of models generated by the execution of this routine

Raises
------
ImportError
    Unable to import dependencies

Notes
-----
The following modelling parameters can be set when initializing this
Routine object:

* n_models
* n_threads

The following modelling parameters are set for this class:

+-----------------------+---------------------------------------+
| modelling             | value                                 |
| parameter             |                                       |
+=======================+=======================================+
| model_class           | modeller.automodel.automodel          |
+-----------------------+---------------------------------------+
| library_schedule      | modeller.automodel.autosched.normal   |
+-----------------------+---------------------------------------+
| md_level              | modeller.automodel.refine.very_fast   |
+-----------------------+---------------------------------------+
| max_var_iterations    | 200                                   |
+-----------------------+---------------------------------------+
| repeat_optmization    | 1                                     |
+-----------------------+---------------------------------------+
File:           /usr/local/src/homelette/homelette/routines.py
Type:           type
Subclasses:

The following arguments are required for all pre-defined routines:

  • alignment: The alignment object used for modelling.

  • target: The identifier of the target sequence in the alignment object

  • templates: An iterable containing the identifier(s) of the templates for this modelling routine. homelette expects that templates are uniquely identified by their identifier in the alignment and in the template PDB file(s). Routines based on modeller work with one or multiple templates, whereas Routine_promod3 only accepts a single template per run.

  • tag: Each executed routine is given a tag which will be used to name the generated models.

In addition, pre-defined routines expect the template PDBs to be present in the current working directory.
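
Since a missing template file is an easy mistake to make, it can help to check for the files before starting a routine. The following is a minimal sketch in plain Python and not a homelette function: the helper name missing_template_files is hypothetical, and it assumes that template files are named after their identifier with a .pdb extension (as with 3NY5.pdb in this tutorial).

```python
from pathlib import Path


def missing_template_files(templates, location="."):
    """Return the expected '<identifier>.pdb' paths that do not exist in location."""
    expected = [Path(location) / f"{identifier}.pdb" for identifier in templates]
    return [str(path) for path in expected if not path.is_file()]


# Example: check for the template used in this tutorial before modelling.
missing = missing_template_files(["3NY5"], location="data/single")
if missing:
    print("Missing template files:", ", ".join(missing))
else:
    print("All template files found.")
```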

The routine Routine_automodel_default has two optional arguments:

  • n_models: the number of models that should be produced in this run, as routines based on modeller are able to produce an arbitrary number of models.

  • n_threads: enable multi-threading for the execution of this routine. For more information on parallelization in homelette, please check out Tutorial 5.


While it is generally recommended to execute routines using Task objects (see next section), it is also possible to execute them directly. Because the template file has to be in the current working directory, we temporarily change into a prepared directory where we can execute the routine (this code assumes that your working directory is homelette/examples).

[5]:
# change directory
os.chdir('data/single')
# print content of directory to screen
print('Files before modelling:\n' + ' '.join(os.listdir()) + '\n\n')

# perform modelling
routine = hm.routines.Routine_automodel_default(
    alignment=aln,
    target='ARAF',
    templates=['3NY5'],
    tag='model')
routine.generate_models()

print('Files after modelling:\n' + ' '.join(os.listdir()) + '\n')

# remove model
os.remove('model_1.pdb')

# change back to tutorial directory
os.chdir('../..')
Files before modelling:
3NY5.pdb aln_1.fasta_aln 4G0N.pdb


Files after modelling:
model_1.pdb 3NY5.pdb aln_1.fasta_aln 4G0N.pdb
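
One drawback of calling os.chdir twice, as in the cell above, is that the working directory is not restored if modelling raises an exception in between. The following is a minimal sketch of a context manager that avoids this. It is plain Python and not part of homelette; the name working_directory is hypothetical, and the demonstration uses a temporary directory so that it runs anywhere.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises, so we always return to where we started.
        os.chdir(previous)


# Demonstration with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        print("inside:", os.getcwd())
    print("restored:", os.getcwd())
```

With such a helper, the os.chdir('data/single') and os.chdir('../..') pair could be replaced by a single with working_directory('data/single'): block around routine.generate_models(). Executing routines through Task objects, as described in the next section, avoids the directory change altogether.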

Model Generation using Task and routines

homelette has Task objects that allow for easier use of Routines and Evaluations (see also Tutorial 3). Task objects help to direct and organize modelling pipelines. It is strongly recommended to use Task objects to execute routines and evaluations.

For more information on Task objects, please check out the documentation or Tutorial 1.

[6]:
# set up task object
t = hm.Task(
    task_name = 'Tutorial2',
    target = 'ARAF',
    alignment = aln,
    overwrite = True)

Using the Task object, we can now begin to generate our models with different routines using the Task.execute_routine method.

[7]:
?hm.Task.execute_routine
Signature:
hm.Task.execute_routine(
    self,
    tag: str,
    routine: Type[ForwardRef('routines.Routine')],
    templates: Iterable,
    template_location: str = '.',
    **kwargs,
) -> None
Docstring:
Generates homology models using a specified modelling routine

Parameters
----------
tag : str
    The identifier associated with this combination of routine and
    template(s). Has to be unique between all routines executed by the
    same task object
routine : Routine
    The routine object used to generate the models
templates : list
    The iterable containing the identifier(s) of the template(s) used
    for model generation
template_location : str, optional
    The location of the template PDB files. They should be named
    according to their identifiers in the alignment (i.e. for a
    sequence named "1WXN" to be used as a template, it is expected that
    there will be a PDB file named "1WXN.pdb" in the specified template
    location (default is current working directory)
**kwargs
    Named parameters passed directly on to the Routine object when the
    modelling is performed. Please check the documentation in order to
    make sure that the parameters passed on are available with the
    Routine object you intend to use

Returns
-------
None
File:      /usr/local/src/homelette/homelette/organization.py
Type:      function

As we can see, Task.execute_routine expects a number of arguments from the user:

  • tag: Each executed routine is given a tag which will be used to name the generated models. This is useful for differentiating between different routines executed by the same Task, for example if different templates are used.

  • routine: Here the user can set which routine will be used for generating the homology model(s), arguably the most important setting.

  • templates: An iterable containing the identifier(s) of the templates for this modelling routine. homelette expects that templates are uniquely identified by their identifier(s) in the alignment and in the template location.

  • template_location: The folder where the PDB file(s) used as template(s) are found.

We now generate some models with the pre-defined routines of homelette:

[8]:
# model generation with modeller
t.execute_routine(
    tag = 'example_modeller',
    routine = hm.routines.Routine_automodel_default,
    templates = ['3NY5'],
    template_location = './data/single')

# model generation with altmod
t.execute_routine(
    tag = 'example_altmod',
    routine = hm.routines.Routine_altmod_default,
    templates = ['3NY5'],
    template_location = './data/single')

# model generation with promod3
# t.execute_routine(
#     tag = 'example_promod3',
#     routine = hm.routines.Routine_promod3,
#     templates = ['3NY5'],
#     template_location = './data/single')

As mentioned before, some modelling routines have optional arguments, such as n_models for Routine_automodel_default. We can pass these optional arguments to Task.execute_routine, which passes them on to the selected routine:

[9]:
# multiple model generation with modeller
t.execute_routine(
    tag = 'example_modeller_more_models',
    routine = hm.routines.Routine_automodel_default,
    templates = ['3NY5'],
    template_location = './data/single',
    n_models = 10)
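
To illustrate the forwarding of optional arguments described above, here is a toy sketch in plain Python. ToyRoutine and ToyTask are hypothetical stand-ins written for this illustration; they are not homelette's classes and do not reflect its actual implementation. The file naming pattern tag_index.pdb is also an assumption, extrapolated from the single model_1.pdb file seen earlier.

```python
class ToyRoutine:
    """Stand-in for a modelling routine; it only records its arguments."""

    def __init__(self, alignment, target, templates, tag, n_threads=1, n_models=1):
        self.alignment = alignment
        self.target = target
        self.templates = templates
        self.tag = tag
        self.n_threads = n_threads
        self.n_models = n_models

    def generate_models(self):
        # Pretend to build models; return the file names they would get
        # (assumed naming pattern: <tag>_<index>.pdb).
        return [f"{self.tag}_{i}.pdb" for i in range(1, self.n_models + 1)]


class ToyTask:
    """Stand-in for a task that owns the target and the alignment."""

    def __init__(self, target, alignment):
        self.target = target
        self.alignment = alignment
        self.models = []

    def execute_routine(self, tag, routine, templates, **kwargs):
        # Keyword arguments the task does not use itself are forwarded as-is.
        r = routine(self.alignment, self.target, templates, tag, **kwargs)
        self.models.extend(r.generate_models())


task = ToyTask(target="ARAF", alignment=None)
task.execute_routine("one", ToyRoutine, ["3NY5"])
task.execute_routine("many", ToyRoutine, ["3NY5"], n_models=3)
print(task.models)
# prints ['one_1.pdb', 'many_1.pdb', 'many_2.pdb', 'many_3.pdb']
```

In this toy version, a keyword argument that the routine does not accept raises a TypeError when the routine is instantiated, which is one reason to check which parameters a routine supports before passing them.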

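Models generated using Task objects are stored as Model objects in the Task. Here, the list contains 12 models: one from example_modeller, one from example_altmod and ten from example_modeller_more_models: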

[10]:
t.models
[10]:
[<homelette.organization.Model at 0x7ff61a8b4220>,
 <homelette.organization.Model at 0x7ff5e4ed5ee0>,
 <homelette.organization.Model at 0x7ff608b55820>,
 <homelette.organization.Model at 0x7ff61a8b4040>,
 <homelette.organization.Model at 0x7ff608b55d00>,
 <homelette.organization.Model at 0x7ff5e4e4ba00>,
 <homelette.organization.Model at 0x7ff5e4e4b520>,
 <homelette.organization.Model at 0x7ff5e4e4b880>,
 <homelette.organization.Model at 0x7ff61800e250>,
 <homelette.organization.Model at 0x7ff5e4e4b8b0>,
 <homelette.organization.Model at 0x7ff5e4da2100>,
 <homelette.organization.Model at 0x7ff5e4da2610>]

In conclusion, we have learned how to use a single Task object to generate models with different modelling routines. We have also learned how to pass optional arguments on to the executed routines.

In this example, the target, the alignment and the templates were kept identical. Varying the templates would be straightforward, provided that the other templates are included in the alignment. For varying alignments and targets, new Task objects would need to be created. This is a design choice that is meant to encourage users to try out different routines or templates/template combinations. When using different routines or multiple templates, it is recommended to indicate this using the tag argument of Task.execute_routine (e.g. tag='automodel_3NY5'). Similarly, using a single Task object for multiple targets or alignments is discouraged, and we recommend using multiple Task objects for these modelling approaches.
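
The tagging recommendation above can be sketched as a small helper that derives a tag from a routine label and the template identifiers. This is plain Python and not part of homelette; the name make_tag and the routine labels are hypothetical. The identifier 4G0N is used purely for illustration because a file of that name appears in the directory listing earlier; it is not part of the alignment used in this tutorial.

```python
from itertools import product


def make_tag(routine_name, templates):
    """Build a tag such as 'automodel_3NY5' from a routine label and template IDs."""
    return "_".join([routine_name, *templates])


routine_names = ["automodel", "altmod"]
template_sets = [["3NY5"], ["4G0N"], ["3NY5", "4G0N"]]

tags = [make_tag(name, tmpl) for name, tmpl in product(routine_names, template_sets)]

# Tags have to be unique between all routines executed by the same Task.
assert len(tags) == len(set(tags))
print(tags)
```

Each generated tag could then be passed as the tag argument of Task.execute_routine together with the corresponding routine and template list.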

Further Reading

You are now familiar with model generation in homelette.

Please note that there are other tutorials, which will teach you more about how to use homelette:

  • Tutorial 1: Learn about the basics of homelette.

  • Tutorial 3: Learn about the evaluation metrics available with homelette.

  • Tutorial 4: Learn about extending homelette’s functionality by defining your own modelling routines and evaluation metrics.

  • Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.

  • Tutorial 6: Learn about modelling protein complexes.

  • Tutorial 7: Learn about assembling custom pipelines.

  • Tutorial 8: Learn about automated template identification, alignment generation and template processing.

References

[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626

[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3

[3] Janson, G., Grottesi, A., Pietrosanto, M., Ausiello, G., Guarguaglini, G., & Paiardini, A. (2019). Revisiting the “satisfaction of spatial restraints” approach of MODELLER for protein homology modeling. PLoS Computational Biology, 15(12), e1007219. https://doi.org/10.1371/journal.pcbi.1007219

[4] Biasini, M., Schmidt, T., Bienert, S., Mariani, V., Studer, G., Haas, J., Johner, N., Schenk, A. D., Philippsen, A., & Schwede, T. (2013). OpenStructure: An integrated software framework for computational structural biology. Acta Crystallographica Section D: Biological Crystallography, 69(5), 701–709. https://doi.org/10.1107/S0907444913007051

[5] Studer, G., Tauriello, G., Bienert, S., Biasini, M., Johner, N., & Schwede, T. (2021). ProMod3—A versatile homology modelling toolbox. PLOS Computational Biology, 17(1), e1008667. https://doi.org/10.1371/JOURNAL.PCBI.1008667

Session Info

[11]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette           1.3
session_info        1.0.0
-----
PIL                         7.0.0
altmod                      NA
anyio                       NA
attr                        19.3.0
babel                       2.9.1
backcall                    0.2.0
certifi                     2021.10.08
chardet                     3.0.4
charset_normalizer          2.0.8
cycler                      0.10.0
cython_runtime              NA
dateutil                    2.7.3
debugpy                     1.5.1
decorator                   4.4.2
entrypoints                 0.3
idna                        3.3
importlib_resources         NA
ipykernel                   6.5.1
ipython_genutils            0.2.0
jedi                        0.18.1
jinja2                      3.0.3
json5                       NA
jsonschema                  4.2.1
jupyter_server              1.12.1
jupyterlab_server           2.8.2
kiwisolver                  1.0.1
markupsafe                  2.0.1
matplotlib                  3.1.2
modeller                    10.1
mpl_toolkits                NA
nbclassic                   NA
nbformat                    5.1.3
numpy                       1.17.4
ost                         2.2.0
packaging                   20.3
pandas                      0.25.3
parso                       0.8.2
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
prometheus_client           NA
promod3                     3.2.0
prompt_toolkit              3.0.23
ptyprocess                  0.7.0
pvectorc                    NA
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.10.0
pyparsing                   2.4.6
pyrsistent                  NA
pytz                        2019.3
qmean                       NA
requests                    2.26.0
send2trash                  NA
sitecustomize               NA
six                         1.14.0
sniffio                     1.2.0
storemagic                  NA
swig_runtime_data4          NA
terminado                   0.12.1
tornado                     6.1
traitlets                   5.1.1
urllib3                     1.26.7
wcwidth                     NA
websocket                   1.2.1
zipp                        NA
zmq                         22.3.0
-----
IPython             7.30.0
jupyter_client      7.1.0
jupyter_core        4.9.1
jupyterlab          3.2.4
notebook            6.4.6
-----
Python 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 9.4.0]
Linux-4.15.0-162-generic-x86_64-with-glibc2.29
-----
Session information updated at 2021-11-29 18:56