Tutorial 4: Extending homelette

[1]:
import homelette as hm

import contextlib
import glob
import os.path
import sys

from modeller import environ, Selection
from modeller.automodel import LoopModel

Introduction

Welcome to the forth tutorial on homelette. In this tutorial, we will discuss how to implement custom building blocks, either for generating or for evaluating models. These custom building blocks can be integrated in homology modelling pipelines.

This is probably the most important tutorial in the series. After this tutorial, you will be able to implement your own routines into the homelette framework, which gives you complete control over the homology modelling pipelines you want to establish!

Please note that we encourage users to share custom routines and evaluation metrics if they think they might be useful for the community. In our online documentation, there is a dedicated section for these contributions. If you are interested, please contact us on GitHub or via email.

Alignment

For this tutorial, we are using the same alignment as in Tutorial 1. Identical to Tutorial 1, the alignment is imported and annotated and a Task object is created.

[2]:
# read in the alignment
aln = hm.Alignment('data/single/aln_1.fasta_aln')

# annotate the alignment
aln.get_sequence('ARAF').annotate(
    seq_type = 'sequence')
aln.get_sequence('3NY5').annotate(
    seq_type = 'structure',
    pdb_code = '3NY5',
    begin_res = '1',
    begin_chain = 'A',
    end_res = '81',
    end_chain = 'A')

# initialize task object
t = hm.Task(
    task_name = 'Tutorial4',
    target = 'ARAF',
    alignment = aln,
    overwrite = True)

Defining custom routines

As an example for a custom routine, we will implement a LoopModel class from modeller [1,2] loosely following this tutorial on the modeller web page (in the section Loop Refining).

[3]:
class Routine_loopmodel(hm.routines.Routine): # (1)
    '''
    Custom routine for modeller loop modelling.
    '''
    def __init__(self, alignment, target, templates, tag, n_models=1, n_loop_models=1): # (2)
        hm.routines.Routine.__init__(self, alignment, target, templates, tag)
        self.routine = 'loopmodel' # string identifier of routine

        self.n_models = n_models
        self.n_loop_models = n_loop_models

    def generate_models(self): # (3)
        # (4) process alignment
        self.alignment.select_sequences([self.target] + self.templates)
        self.alignment.remove_redundant_gaps()
        # write alignemnt to temporary file
        self.alignment.write_pir('.tmp.pir')

        # (5) define custom loop model class
        class MyLoop(LoopModel):
            # set residues that will be refined by loop modelling
            def select_loop_atoms(self):
                return Selection(self.residue_range('18:A', '22:A'))

        with contextlib.redirect_stdout(None): # (6) suppress modeller output to stdout
            # (7) set up modeller environment
            env = environ()
            env.io.hetatm = True

            # initialize model
            m = MyLoop(env,
                       alnfile='.tmp.pir',
                       knowns=self.templates,
                       sequence=self.target)

            # set modelling parameters
            m.blank_single_chain = False
            m.starting_model = 1
            m.ending_model = self.n_models
            m.loop.starting_model = 1
            m.loop.ending_model = self.n_loop_models

            # make models
            m.make()

        # (8) capture output
        for pdb in glob.glob('{}.BL*.pdb'.format(self.target)):
            self.models.append(
                hm.Model(os.path.realpath(os.path.expanduser(pdb)),
                         self.tag, self.routine))

        # (9) rename files with method from hm.routines.Routine
        self._rename_models()

        # (10) clean up
        self._remove_files(
            '{}.B99*.pdb'.format(self.target),
            '{}.D00*'.format(self.target),
            '{}.DL*'.format(self.target),
            '{}.IL*'.format(self.target),
            '{}.ini'.format(self.target),
            '{}.lrsr'.format(self.target),
            '{}.rsr'.format(self.target),
            '{}.sch'.format(self.target),
            '.tmp*')

The lines of code in the definition of the custom routine above that are marked with numbers get special comments here:

  1. Our custom routine in this example inherits from a parent class Routine defined in homelette. This is not strictly necessary, however, the parent class has a few useful functions already implemented that we will make use of (see steps 2, 9, 10)

  2. Every routine needs to accept these arguments: alignment, target, templates, tag. In our case, we just hand them through to the parent method Routine.__init__ that saves them as attributes, as well as introduces the attribute self.models where models will be deposited after generation.

  3. Every routine needs a generate_models method. Usually, functionality for, you guessed it, model generation is packed in there.

  4. modeller requires the aligment as a file in PIR format. The following few lines of code format the alignment and then produce the required file.

  5. The following lines follow closely the modeller tutorial for loop modelling. This part implements a custom LoopModel class that defines a specific set of residue to be considered for loop modelling.

  6. modeller writes a lot of output to stdout, and using contextlib is a way to suppress this output. If you want to see all the output from modeller, either delete the with statement or write with contextlib.redirect_stdout(sys.stdout): instead.

  7. The following lines follow closely the modeller tutorial for loop modelling. This part initializes the model and generates the models requested.

  8. The final models generated will be called ARAF.BL00010001.pdb and so on. These lines of code find these PDB files and add them to the Routine_loopmodel.models list as Models. After execution by a Task objects, Model objects in this list will be added to the Task.models list.

  9. Models generated will be renamed according to the tag given using the parent class method Routine._rename_models.

  10. Temporary files from modeller as well as the temporary alignment file are removed from the folder using the parent class method Routine._remove_files.


Now, after implementing the routine, let’s try it out in practice. As explained in Tutorial 2, we will be using the Task.execute_routine interface for that:

[4]:
# perform modelling
t.execute_routine(
    tag = 'custom_loop',
    routine = Routine_loopmodel,
    templates = ['3NY5'],
    template_location = './data/single',
    n_models = 2,
    n_loop_models = 2)
[5]:
# check generated models
t.models
[5]:
[<homelette.organization.Model at 0x7f865dabe910>,
 <homelette.organization.Model at 0x7f865dabeac0>,
 <homelette.organization.Model at 0x7f86935a3640>,
 <homelette.organization.Model at 0x7f86935a32b0>]

In practice, a valid routine only needs to adhere to a small number of formal criteria to fit in the homelette framework:

  • It needs to be an object.

  • It needs to have an __init__ method that can handle the named arguments alignment, target, templates and tag.

  • It needs a generate_models method.

  • It needs an attribute models in which generated models are stored as Model objects in list.

Any object that satisfies these criteria can be used in the framework.

Defining custom evaluations

As an example for a custom evaluation, we will implement a sample evaluation that counts the number of residues in the models.

[6]:
class Evaluation_countresidues():
    '''
    Custom evaluation: counting CA atoms
    '''
    def __init__(self, model, quiet=True): # (1)
        self.model = model
        self.output = dict()
        # (2) perform evaluation
        self.evaluate()
        # (3) update model.evaluation
        self.model.evaluation.update(self.output)

    def evaluate(self): # (4)
        # (5) parse model pdb
        pdb = self.model.parse_pdb()

        # count number of CA atoms in PDB
        n_residues = pdb['name'].eq('CA').sum()

        # append to output
        self.output['n_residues'] = n_residues

The lines of code marked with numbers in the definiton of the custom evaluation get special comments here:

  1. The __init__ function takes exactly 2 arguments: model and quiet. quiet is a boolean value indicating whether output to stdout should be suppressed (not applicable in this case).

  2. All evaluation metrics are executed upon initialization.

  3. The custom_evaluation.output dictionary is merged with the Model.evaluation dictionary to make the output of our evaluation metrics available to the model.

  4. Here we define the function where the actual evaluation takes place.

  5. For the actual evaluation, we make use of the Model.parse_pdb method, which parses the PDB file associated to a specific model object to a pandas data frame. This can be useful for a number of evaluations (access residues, coordinates, etc.)

Note

If more arguments are required for a custom evaluation, we recomment to store them as attributes in the Model objects and then access these attributes while running the evaluation.

Now we apply our custom evaluation to our previously generated models using the Task.evaluate_models interface (for more details, see Tutorial 3):

[7]:
t.evaluate_models(Evaluation_countresidues)
t.get_evaluation()
[7]:
model tag routine n_residues
0 custom_loop_1.pdb custom_loop loopmodel 73
1 custom_loop_2.pdb custom_loop loopmodel 73
2 custom_loop_3.pdb custom_loop loopmodel 73
3 custom_loop_4.pdb custom_loop loopmodel 73

In practice, the formal requirements for a custom evaluation are the following:

  • It has to be an object.

  • __init__ has the two arguments model and quiet. More arguments would work in conjunction with Task.evaluate_models only if defaults are set and used. We recommend storing more arguments as attributes in the Model object and then accessing them during the evaluation.

  • It executes evaluation on initialization.

  • On finishing the evaluation, it updates the Model.evaluation dictionary with the results of the evaluation.

Further reading

Congratulations on finishing the tutorial on extending homelette.

Please take again notice that on our online documentation, there is a page collecting user-submitted custom routines and evaluation metrics. User are encouraged to share if they implemented something which they might think could be useful for the community. If you are interested, please contact us on GitHub or via email.

There are more tutorials which might interest you:

  • Tutorial 1: Learn about the basics of homelette.

  • Tutorial 2: Learn more about already implemented routines for homology modelling.

  • Tutorial 3: Learn about the evaluation metrics available with homelette.

  • Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.

  • Tutorial 6: Learn about modelling protein complexes.

  • Tutorial 7: Learn about assembling custom pipelines.

  • Tutorial 8: Learn about automated template identification, alignment generation and template processing.

References

[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626

[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3

Session Info

[8]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette           1.3
modeller            10.1
pandas              0.25.3
session_info        1.0.0
-----
PIL                         7.0.0
altmod                      NA
anyio                       NA
attr                        19.3.0
babel                       2.9.1
backcall                    0.2.0
certifi                     2021.10.08
chardet                     3.0.4
charset_normalizer          2.0.8
cycler                      0.10.0
cython_runtime              NA
dateutil                    2.7.3
debugpy                     1.5.1
decorator                   4.4.2
entrypoints                 0.3
idna                        3.3
importlib_resources         NA
ipykernel                   6.5.1
ipython_genutils            0.2.0
jedi                        0.18.1
jinja2                      3.0.3
json5                       NA
jsonschema                  4.2.1
jupyter_server              1.12.1
jupyterlab_server           2.8.2
kiwisolver                  1.0.1
markupsafe                  2.0.1
matplotlib                  3.1.2
mpl_toolkits                NA
nbclassic                   NA
nbformat                    5.1.3
numpy                       1.17.4
ost                         2.2.0
packaging                   20.3
parso                       0.8.2
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
prometheus_client           NA
promod3                     3.2.0
prompt_toolkit              3.0.23
ptyprocess                  0.7.0
pvectorc                    NA
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.10.0
pyparsing                   2.4.6
pyrsistent                  NA
pytz                        2019.3
qmean                       NA
requests                    2.26.0
send2trash                  NA
sitecustomize               NA
six                         1.14.0
sniffio                     1.2.0
storemagic                  NA
swig_runtime_data4          NA
terminado                   0.12.1
tornado                     6.1
traitlets                   5.1.1
urllib3                     1.26.7
wcwidth                     NA
websocket                   1.2.1
zipp                        NA
zmq                         22.3.0
-----
IPython             7.30.0
jupyter_client      7.1.0
jupyter_core        4.9.1
jupyterlab          3.2.4
notebook            6.4.6
-----
Python 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 9.4.0]
Linux-4.15.0-162-generic-x86_64-with-glibc2.29
-----
Session information updated at 2021-11-29 18:56