Tutorial 6: Complex Modelling

[1]:
import homelette as hm

Introduction

Welcome to the sixth homelette tutorial, which covers homology modelling of protein complexes.

Several issues make modelling protein complexes a separate topic from the homology modelling of single structures:

  • Usually, a complex structure is required as a template.

  • Not all modelling programs can perform complex modelling.

  • Not all evaluation metrics developed for homology modelling are applicable to complex structures.

  • You need multiple alignments.

homelette can use modeller-based modelling routines for complex modelling [1,2], and has some specific functionality in place that makes complex modelling easier for the user:

  • A function to assemble appropriate complex alignments

  • Special modelling classes for complex modelling

  • Special evaluation metrics for complex modelling

For this tutorial, we will build models for ARAF in complex with HRAS. As templates, we will use the structures 4G0N (https://www.rcsb.org/structure/4G0N, RAF1 in complex with HRAS) and 3NY5 (BRAF).

Alignment

Since all current modelling routines for protein complexes are modeller-based, an alignment that follows the modeller specification has to be constructed. homelette provides the helper function assemble_complex_aln in the homelette.alignment submodule for this purpose:

[2]:
?hm.alignment.assemble_complex_aln
Signature:
hm.alignment.assemble_complex_aln(
    *args: Type[ForwardRef('Alignment')],
    names: dict,
) -> Type[ForwardRef('Alignment')]
Docstring:
Assemble complex alignments compatible with MODELLER from individual
alignments.

Parameters
----------
*args : Alignment
    The input alignments
names : dict
    Dictionary instructing how sequences in the different alignment objects
    are supposed to be arranged in the complex alignment. The keys are the
    names of the sequences in the output alignments. The values are
    iterables of the sequence names from the input alignments in the order
    they are supposed to appear in the output alignment. Any value that can
    not be found in the alignment signals that this position in the complex
    alignment should be filled with gaps.

Returns
-------
Alignment
    Assembled complex alignment

Examples
--------
>>> aln1 = hm.Alignment(None)
>>> aln1.sequences = {
...     'seq1_1': hm.alignment.Sequence('seq1_1', 'HELLO'),
...     'seq2_1': hm.alignment.Sequence('seq2_1', 'H---I'),
...     'seq3_1': hm.alignment.Sequence('seq3_1', '-HI--')
...     }
>>> aln2 = hm.Alignment(None)
>>> aln2.sequences = {
...     'seq2_2': hm.alignment.Sequence('seq2_2', 'KITTY'),
...     'seq1_2': hm.alignment.Sequence('seq1_2', 'WORLD')
...     }
>>> names = {'seq1': ('seq1_1', 'seq1_2'),
...          'seq2': ('seq2_1', 'seq2_2'),
...          'seq3': ('seq3_1', 'gaps')
...     }
>>> aln_assembled = hm.alignment.assemble_complex_aln(
...     aln1, aln2, names=names)
>>> aln_assembled.print_clustal()
seq1        HELLO/WORLD
seq2        H---I/KITTY
seq3        -HI--/-----
File:      /usr/local/src/homelette/homelette/alignment.py
Type:      function
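
The gap-filling rule described in the docstring can be illustrated without homelette. The following is a minimal pure-Python sketch, not homelette's implementation: alignments are represented as plain dictionaries, and the helper name assemble_complex is hypothetical. It reproduces the docstring example, joining the chains with "/" and padding with gaps wherever a name is not found in the corresponding alignment.

```python
def assemble_complex(alignments, names, chain_break="/"):
    """Join per-chain alignments into one complex alignment (illustration only).

    alignments: list of dicts mapping sequence name -> aligned string
    names: dict mapping output name -> tuple of input names, one per alignment
    """
    assembled = {}
    for out_name, in_names in names.items():
        parts = []
        for aln, in_name in zip(alignments, in_names):
            # all sequences within one alignment have the same length
            width = len(next(iter(aln.values())))
            # a name that is not found is replaced by a stretch of gaps
            parts.append(aln.get(in_name, "-" * width))
        assembled[out_name] = chain_break.join(parts)
    return assembled


aln1 = {"seq1_1": "HELLO", "seq2_1": "H---I", "seq3_1": "-HI--"}
aln2 = {"seq2_2": "KITTY", "seq1_2": "WORLD"}
names = {
    "seq1": ("seq1_1", "seq1_2"),
    "seq2": ("seq2_1", "seq2_2"),
    "seq3": ("seq3_1", "gaps"),  # 'gaps' is not a sequence name in aln2
}

result = assemble_complex([aln1, aln2], names)
for name, seq in result.items():
    print(f"{name:<12}{seq}")
```

The order of the names in each tuple matters: the first entry is looked up in the first alignment, the second entry in the second alignment, and so on.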

In our case, we assemble the complex alignment from two different alignments: aln_1, which contains ARAF, RAF1 (4G0N) and BRAF (3NY5), and aln_2, which contains an HRAS sequence and the HRAS sequence from 4G0N.

[3]:
# import single alignments
aln1_file = 'data/complex/aln_eff.fasta_aln'
aln2_file = 'data/complex/aln_ras.fasta_aln'

aln_1 = hm.Alignment(aln1_file)
aln_2 = hm.Alignment(aln2_file)

# build dictionary that indicates how sequences should be assembled
names = {
    'ARAF': ('ARAF', 'HRAS'),
    '4G0N': ('4G0N', '4G0N'),
    '3NY5': ('3NY5', ''),
}

# assemble alignment
aln = hm.alignment.assemble_complex_aln(aln_1, aln_2, names=names)
aln.remove_redundant_gaps()
aln.print_clustal(line_wrap=70)
ARAF        ---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLI---KGRKTVTAWDTAIAPLD
4G0N        -TSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLI
3NY5        HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ------KKPIGWDTDISWLT


ARAF        GEELIVEVL------/MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLD
4G0N        GEELQVDFL------/MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLD
3NY5        GEELHVEVLENVPLT/------------------------------------------------------


ARAF        ILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAART
4G0N        ILDTAGQEE--AMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAART
3NY5        ----------------------------------------------------------------------


ARAF        VESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQ-
4G0N        VESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH
3NY5        ------------------------------------------
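
As a rough illustration of what a call such as remove_redundant_gaps does, the sketch below assumes that a redundant gap is an alignment column that contains a gap in every sequence. It works on a plain dictionary with toy sequences; the helper name drop_all_gap_columns is hypothetical and this is not homelette's implementation.

```python
def drop_all_gap_columns(seqs, gap="-"):
    """Remove alignment columns in which every sequence has a gap."""
    names = list(seqs)
    # iterate over the alignment column by column
    columns = zip(*(seqs[name] for name in names))
    # keep a column if at least one sequence has a non-gap character in it
    kept = [col for col in columns if any(ch != gap for ch in col)]
    return {
        name: "".join(col[i] for col in kept)
        for i, name in enumerate(names)
    }


# toy two-chain alignment, chains separated by '/'
toy = {
    "target":   "A--CD/-EF",
    "template": "AB-C-/-EG",
}
cleaned = drop_all_gap_columns(toy)
for name, seq in cleaned.items():
    print(f"{name:<12}{seq}")
```

Because the chain break character is not a gap, the column holding "/" is always kept, so the chain boundaries survive the clean-up.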


After assembling the complex alignment, we annotate it as usual:

[4]:
# annotate alignment
aln.get_sequence('ARAF').annotate(seq_type='sequence')
aln.get_sequence('4G0N').annotate(seq_type = 'structure',
                              pdb_code = '4G0N',
                              begin_res = '1',
                              begin_chain = 'A')
aln.get_sequence('3NY5').annotate(seq_type = 'structure',
                              pdb_code = '3NY5',
                              begin_res = '1',
                              begin_chain = 'A')
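
These annotations correspond to fields of the header line in the PIR alignment format that modeller reads. The sketch below shows what such an entry looks like, assuming that fields which were not set are simply left empty. The helper pir_entry is hypothetical and not part of homelette, and the sequences are toy placeholders rather than the real alignment rows.

```python
def pir_entry(name, sequence, seq_type, pdb_code="", begin_res="",
              begin_chain="", end_res="", end_chain=""):
    """Format one alignment entry in PIR style (illustration only)."""
    # the PIR header has ten colon-separated fields; the last four
    # (protein name, source, resolution, R-factor) are left empty here
    header = ":".join([seq_type, pdb_code, begin_res, begin_chain,
                       end_res, end_chain, "", "", "", ""])
    # the sequence is terminated by '*'; '/' marks a chain break
    return f">P1;{name}\n{header}\n{sequence}*"


target_entry = pir_entry("ARAF", "GTVKV/MTEYK", "sequence")
template_entry = pir_entry("4G0N", "NTIRV/MTEYK", "structure",
                           pdb_code="4G0N", begin_res="1", begin_chain="A")
print(target_entry)
print(template_entry)
```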

Modelling

Four routines are available specifically for complex modelling, based on modeller [1,2] and altmod [3]. They take the same parameters as their counterparts for single structure modelling, but handle the naming of new chains and the residue numbering slightly differently.

The following routines are available for complex modelling:

  • Routine_complex_automodel_default

  • Routine_complex_automodel_slow

  • Routine_complex_altmod_default

  • Routine_complex_altmod_slow

[5]:
# initialize task object
t = hm.Task(task_name='Tutorial6',
            alignment=aln,
            target='ARAF',
            overwrite=True)

Modelling can be performed with Task.execute_routine as usual.

[6]:
# generate models based on a complex template
t.execute_routine(tag='automodel_' + '4G0N',
                  routine=hm.routines.Routine_complex_automodel_default,
                  templates = ['4G0N'],
                  template_location='data/complex/',
                  n_models=20,
                  n_threads=5)

Not all templates have to be complex templates; complex templates and single templates can be mixed freely. However, at least one complex template should be used in order to convey information about the orientation of the proteins relative to each other.

[7]:
# generate models based on a complex and a single template
t.execute_routine(tag='automodel_' + '_'.join(['4G0N', '3NY5']),
                  routine=hm.routines.Routine_complex_automodel_default,
                  templates = ['4G0N', '3NY5'],
                  template_location='data/complex',
                  n_models=20,
                  n_threads=5)
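
The requirement that at least one template covers both chains can be checked on the assembled alignment before modelling. The following is a minimal sketch under the assumption that a template counts as a complex template if every chain segment of its aligned sequence contains at least one residue. The helper covers_all_chains is hypothetical, and the sequences are shortened toy versions of the alignment rows.

```python
def covers_all_chains(aligned_seq, chain_break="/", gap="-"):
    """Return True if every chain segment contains at least one residue."""
    return all(
        any(ch != gap for ch in chain)
        for chain in aligned_seq.split(chain_break)
    )


# shortened toy rows, chains separated by '/'
toy_aln = {
    "ARAF": "GTVKV/MTEYK",
    "4G0N": "NTIRV/MTEYK",
    "3NY5": "PIVRV/-----",
}

templates = ["4G0N", "3NY5"]
complex_templates = [t for t in templates if covers_all_chains(toy_aln[t])]
print(complex_templates)

if not complex_templates:
    raise ValueError("at least one template should cover all chains")
```

Here only 4G0N qualifies as a complex template, while 3NY5 contributes information for the first chain only.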

Evaluation

Not all evaluation metrics are designed to evaluate complex structures. For example, the SOAP score has different statistical potentials for single proteins (Evaluation_soap_protein) and for protein complexes (Evaluation_soap_pp) [4].

[8]:
# perform evaluation
t.evaluate_models(hm.evaluation.Evaluation_mol_probity,
                  hm.evaluation.Evaluation_soap_pp,
                  n_threads=5)
[9]:
# show a bit of the evaluation
t.get_evaluation().sort_values(by='soap_pp_all').head()
[9]:
    model                       tag                  routine                    mp_score  soap_pp_all   soap_pp_atom  soap_pp_pair
32  automodel_4G0N_3NY5_13.pdb  automodel_4G0N_3NY5  complex_automodel_default  2.26      -9502.636719  -7770.577637  -1732.059326
39  automodel_4G0N_3NY5_20.pdb  automodel_4G0N_3NY5  complex_automodel_default  2.15      -9486.243164  -7656.946777  -1829.296143
28  automodel_4G0N_3NY5_9.pdb   automodel_4G0N_3NY5  complex_automodel_default  2.46      -9475.368164  -7769.337891  -1706.030396
29  automodel_4G0N_3NY5_10.pdb  automodel_4G0N_3NY5  complex_automodel_default  2.72      -9458.609375  -7647.797852  -1810.811646
9   automodel_4G0N_10.pdb       automodel_4G0N       complex_automodel_default  2.39      -9405.662109  -7718.845215  -1686.817139
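
Since the table mixes models from both modelling runs, it can be useful to look at the best model of each run separately. The sketch below does this in plain Python for the five rows shown above, assuming (as the ascending sort suggests) that lower soap_pp_all values are better. Only the printed rows are used, so the result says nothing about models outside this excerpt.

```python
# (model, tag, mp_score, soap_pp_all), copied from the output above
rows = [
    ("automodel_4G0N_3NY5_13.pdb", "automodel_4G0N_3NY5", 2.26, -9502.636719),
    ("automodel_4G0N_3NY5_20.pdb", "automodel_4G0N_3NY5", 2.15, -9486.243164),
    ("automodel_4G0N_3NY5_9.pdb", "automodel_4G0N_3NY5", 2.46, -9475.368164),
    ("automodel_4G0N_3NY5_10.pdb", "automodel_4G0N_3NY5", 2.72, -9458.609375),
    ("automodel_4G0N_10.pdb", "automodel_4G0N", 2.39, -9405.662109),
]

# keep the model with the lowest soap_pp_all for every tag
best_per_tag = {}
for model, tag, mp_score, soap_pp_all in rows:
    current = best_per_tag.get(tag)
    if current is None or soap_pp_all < current[1]:
        best_per_tag[tag] = (model, soap_pp_all)

for tag, (model, score) in best_per_tag.items():
    print(f"{tag:<22}{model:<28}{score:.2f}")
```

On the full pandas DataFrame returned by t.get_evaluation(), a comparable summary could be obtained with groupby('tag') on the soap_pp_all column.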

Further reading

Congratulations on finishing the tutorial on complex modelling in homelette. The following tutorials might also be of interest to you:

  • Tutorial 1: Learn about the basics of homelette.

  • Tutorial 2: Learn more about already implemented routines for homology modelling.

  • Tutorial 3: Learn about the evaluation metrics available with homelette.

  • Tutorial 4: Learn about extending homelette’s functionality by defining your own modelling routines and evaluation metrics.

  • Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.

  • Tutorial 7: Learn about assembling custom pipelines.

  • Tutorial 8: Learn about automated template identification, alignment generation and template processing.

References

[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626

[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3

[3] Janson, G., Grottesi, A., Pietrosanto, M., Ausiello, G., Guarguaglini, G., & Paiardini, A. (2019). Revisiting the “satisfaction of spatial restraints” approach of MODELLER for protein homology modeling. PLoS Computational Biology, 15(12), e1007219. https://doi.org/10.1371/journal.pcbi.1007219

[4] Dong, G. Q., Fan, H., Schneidman-Duhovny, D., Webb, B., Sali, A., & Tramontano, A. (2013). Optimized atomic statistical potentials: Assessment of protein interfaces and loops. Bioinformatics, 29(24), 3158–3166. https://doi.org/10.1093/bioinformatics/btt560

Session Info

[10]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette           1.3
pandas              0.25.3
session_info        1.0.0
-----
PIL                         7.0.0
altmod                      NA
anyio                       NA
attr                        19.3.0
babel                       2.9.1
backcall                    0.2.0
certifi                     2021.10.08
chardet                     3.0.4
charset_normalizer          2.0.8
cycler                      0.10.0
cython_runtime              NA
dateutil                    2.7.3
debugpy                     1.5.1
decorator                   4.4.2
entrypoints                 0.3
idna                        3.3
importlib_resources         NA
ipykernel                   6.5.1
ipython_genutils            0.2.0
jedi                        0.18.1
jinja2                      3.0.3
json5                       NA
jsonschema                  4.2.1
jupyter_server              1.12.1
jupyterlab_server           2.8.2
kiwisolver                  1.0.1
markupsafe                  2.0.1
matplotlib                  3.1.2
modeller                    10.1
mpl_toolkits                NA
nbclassic                   NA
nbformat                    5.1.3
numpy                       1.17.4
ost                         2.2.0
packaging                   20.3
parso                       0.8.2
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
prometheus_client           NA
promod3                     3.2.0
prompt_toolkit              3.0.23
ptyprocess                  0.7.0
pvectorc                    NA
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.10.0
pyparsing                   2.4.6
pyrsistent                  NA
pytz                        2019.3
qmean                       NA
requests                    2.26.0
send2trash                  NA
sitecustomize               NA
six                         1.14.0
sniffio                     1.2.0
storemagic                  NA
swig_runtime_data4          NA
terminado                   0.12.1
tornado                     6.1
traitlets                   5.1.1
urllib3                     1.26.7
wcwidth                     NA
websocket                   1.2.1
zipp                        NA
zmq                         22.3.0
-----
IPython             7.30.0
jupyter_client      7.1.0
jupyter_core        4.9.1
jupyterlab          3.2.4
notebook            6.4.6
-----
Python 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 9.4.0]
Linux-4.15.0-162-generic-x86_64-with-glibc2.29
-----
Session information updated at 2021-11-29 18:59