.. -*- coding: utf-8 -*-

.. _multiseq:

===========================================
Structural Bioinformatics with VMD MultiSeq
===========================================

Open :program:`VMD`. Together with the `MultiSeq plugin`_, VMD provides a
convenient interface for structural bioinformatics.

Loading structures of AdK from different organisms
==================================================

We will manually select one PDB code from each organism. (Normally,
one would make a more careful selection, e.g., taking resolution into
account.)

We want to analyze AdK structures with PDB codes **4jzk 1ak2 2c9y 1aky
2ak3 1zd8 2ar7 1p3j 3gmt 1ake 4pzl 1zin 4k46 1s3g 3be4 3fb4 3tlx**.


Manual download of pre-packaged structures
------------------------------------------

1. For this practical, download a zip file with all PDB files:
   :download:`adk_pdbs.zip <_downloads/adk_pdbs.zip>`.

2. Uncompress the file (try double-clicking it).

3. Load each structure into VMD by using :menuselection:`File --> Load
   New Molecule`.

-----

You might be able to automate loading of the PDB files with Tcl code
(this is advanced usage):

Open :menuselection:`Extensions --> Tk Console` and type the following:


.. code-block:: tcl

   # "path/to/adk_pdbs" must be the file system path to the unzipped
   # directory
   # Ask your instructor for help.
   cd path/to/adk_pdbs

Now you can load all the files from the command line.

.. code-block:: tcl

   set pdbcodes {4jzk 1ak2 2c9y 1aky 2ak3 1zd8 2ar7 1p3j 3gmt 1ake 4pzl 1zin 4k46 1s3g 3be4 3fb4 3tlx}
   foreach pdb $pdbcodes {
       set pdbfile ${pdb}.pdb
       puts "Loading $pdbfile..."
       mol new $pdbfile
   }
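
A slightly more defensive variant of the loop above is sketched below.
It wraps the loading in a procedure, skips codes whose file is missing,
and returns the IDs of the loaded molecules. It assumes the files are
named ``<code>.pdb`` inside the directory you pass; the procedure name
``load_pdbs`` is hypothetical and not part of VMD.

.. code-block:: tcl

   # Hypothetical helper (not part of VMD): load <code>.pdb for every
   # code in $pdbcodes from directory $dir and return the molecule IDs.
   proc load_pdbs {dir pdbcodes} {
       set molids {}
       foreach pdb $pdbcodes {
           set pdbfile [file join $dir ${pdb}.pdb]
           if {![file exists $pdbfile]} {
               puts "Skipping $pdb: $pdbfile not found"
               continue
           }
           puts "Loading $pdbfile..."
           # "mol new" returns the ID of the newly created molecule
           lappend molids [mol new $pdbfile]
       }
       return $molids
   }

For example, ``set molids [load_pdbs path/to/adk_pdbs $pdbcodes]``
loads every file that is present and reports the ones that are not.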


Manual download from the Protein Databank
-----------------------------------------

Download the following structures with PDB codes **4jzk 1ak2 2c9y 1aky
2ak3 1zd8 2ar7 1p3j 3gmt 1ake 4pzl 1zin 4k46 1s3g 3be4 3fb4
3tlx**.

1. Open the `Download
   <http://www.rcsb.org/pdb/home/home.do#Category-download>`_ dialog.

2. Copy and paste the PDB codes into the box "Download: Coordinates &
   Experimental Data"::

      4jzk 1ak2 2c9y 1aky 2ak3 1zd8 2ar7 1p3j 3gmt
      1ake 4pzl 1zin 4k46 1s3g 3be4 3fb4 3tlx

3. Select the checkboxes for

   - **PDB** format only
   - **uncompressed** files

4. Click *Launch Download* and save the files to the directory where you
   do the work for the practical.

5. Load each structure into VMD by using :menuselection:`File --> Load
   New Molecule`.
   
  
Automatic download from the Protein Databank
--------------------------------------------

.. warning:: As of November 2017, the following does not work with any
             version of VMD prior to `VMD 1.9.4 alpha
             <http://www.ks.uiuc.edu/Research/vmd/vmd-new/devel.html>`_
             because of the reorganization of file locations in the
             Protein Databank. Use the "manual" download recipe above.

Open :menuselection:`Extensions --> Tk Console` and type

.. code-block:: tcl

   set pdbcodes {4jzk 1ak2 2c9y 1aky 2ak3 1zd8 2ar7 1p3j 3gmt 1ake 4pzl 1zin 4k46 1s3g 3be4 3fb4 3tlx}
   foreach pdb $pdbcodes {puts "Loading $pdb..."; mol new $pdb}

(4np6 is excluded because it does not parse easily; I did not have
time to check what was wrong.)
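
Because a single structure that fails to download or parse (such as
4np6) would otherwise stop the loop, a sketch like the following
catches errors per code and collects the failures. It assumes that a
failed ``mol new`` raises a Tcl error, and it is still subject to the
warning above; ``fetch_pdbs`` is a hypothetical name, not a VMD command.

.. code-block:: tcl

   # Hypothetical helper (not part of VMD): try "mol new <code>" for each
   # PDB code and return the list of codes that could not be loaded.
   proc fetch_pdbs {pdbcodes} {
       set failed {}
       foreach pdb $pdbcodes {
           puts "Loading $pdb..."
           # catch keeps one bad structure from aborting the whole loop
           if {[catch {mol new $pdb} msg]} {
               puts "Could not load $pdb: $msg"
               lappend failed $pdb
           }
       }
       return $failed
   }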

Using MultiSeq
--------------

In VMD, open :menuselection:`Extensions --> Analysis --> MultiSeq`. (This can
take a moment when MultiSeq downloads updates.)

Manually delete all chains other than A (B, C, ...): highlight them and
press :kbd:`delete`.
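
If you prefer to prepare single-chain files instead of deleting chains
by hand, a sketch along these lines writes chain A of every loaded
molecule to a new PDB file with VMD's ``atomselect`` and its
``writepdb`` method; you would then load those files instead. The
procedure name and the ``_A.pdb`` naming are hypothetical, and you
should check that every structure really has a chain A.

.. code-block:: tcl

   # Hypothetical helper (not part of VMD): write chain A of every loaded
   # molecule to <outdir>/<name>_A.pdb and return the written file names.
   proc write_chain_a {outdir} {
       set written {}
       foreach molid [molinfo list] {
           # Molecule name such as "1ake.pdb"; drop the extension
           set name [file rootname [molinfo $molid get name]]
           set sel [atomselect $molid "chain A"]
           if {[$sel num] == 0} {
               puts "No chain A in molecule $molid ($name); skipped"
           } else {
               set outfile [file join $outdir ${name}_A.pdb]
               $sel writepdb $outfile
               lappend written $outfile
           }
           # Free the selection object
           $sel delete
       }
       return $written
   }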

.. Delete 4np6 (STAMP complains). Hide rep.

Perform a STAMP structural alignment: in MultiSeq, choose
:menuselection:`Tools --> Stamp Structural Alignment`.

.. For changing all representations to Tube:
.. 1. go to top molecule
.. 2. change active rep to Tube (and color)
.. 3. :menuselection:`Extensions --> Visualization --> Clone Representation`:
..    From Top to All


Structural conservation
=======================

:math:`Q` is a measure of structural similarity; ``Qres`` is its
per-residue counterpart. :math:`Q` accounts for the fraction of similar
native contacts between the aligned residues in two proteins
[Eastwood2001]_. :math:`Q=1` implies that the structures are identical.
When :math:`Q` has a low score (0.1-0.3), the structures are not aligned
well, i.e., only a small fraction of the Cα atoms superimpose.
:math:`Q` per residue (``Qres``) is the contribution of each residue to
the overall :math:`Q` value of the aligned structures.
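
To make the definition concrete, here is a simplified, gap-free form of
the :math:`Q_H` score as we understand it is used in MultiSeq, for two
structures whose :math:`N` residues are aligned one-to-one. Here
:math:`r_{ij}` is the Cα-Cα distance between residues :math:`i` and
:math:`j` in one protein and :math:`r_{i'j'}` the distance between the
aligned residues in the other. The exact definition in MultiSeq may
differ in details such as gap handling and normalization.

.. math::

   Q_H = \frac{2}{(N-1)(N-2)} \sum_{i<j-1}
         \exp\left[-\frac{(r_{ij} - r_{i'j'})^2}{2\sigma_{ij}^2}\right],
   \qquad \sigma_{ij} = |i-j|^{0.15}

A minimal Tcl sketch of that formula follows; the procedure names are
hypothetical and this is not MultiSeq's own code.

.. code-block:: tcl

   # Euclidean distance between two {x y z} points
   proc ca_dist {p q} {
       lassign $p x1 y1 z1
       lassign $q x2 y2 z2
       return [expr {sqrt(($x1-$x2)**2 + ($y1-$y2)**2 + ($z1-$z2)**2)}]
   }

   # Sketch of gap-free Q_H. coordsA and coordsB are lists of {x y z}
   # C-alpha positions; element k of each list is aligned residue k.
   proc q_h {coordsA coordsB} {
       set n [llength $coordsA]
       if {$n != [llength $coordsB] || $n < 3} {
           error "need two equally long lists of at least 3 positions"
       }
       set sum 0.0
       for {set i 0} {$i < $n} {incr i} {
           # Only pairs with i < j-1 (skip chain neighbours)
           for {set j [expr {$i + 2}]} {$j < $n} {incr j} {
               set dA [ca_dist [lindex $coordsA $i] [lindex $coordsA $j]]
               set dB [ca_dist [lindex $coordsB $i] [lindex $coordsB $j]]
               set sigma [expr {pow($j - $i, 0.15)}]
               set sum [expr {$sum + exp(-(($dA - $dB)**2) / (2.0 * $sigma**2))}]
           }
       }
       # The normalization makes identical structures score exactly 1
       return [expr {2.0 * $sum / (($n - 1) * ($n - 2))}]
   }

In VMD, such coordinate lists could be obtained with
``[[atomselect top "name CA"] get {x y z}]``, provided (an assumption you
must ensure yourself) that both selections contain exactly the aligned
residues in the same order.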

1. In the MultiSeq window, choose :menuselection:`View --> Coloring --> Qres`.
2. Observe the coloring in the sequence alignment and in the graphics
   window, where it is projected onto the structures.

Note that the CORE domain has high ``Qres``. This indicates that it
superimposes well in all structures.


Sequence conservation
=====================

Color by *Sequence Identity*.

Note the residues that are 100% conserved (:menuselection:`Search --> Select
Residues...`: where Sequence Identity >= 100):

- R, K
- G, P
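
The idea behind "100% conserved" can be sketched as a column test on the
alignment: a column counts only if every sequence has the same residue
there. In this sketch gaps never count as conserved; MultiSeq's own
handling of gaps may differ, and ``conserved_columns`` is a hypothetical
helper.

.. code-block:: tcl

   # Sketch: return the 0-based column indices at which all sequences
   # in the alignment carry the same residue. Gaps ("-") are never
   # counted as conserved. All sequences must have the same length.
   proc conserved_columns {alignment} {
       set first [lindex $alignment 0]
       set cols {}
       for {set c 0} {$c < [string length $first]} {incr c} {
           set ref [string index $first $c]
           if {$ref eq "-"} continue
           set same 1
           foreach seq $alignment {
               if {[string index $seq $c] ne $ref} {
                   set same 0
                   break
               }
           }
           if {$same} {lappend cols $c}
       }
       return $cols
   }

For the toy alignment ``{MRIILLG MRV-LLG MKIILLG}`` this returns the
columns 0, 4, 5 and 6.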

Switch to *1ake* and create a new representation for
``chain A and resname AP5`` (use *CPK* or *VDW* and color by *name*).
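
The same representation can also be added from the Tk Console. The
sketch below uses VMD's ``molinfo`` and ``mol`` commands; matching the
molecule by the start of its name (e.g. ``1ake.pdb``) is an assumption,
and ``add_ap5_rep`` is a hypothetical helper.

.. code-block:: tcl

   # Hypothetical helper: add a VDW representation of the AP5 inhibitor,
   # colored by atom name, to the first molecule whose name starts with
   # $molname. Returns that molecule's ID, or -1 if none matches.
   proc add_ap5_rep {molname} {
       foreach molid [molinfo list] {
           set name [molinfo $molid get name]
           if {[string match -nocase "${molname}*" $name]} {
               # These settings apply to the next "mol addrep"
               mol selection {chain A and resname AP5}
               mol representation VDW
               mol color Name
               mol addrep $molid
               return $molid
           }
       }
       return -1
   }

Usage: ``add_ap5_rep 1ake``.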

What is the role of the conserved R (Arg) and K (Lys)? (Hint: in MultiSeq,
use :menuselection:`View --> Highlight Color --> ResType`.)


Phylogenetic tree
=================

A phylogenetic tree displays evolutionary relationships. You need an
alignment to create a tree.

Open :menuselection:`Tools --> Phylogenetic Tree` and choose:

- *Percent Identity* as the measure
- labels with the full organism name
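
To see what "Percent Identity" feeds into, the sketch below computes
pairwise percent identity from aligned sequences and turns it into a
distance matrix (100 minus percent identity), the usual input for
distance-based tree building such as UPGMA or neighbour joining. Here
identity is normalized by the columns where neither sequence has a gap;
this is an assumption, and MultiSeq's exact definition may differ. Both
procedure names are hypothetical.

.. code-block:: tcl

   # Sketch: percent identity of two aligned, equally long sequences,
   # counting only columns where neither sequence has a gap ("-").
   proc percent_identity {s1 s2} {
       set same 0
       set compared 0
       foreach a [split $s1 ""] b [split $s2 ""] {
           if {$a eq "-" || $b eq "-"} continue
           incr compared
           if {$a eq $b} {incr same}
       }
       if {$compared == 0} {return 0.0}
       return [expr {100.0 * $same / $compared}]
   }

   # Distance matrix (100 - percent identity) as a list of rows
   proc pid_distance_matrix {alignment} {
       set rows {}
       foreach s1 $alignment {
           set row {}
           foreach s2 $alignment {
               lappend row [expr {100.0 - [percent_identity $s1 $s2]}]
           }
           lappend rows $row
       }
       return $rows
   }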

Note that this tree is based on the structural alignment; the
conformational change that is visible across the structures obscures
some of the evolutionary relationships.


References
==========

.. [Eastwood2001] Eastwood, M.P., C. Hardin, Z. Luthey-Schulten, and P.G. Wolynes. “Evaluating the protein structure-prediction schemes using energy landscape theory.” IBM J. Res. Dev. 45: 475-497, 2001. URL: http://www.research.ibm.com/journal/rd/453/eastwood.pdf


.. _`MultiSeq plugin`: http://www.ks.uiuc.edu/Research/vmd/plugins/multiseq/