Basics

Start ipython –pylab, especially for the following features:

  • use the interactive help (command? or command??)

  • TAB completion, e.g. MDAnalysis.U<TAB> will autocomplete to MDAnalysis.Universe.

    MDAnalysis.Universe.<TAB> will show all methods and attributes.

  • quick plotting with matplotlib (and array manipulations with numpy)

Loading modules

Load MDAnalysis:

import MDAnalysis

MDAnalysis comes with a bunch of test files and trajectories. One is the AdK trajectory from Practical 10 that samples a transition from a closed to an open conformation [Beckstein2009]. The topology file (CHARMM psf format) and trajectory (CHARMM/NAMD dcd format) can be loaded into the variables PSF and DCD:

from MDAnalysis.tests.datafiles import PSF, DCD

Finally, also load numpy:

import numpy as np

Universe and AtomGroup

MDAnalysis is object oriented. Molecular systems consist of Atom objects (“instances” of the “class” MDAnalysis.core.AtomGroup.Atom), which are grouped in AtomGroup instances. You build the AtomGroup of your system by loading a topology (list of atoms and possibly their connectivity) together with a trajectory (coordinate information) into the central data structure, the Universe object:

>>> u = MDAnalysis.Universe(PSF, DCD)
>>> print(u)
<Universe with 3341 atoms>

The atoms are stored in the “attribute” MDAnalysis.core.AtomGroup.Universe.atoms

>>> print(u.atoms)
<AtomGroup with 3341 atoms>
>>> list(u.atoms[:5])
[< Atom 1: name 'N' of type '56' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 2: name 'HT1' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 3: name 'HT2' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 4: name 'HT3' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 5: name 'CA' of type '22' of resname 'MET', resid 1 and segid '4AKE'>]

Any AtomGroup knows the residues that the atoms belong to via the attribute residues, which produces a ResidueGroup. A ResidueGroup acts like a list of Residue objects:

>>> u.atoms[100:130].residues
<ResidueGroup [<Residue 'LEU', 6>, <Residue 'GLY', 7>, <Residue 'ALA', 8>]>

Larger organizational units are Segment instances, for example one protein or all the solvent molecules or simply the whole system. Atom, AtomGroup, Residue, and ResidueGroup have an attribute segments that will list the segment IDs (“segids”) as a SegmentGroup:

>>> u.atoms.segments
<SegmentGroup [<Segment '4AKE'>]>

The converse is also true: each “higher” level in the hierarchy also know about the Residue and Atom instances it contains. For example, to list the atoms of the ResidueGroup we had before:

>>> r = u.atoms[100:130].residues
>>> r.atoms
<AtomGroup with 36 atoms>

Exercise 1

  1. What residue (“resname”) does the last atom belong to in the above example?

    >>> r = u.atoms[100:130].residues
    >>> r.atoms[-1]
    < Atom 136: name 'O' of type '70' of resname 'ALA', resid 8 and segid '4AKE'>
    
  2. Why does the expression

    len(u.atoms[100:130]) == len(u.atoms[100:130].residues.atoms)
    

    return False?

    Because the complete residues contain more atoms than the arbitrary slice of atoms.

  3. How many residues are in the Universe u?

    >>> len(u.atoms.residues)
    >>> u.atoms.numberOfResidues()
    214
    

    How do you get a list of the residue names (such as ["Ala", "Gly", "Gly", "Asp", ...]) and residue numbers (“resid”) for atoms 1000 to 1300? And as a list of tuples (resname, resid) (Hint: zip())?:

    >>> resnames = u.atoms[999:1300].resnames()
    >>> resids = u.atoms[999:1300].resids()
    >>> zip(resnames, resids)
    

    How do you obtain the resid and the resname for the 100th residue? (Hint: investigate the Residue object interactively with TAB completion)

    >>> r100 = u.atoms.residues[99]
    >>> print(r100.id, r100.name)
    100 GLY
    
  4. How many segments are there?

    >>> len(u.segments)
    >>> len(u.atoms.segments)
    >>> u.atoms.numberOfSegments()
    1
    

    What is the segment identifier of the first Segment?

    >>> s1 = u.segments[0]
    >>> s1.id
    '4AKE'
    

Selections

MDAnalysis comes with a fairly complete atom selection facility. Primarily, one uses the method selectAtoms() of a Universe:

>>> CA = u.selectAtoms("protein and name CA")
>>> CA
>>> <AtomGroup with 214 atoms>

but really any AtomGroup has a selectAtoms() method:

>>> acidic = CA.selectAtoms("resname ASP or resname GLU")
>>> acidic
>>> <AtomGroup with 35 atoms>
>>> acidic.residues
<ResidueGroup [<Residue 'GLU', 22>, <Residue 'ASP', 33>, <Residue 'GLU', 44>, <Residue 'ASP', 51>, <Residue 'ASP', 54>, <Residue 'ASP', 61>, <Residue 'GLU', 62>, <Residue 'GLU', 70>, <Residue 'GLU', 75>, <Residue 'ASP', 76>, <Residue 'ASP', 84>, <Residue 'ASP', 94>, <Residue 'GLU', 98>, <Residue 'ASP', 104>, <Residue 'GLU', 108>, <Residue 'ASP', 110>, <Residue 'ASP', 113>, <Residue 'GLU', 114>, <Residue 'ASP', 118>, <Residue 'GLU', 143>, <Residue 'ASP', 146>, <Residue 'ASP', 147>, <Residue 'GLU', 151>, <Residue 'GLU', 152>, <Residue 'ASP', 158>, <Residue 'ASP', 159>, <Residue 'GLU', 161>, <Residue 'GLU', 162>, <Residue 'GLU', 170>, <Residue 'GLU', 185>, <Residue 'GLU', 187>, <Residue 'ASP', 197>, <Residue 'GLU', 204>, <Residue 'ASP', 208>, <Residue 'GLU', 210>]>

See also

All the selection keywords are described in the documentation.

Selections can be combined with boolean expression and it is also possible to select by geometric criteria, e.g. with the around distance selection keyword:

u.selectAtoms("((resname ASP or resname GLU) and not (backbone or name CB or name CG)) \
                 and around 4.0 ((resname LYS or resname ARG) \
                                  and not (backbone or name CB or name CG))").residues

What is this selection trying to accomplish?

Exercises 2

  1. Select the range of resids 100 to 200 (“100-200”) with a selection. Compare the result to what you get by slicing the u.atoms.residues appropriately.

    Which approach would you prefer to use in a analysis script?

    Solution:

    >>> u.selectAtoms("resid 100-200")
    <AtomGroup with 1609 atoms>
    

    Compare to the slicing solution (doing an element-wise comparison, i.e. residue by residue in each list()):

    >>> list(u.selectAtoms("resid 100-200").residues) == list(u.atoms.residues[99:200])
    

    If one wants to get specific residues in scripts one typically uses selections instead of slicing because the index in the slice might not correspond to the actual residue ids (minus 1): If a number of residues (e.g. 150-160) are missing from the structure then the selection will simply give you residues 100-149 and 151-200 but the slice 99:200 would give you residues 100-149, 151-209.

  2. Select all residues that do not contain a \(\mathrm{C}_\beta\) (“CB”) atom. How many are there? What residue names did you find?

    Solution:

    >>> sel = u.selectAtoms("(byres name CA) and not (byres name CB)").residues
    >>> len(sel)
    20
    

    These are all Glycines, as can be seen by comparing the residue groups element-wise:

    >>> glycines = u.selectAtoms("resname GLY")
    >>> list(sel) == list(glycines.residues)
    True
    

Table Of Contents

Previous topic

Installing MDAnalysis

Next topic

Working with AtomGroups

This Page