Solvent Accessible Surface Area (SASA) Calculation

In this example, we'll compute the solvent accessible surface area of one of the residues in our protien accross each frame in a MD trajectory. We're going to use our trustly alanine dipeptide trajectory for this calculation, but in a real-world situtation you'll probably want to use a more interesting peptide or protein, especially one with a Trp residue.

In [1]:
%matplotlib inline
from __future__ import print_function
import numpy as np
import mdtraj as md

We'll use the algorithm from Shrake and Rupley for computing the SASA. Here's the function in MDTraj:

In [2]:
help(md.shrake_rupley)
Help on function shrake_rupley in module mdtraj.geometry.sasa:

shrake_rupley(traj, probe_radius=0.14, n_sphere_points=960, mode='atom')
    Compute the solvent accessible surface area of each atom or residue in each simulation frame.
    
    Parameters
    ----------
    traj : Trajectory
        An mtraj trajectory.
    probe_radius : float, optional
        The radius of the probe, in nm.
    n_sphere_points : int, optional
        The number of points representing the surface of each atom, higher
        values leads to more accuracy.
    mode : {'atom', 'residue'}
        In mode == 'atom', the extracted areas are resolved per-atom
        In mode == 'residue', this is consolidated down to the
        per-residue SASA by summing over the atoms in each residue.
    
    Returns
    -------
    areas : np.array, shape=(n_frames, n_features)
        The accessible surface area of each atom or residue in every frame.
        If mode == 'atom', the second dimension will index the atoms in
        the trajectory, whereas if mode == 'residue', the second
        dimension will index the residues.
    
    Notes
    -----
    This code implements the Shrake and Rupley algorithm, with the Golden
    Section Spiral algorithm to generate the sphere points. The basic idea
    is to great a mesh of points representing the surface of each atom
    (at a distance of the van der waals radius plus the probe
    radius from the nuclei), and then count the number of such mesh points
    that are on the molecular surface -- i.e. not within the radius of another
    atom. Assuming that the points are evenly distributed, the number of points
    is directly proportional to the accessible surface area (its just 4*pi*r^2
    time the fraction of the points that are accessible).
    
    There are a number of different ways to generate the points on the sphere --
    possibly the best way would be to do a little "molecular dyanmics" : put the
    points on the sphere, and then run MD where all the points repel one another
    and wait for them to get to an energy minimum. But that sounds expensive.
    
    This code uses the golden section spiral algorithm
    (picture at http://xsisupport.com/2012/02/25/evenly-distributing-points-on-a-sphere-with-the-golden-sectionspiral/)
    where you make this spiral that traces out the unit sphere and then put points
    down equidistant along the spiral. It's cheap, but not perfect.
    
    The gromacs utility g_sas uses a slightly different algorithm for generating
    points on the sphere, which is based on an icosahedral tesselation.
    roughly, the icosahedral tesselation works something like this
    http://www.ziyan.info/2008/11/sphere-tessellation-using-icosahedron.html
    
    References
    ----------
    .. [1] Shrake, A; Rupley, JA. (1973) J Mol Biol 79 (2): 351--71.

In [3]:
trajectory = md.load('ala2.h5')
sasa = md.shrake_rupley(trajectory)

print(trajectory)
print('sasa data shape', sasa.shape)
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, without unitcells>
sasa data shape (100, 22)

The computed sasa array contains the solvent accessible surface area for each atom in each frame of the trajectory. Let's sum over all of the atoms to get the total SASA from all of the atoms in each frame.

In [4]:
total_sasa = sasa.sum(axis=1)
print(total_sasa.shape)
(100,)
In [5]:
from matplotlib.pylab import *

plot(trajectory.time, total_sasa)
xlabel('Time [ps]', size=16)
ylabel('Total SASA (nm)^2', size=16)
show()

We probably don't really have enough data do compute a meaningful autocorrelation, but for more realistic datasets, this might be something that you want to do.

In [6]:
def autocorr(x):
    "Compute an autocorrelation with numpy"
    x = x - np.mean(x)
    result = np.correlate(x, x, mode='full')
    result = result[result.size//2:]
    return result / result[0]

semilogx(trajectory.time, autocorr(total_sasa))
xlabel('Time [ps]', size=16)
ylabel('SASA autocorrelation', size=16)
show()

(solvent-accessible-surface-area.ipynb; solvent-accessible-surface-area_evaluated.ipynb; solvent-accessible-surface-area.py)

Versions