Solvent Accessible Surface Area (SASA) Calculation¶

In this example, we'll compute the solvent accessible surface area of one of the residues in our protien accross each frame in a MD trajectory. We're going to use our trustly alanine dipeptide trajectory for this calculation, but in a real-world situtation you'll probably want to use a more interesting peptide or protein, especially one with a Trp residue.

In [1]:

from __future__ import print_function
%matplotlib inline
import numpy as np
import mdtraj as md

We'll use the algorithm from Shrake and Rupley for computing the SASA. Here's the function in MDTraj:

In [2]:

help(md.shrake_rupley)

Help on function shrake_rupley in module mdtraj.geometry.sasa:

shrake_rupley(traj, probe_radius=0.14, n_sphere_points=960, mode='atom', change_radii=None, get_mapping=False)
Compute the solvent accessible surface area of each atom or residue in each simulation frame.

Parameters
----------
traj : Trajectory
An mtraj trajectory.
probe_radius : float, optional
The radius of the probe, in nm.
n_sphere_points : int, optional
The number of points representing the surface of each atom, higher
values leads to more accuracy.
mode : {'atom', 'residue'}
In mode == 'atom', the extracted areas are resolved per-atom
In mode == 'residue', this is consolidated down to the
per-residue SASA by summing over the atoms in each residue.
change_radii : dict, optional
A partial or complete dict containing the radii to change from the
defaults. Should take the form {"Symbol" : radii_in_nm }, e.g.
{"Cl" : 0.175 } to change the radii of Chlorine to 175 pm.
get_mapping : bool, optional
Instead of returning only the areas, also return the indices of the
atoms or the residue-to-atom mapping. If True, will return a tuple
that contains the areas and the mapping (np.array, shape=(n_atoms)).

Returns
-------
areas : np.array, shape=(n_frames, n_features)
The accessible surface area of each atom or residue in every frame.
If mode == 'atom', the second dimension will index the atoms in
the trajectory, whereas if mode == 'residue', the second
dimension will index the residues.

Notes
-----
This code implements the Shrake and Rupley algorithm, with the Golden
Section Spiral algorithm to generate the sphere points. The basic idea
is to great a mesh of points representing the surface of each atom
(at a distance of the van der waals radius plus the probe
radius from the nuclei), and then count the number of such mesh points
that are on the molecular surface -- i.e. not within the radius of another
atom. Assuming that the points are evenly distributed, the number of points
is directly proportional to the accessible surface area (its just 4*pi*r^2
time the fraction of the points that are accessible).

There are a number of different ways to generate the points on the sphere --
possibly the best way would be to do a little "molecular dyanmics" : put the
points on the sphere, and then run MD where all the points repel one another
and wait for them to get to an energy minimum. But that sounds expensive.

This code uses the golden section spiral algorithm
(picture at http://xsisupport.com/2012/02/25/evenly-distributing-points-on-a-sphere-with-the-golden-sectionspiral/)
where you make this spiral that traces out the unit sphere and then put points
down equidistant along the spiral. It's cheap, but not perfect.

The gromacs utility g_sas uses a slightly different algorithm for generating
points on the sphere, which is based on an icosahedral tesselation.
roughly, the icosahedral tesselation works something like this
http://www.ziyan.info/2008/11/sphere-tessellation-using-icosahedron.html

References
----------
.. [1] Shrake, A; Rupley, JA. (1973) J Mol Biol 79 (2): 351--71.

In [3]:

trajectory = md.load('ala2.h5')
sasa = md.shrake_rupley(trajectory)

print(trajectory)
print('sasa data shape', sasa.shape)

<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, without unitcells>
sasa data shape (100, 22)

The computed sasa array contains the solvent accessible surface area for each atom in each frame of the trajectory. Let's sum over all of the atoms to get the total SASA from all of the atoms in each frame.

In [4]:

total_sasa = sasa.sum(axis=1)
print(total_sasa.shape)

(100,)

In [5]:

from matplotlib.pylab import *

plot(trajectory.time, total_sasa)
xlabel('Time [ps]', size=16)
ylabel('Total SASA (nm)^2', size=16)
show()

We probably don't really have enough data do compute a meaningful autocorrelation, but for more realistic datasets, this might be something that you want to do.

In [6]:

def autocorr(x):
    "Compute an autocorrelation with numpy"
    x = x - np.mean(x)
    result = np.correlate(x, x, mode='full')
    result = result[result.size//2:]
    return result / result[0]

semilogx(trajectory.time, autocorr(total_sasa))
xlabel('Time [ps]', size=16)
ylabel('SASA autocorrelation', size=16)
show()

(solvent-accessible-surface-area.ipynb; solvent-accessible-surface-area_eval.ipynb; solvent-accessible-surface-area.py)