Clustering with the md.rmsd() and scipy.cluster.hierarchy.ΒΆ

In this example, we cluster our alanine dipeptide trajectory using the RMSD distance and Ward’s algorithm.

In [1]: import mdtraj as md

In [2]: import numpy as np

In [3]: import scipy.cluster.hierarchy

Lets load up our trajectory. This is the trajectory that we generated in the “Running a simulation in OpenMM and analyzing the results with mdtraj” example. The first step is to build the rmsd cache, which precalculates some values for the rmsd computation.

In [4]: traj = md.load('ala2.h5')

Lets compute all pairwise rmsds between conformations.

In [5]: distances = np.empty((traj.n_frames, traj.n_frames))

In [6]: for i in range(traj.n_frames):
   ...:     distances[i] = md.rmsd(traj, traj, i)
   ...: 

In [7]: print 'Max pairwise rmsd: %f nm' % np.max(distances)
Max pairwise rmsd: 0.188493 nm

scipy.cluster implements the ward linkage algorithm (among others)

In [8]: linkage = scipy.cluster.hierarchy.ward(distances)

Lets plot the resulting dendrogram.

In [9]: import matplotlib.pyplot as pp

In [10]: pp.figure()
Out[10]: <matplotlib.figure.Figure at 0x10b34a8d0>

In [11]: pp.title('RMSD Ward hierarchical clustering')
Out[11]: <matplotlib.text.Text at 0x10b1e01d0>

In [12]: graph = scipy.cluster.hierarchy.dendrogram(linkage, no_labels=True, count_sort='descendent')
../../_images/dendrogram.png
Versions