In this example, we cluster our alanine dipeptide trajectory using the RMSD distance and Ward’s algorithm.
In [1]: import mdtraj as md
In [2]: import numpy as np
In [3]: import scipy.cluster.hierarchy
Lets load up our trajectory. This is the trajectory that we generated in the “Running a simulation in OpenMM and analyzing the results with mdtraj” example. The first step is to build the rmsd cache, which precalculates some values for the rmsd computation.
In [4]: traj = md.load('ala2.h5')
Lets compute all pairwise rmsds between conformations.
In [5]: distances = np.empty((traj.n_frames, traj.n_frames))
In [6]: for i in range(traj.n_frames):
...: distances[i] = md.rmsd(traj, traj, i)
...:
In [7]: print 'Max pairwise rmsd: %f nm' % np.max(distances)
Max pairwise rmsd: 0.188493 nm
scipy.cluster implements the ward linkage algorithm (among others)
In [8]: linkage = scipy.cluster.hierarchy.ward(distances)
Lets plot the resulting dendrogram.
In [9]: import matplotlib.pyplot as pp
In [10]: pp.figure()
Out[10]: <matplotlib.figure.Figure at 0x10b34a8d0>
In [11]: pp.title('RMSD Ward hierarchical clustering')
Out[11]: <matplotlib.text.Text at 0x10b1e01d0>
In [12]: graph = scipy.cluster.hierarchy.dendrogram(linkage, no_labels=True, count_sort='descendent')