Clustering with md.rmsd() and scipy.cluster.hierarchy()

In this example, we cluster our alanine dipeptide trajectory using the RMSD distance metric and Ward's method.

In [1]:
from __future__ import print_function
%matplotlib inline
import mdtraj as md
import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy
from scipy.spatial.distance import squareform

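Let's load our trajectory. This is the trajectory that we generated in the "Running a simulation in OpenMM and analyzing the results with mdtraj" example. No separate RMSD cache needs to be built first: md.rmsd() centers the conformations and finds the optimal superposition internally (pass precentered=True only if the coordinates were already centered, e.g. with traj.center_coordinates()).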

In [2]:
traj = md.load('ala2.h5')

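Let's compute all pairwise RMSDs between conformations. Row i of the matrix holds the RMSD of every frame to reference frame i, as returned by md.rmsd(traj, traj, i).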

In [3]:
# distances[i, j] is the RMSD (in nm) of frame j to reference frame i
distances = np.empty((traj.n_frames, traj.n_frames))
for i in range(traj.n_frames):
    distances[i] = md.rmsd(traj, traj, i)
print('Max pairwise rmsd: %f nm' % np.max(distances))
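To make explicit what each md.rmsd() call computes, here is a minimal NumPy sketch of the same quantity: the RMSD between two conformations after removing translation and applying the best rotation. It assumes the Kabsch (SVD) algorithm for the superposition; MDTraj uses its own optimized implementation, so kabsch_rmsd and pairwise_rmsd are hypothetical helpers for illustration, not MDTraj API. Because the minimal RMSD is symmetric, only the upper triangle needs to be computed.

```python
import numpy as np


def kabsch_rmsd(x, y):
    """Minimal RMSD between two (n_atoms, 3) conformations after optimal
    translation and rotation, computed with the Kabsch (SVD) algorithm."""
    # Remove translation by centering both conformations at the origin.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # 3x3 cross-covariance matrix of the centered coordinates.
    h = x.T @ y
    u, s, vt = np.linalg.svd(h)
    # If the best orthogonal transform would be a reflection, flip the
    # smallest singular value so that only proper rotations are allowed.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    # Closed form for the minimum over rotations R of sum_k |R x_k - y_k|^2.
    msd = (np.sum(x ** 2) + np.sum(y ** 2) - 2.0 * np.sum(s)) / len(x)
    # Clip tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(msd, 0.0)))


def pairwise_rmsd(xyz):
    """Symmetric (n_frames, n_frames) matrix of minimal RMSDs for an
    (n_frames, n_atoms, 3) coordinate array."""
    n_frames = len(xyz)
    distances = np.zeros((n_frames, n_frames))
    for i in range(n_frames):
        for j in range(i + 1, n_frames):
            # Minimal RMSD is symmetric, so fill both triangles at once.
            distances[i, j] = distances[j, i] = kabsch_rmsd(xyz[i], xyz[j])
    return distances
```

With a trajectory loaded, pairwise_rmsd(traj.xyz) should agree with the md.rmsd loop up to floating-point precision, but the loop above is much slower than MDTraj's compiled code.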

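scipy.cluster.hierarchy implements Ward's linkage algorithm (among others). Its linkage functions take a condensed distance vector (the upper triangle of the matrix) rather than a square matrix, so squareform() is used to convert the RMSD matrix first. Strictly speaking, Ward's method is only well defined for Euclidean distances, so applying it to RMSD is a practical heuristic.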

In [4]:
# checks=False: float32 round-off can leave the RMSD matrix slightly asymmetric
linkage = scipy.cluster.hierarchy.ward(squareform(distances, checks=False))
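The conversion to condensed form and the structure of the resulting linkage matrix can be checked without a trajectory file. This sketch uses synthetic 2D points in two well-separated blobs as a hypothetical stand-in for the RMSD matrix; since these distances are Euclidean, Ward's method is well defined here.

```python
import numpy as np
import scipy.cluster.hierarchy
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the RMSD matrix: 10 points in two separated blobs.
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
                    rng.normal(3.0, 0.1, size=(5, 2))])
distances = squareform(pdist(points))  # square (10, 10) matrix

# squareform() turns the square symmetric matrix into the condensed vector
# of its upper triangle, n * (n - 1) / 2 = 45 entries here.
condensed = squareform(distances, checks=False)

linkage = scipy.cluster.hierarchy.ward(condensed)
# Each of the n - 1 rows records one merge:
# [index of cluster a, index of cluster b, merge height, size of new cluster].
print(linkage.shape)  # (9, 4)
```

The last row is the merge that joins the two blobs, so it has by far the largest height; that large gap is what shows up as two long branches at the top of a dendrogram.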

Let's plot the resulting dendrogram.

In [5]:
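plt.title('RMSD Ward hierarchical clustering')
# count_sort='descending' draws the child with more members first
scipy.cluster.hierarchy.dendrogram(linkage, no_labels=True, count_sort='descending')
None  # suppress the notebook echo of dendrogram()'s return value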
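The dendrogram is a visual aid; to actually assign frames to clusters, the tree can be cut with scipy.cluster.hierarchy.fcluster(). The sketch below cuts it into a chosen number of clusters and picks one representative (medoid) frame per cluster from the square distance matrix. The helper name cluster_representatives, the maxclust cut, and the medoid rule are illustrative assumptions, not part of the original example; the usage lines again rely on synthetic points instead of the trajectory.

```python
import numpy as np
import scipy.cluster.hierarchy
from scipy.spatial.distance import pdist, squareform


def cluster_representatives(distances, n_clusters):
    """Cut a Ward tree built from a square distance matrix into flat clusters.

    Returns (labels, medoids): labels[i] is the cluster (1-based) of frame i,
    and medoids maps each label to the member frame with the smallest summed
    distance to the other members of its cluster.
    """
    linkage = scipy.cluster.hierarchy.ward(squareform(distances, checks=False))
    # criterion='maxclust' chooses a cut giving at most n_clusters clusters.
    labels = scipy.cluster.hierarchy.fcluster(linkage, t=n_clusters,
                                              criterion='maxclust')
    medoids = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        within = distances[np.ix_(members, members)]
        medoids[int(label)] = int(members[np.argmin(within.sum(axis=1))])
    return labels, medoids


# Usage on synthetic data (a hypothetical stand-in for the RMSD matrix).
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
                    rng.normal(3.0, 0.1, size=(5, 2))])
labels, medoids = cluster_representatives(squareform(pdist(points)), n_clusters=2)
print(labels, medoids)
```

In the notebook, the same helper could be called with the RMSD `distances` matrix, and each medoid index could then be used to inspect the representative conformation as traj[medoid].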

(clustering.ipynb; clustering_evaluated.ipynb; clustering.py)
