Clustering with md.rmsd() and scipy.cluster.hierarchy()


In this example, we cluster our alanine dipeptide trajectory using the RMSD distance metric and Ward's method.

In [1]:
%pylab inline
import mdtraj as md
import numpy as np
import scipy.cluster.hierarchy
Populating the interactive namespace from numpy and matplotlib

Let's load our trajectory. This is the trajectory that we generated in the "Running a simulation in OpenMM and analyzing the results with mdtraj" example. The code below does not build a separate RMSD cache; md.rmsd() is called directly on the loaded trajectory.

In [2]:
traj = md.load('ala2.h5')
In [3]:
# Let's compute all pairwise RMSDs between conformations.

distances = np.empty((traj.n_frames, traj.n_frames))
for i in range(traj.n_frames):
    # Row i holds the RMSD of every frame to frame i.
    distances[i] = md.rmsd(traj, traj, i)
print('Max pairwise rmsd: %f nm' % np.max(distances))
Max pairwise rmsd: 0.188493 nm
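To make the distance metric concrete, here is a minimal NumPy sketch of an RMSD taken after optimal rigid superposition (the Kabsch approach). It illustrates the kind of quantity being compared, not MDTraj's actual implementation; the function name `superposed_rmsd` and the toy coordinates are hypothetical.

```python
import numpy as np

def superposed_rmsd(a, b):
    """RMSD between two (n_atoms, 3) arrays after optimal rigid superposition."""
    # Remove translation by centering both structures on their centroids.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Singular values of the 3x3 covariance matrix give the best achievable overlap.
    u, s, vt = np.linalg.svd(np.dot(a.T, b))
    # Flip the smallest singular value if the optimum would be a reflection.
    if np.linalg.det(np.dot(u, vt)) < 0:
        s[-1] = -s[-1]
    msd = (np.sum(a ** 2) + np.sum(b ** 2) - 2.0 * np.sum(s)) / a.shape[0]
    return np.sqrt(max(msd, 0.0))  # clamp tiny negative rounding errors

# Hypothetical toy data: random coordinates and a rotated, shifted copy.
rng = np.random.RandomState(0)
xyz = rng.normal(size=(22, 3))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = np.dot(xyz, rot.T) + np.array([1.0, -2.0, 0.5])

rigid = superposed_rmsd(xyz, moved)  # rigid motion only, so close to zero
noisy = superposed_rmsd(xyz, xyz + rng.normal(scale=0.1, size=xyz.shape))
print(rigid, noisy)
```

Because rotation and translation are removed before measuring, two frames that differ only by overall tumbling count as the same conformation.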
In [4]:
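# scipy.cluster implements the Ward linkage algorithm (among others).
# Pass distances in condensed form: a 2-D array would be treated as
# observation vectors. checks=False skips squareform's strict symmetry test.
from scipy.spatial.distance import squareform
reduced_distances = squareform(distances, checks=False)
linkage = scipy.cluster.hierarchy.ward(reduced_distances)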
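The linkage matrix encodes the full merge tree. One way to turn it into discrete cluster labels is `scipy.cluster.hierarchy.fcluster`. The sketch below uses synthetic points as a stand-in for the trajectory, so the names `points`, `toy_linkage`, and `labels`, and the choice of two clusters, are assumptions for illustration only.

```python
import numpy as np
import scipy.cluster.hierarchy
from scipy.spatial.distance import pdist

# Synthetic stand-in for the conformations: two well-separated groups of points.
rng = np.random.RandomState(0)
points = np.vstack([rng.normal(0.0, 0.1, size=(10, 3)),
                    rng.normal(5.0, 0.1, size=(10, 3))])

# pdist returns the condensed (upper-triangle) Euclidean distance vector.
condensed = pdist(points)
toy_linkage = scipy.cluster.hierarchy.ward(condensed)

# Cut the tree so that at most two flat clusters remain.
labels = scipy.cluster.hierarchy.fcluster(toy_linkage, t=2, criterion='maxclust')
print(labels)
```

With the real data, the same `fcluster` call could be applied to `linkage`; how many clusters to request is a judgment call that the dendrogram can help inform.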
In [5]:
# Let's plot the resulting dendrogram.

figure()
title('RMSD Ward hierarchical clustering')
graph = scipy.cluster.hierarchy.dendrogram(linkage, no_labels=True, count_sort='descendent')
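`dendrogram` also returns a dictionary describing the plot, which is what `graph` holds above. As a small sketch on synthetic data (the names `toy_points`, `toy_linkage`, and `info` are hypothetical), passing `no_plot=True` computes that dictionary without drawing anything, which is handy for reading off the leaf order.

```python
import numpy as np
import scipy.cluster.hierarchy
from scipy.spatial.distance import pdist

# Six synthetic observations standing in for trajectory frames.
rng = np.random.RandomState(1)
toy_points = rng.normal(size=(6, 3))
toy_linkage = scipy.cluster.hierarchy.ward(pdist(toy_points))

# no_plot=True skips drawing and only returns the layout information.
info = scipy.cluster.hierarchy.dendrogram(toy_linkage, no_plot=True)

# 'leaves' lists the original observation indices in left-to-right plot order.
print(info['leaves'])
```

The leaf order is a permutation of the frame indices, so it can be used to reorder the distance matrix for a block-structured heat map.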
