In [1]: import numpy as np
In [2]: np.set_printoptions(threshold=50)
In [3]: import mdtraj as md
Sometimes your molecular dynamics trajectory files are too large to fit into memory. This can make analysis a burden, because you have to be very aware of the size of various objects. This can be a challenge in python because of the language’s automatic memory management.
Fortunately, python provides the iterator protocol that can help us out here. We can “stream through” a trajectory, without loading the entire thing into memory at all. Instead, we’ll process it in chunks.
For the purpose of this example, we’ll use a short trajectory that’s included with MDTraj for testing purposes. When you use this recipe yourself, you probably will want to point your code to your own trajectory file
In [4]: import mdtraj.testing
In [5]: traj_filename = mdtraj.testing.get_fn('frame0.h5')
First, if you only want a single frame of a trajectory, there’s no reason to load up the whole thing. md.load_frame() can load up a single frame for you. Let’s get the first one.
In [6]: first_frame = md.load_frame(traj_filename, 0)
In [7]: print first_frame
<mdtraj.Trajectory with 1 frames, 22 atoms>
Using md.iterload(), you can iterate through chunks of the trajectory. If you don’t retain a reference to the chunk as you iterate through, then the python garbage collector can recycle the memory.
In [8]: rmsds = []
In [9]: for chunk in md.iterload(traj_filename, chunk=100):
...: rmsds.append(md.rmsd(chunk, first_frame))
...: print chunk
...: print chunk.time
...:
<mdtraj.Trajectory with 100 frames, 22 atoms>
[ 500.00003052 501.00003052 502.00003052 ..., 597. 598. 599. ]
<mdtraj.Trajectory with 100 frames, 22 atoms>
[ 600. 601. 602. ..., 697.00006104 698.00006104
699.00006104]
<mdtraj.Trajectory with 100 frames, 22 atoms>
[ 700.00006104 701.00006104 702.00006104 ..., 797.00006104 798.00006104
799.00006104]
<mdtraj.Trajectory with 100 frames, 22 atoms>
[ 800.00006104 801.00006104 802.00006104 ..., 897.00006104 898.00006104
899.00006104]
<mdtraj.Trajectory with 100 frames, 22 atoms>
[ 900.00006104 901.00006104 902.00006104 ..., 997.00006104 998.00006104
999.00006104]
<mdtraj.Trajectory with 1 frames, 22 atoms>
[ 1000.00006104]
Now, we’ve calculated all of the rmsds chunk by chunk, and we can take a look at them.
In [10]: rmsds = np.concatenate(rmsds)
In [11]: print rmsds
[ 0. 0.05957904 0.12331477 ..., 0.12567955 0.08629464
0.14820947]
In [12]: print np.max(rmsds), np.argmax(rmsds)
0.189723 44