Out-of-core calculations with md.iterload()ΒΆ

In [1]:
from __future__ import print_function
import numpy as np
import mdtraj as md
np.set_printoptions(threshold=50)

Sometimes your molecular dynamics trajectory files are too large to fit into memory. This can make analysis a burden, because you have to be very aware of the size of various objects. This can be a challenge in python because of the language's automatic memory management.

Fortunately, python provides the iterator protocol that can help us out here. We can "stream through" a trajectory, without loading the entire thing into memory at all. Instead, we'll process it in chunks.

For the purpose of this example, we'll use a short trajectory that's included with MDTraj for testing purposes. When you use this recipe yourself, you probably will want to point your code to your own trajectory file

In [2]:
import mdtraj.testing
traj_filename = mdtraj.testing.get_fn('frame0.h5')

First, if you only want a single frame of a trajectory, there's no reason to load up the whole thing. md.load_frame can load up a single frame for you. Let's get the first one.

In [3]:
first_frame = md.load_frame(traj_filename, 0)
first_frame
Out[3]:
<mdtraj.Trajectory with 1 frames, 22 atoms, 3 residues, and unitcells at 0x2ae44b5ca210>

Using md.iterload, you can iterate through chunks of the trajectory. If you don't retain a reference to the chunk as you iterate through, then the python garbage collector can recycle the memory.

In [4]:
rmsds = []
for chunk in md.iterload(traj_filename, chunk=100):
    rmsds.append(md.rmsd(chunk, first_frame))
    print(chunk, '\n', chunk.time)
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells> 
 [ 500.00003052  501.00003052  502.00003052 ...,  597.          598.          599.        ]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells> 
 [ 600.          601.          602.         ...,  697.00006104  698.00006104
  699.00006104]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells> 
 [ 700.00006104  701.00006104  702.00006104 ...,  797.00006104  798.00006104
  799.00006104]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells> 
 [ 800.00006104  801.00006104  802.00006104 ...,  897.00006104  898.00006104
  899.00006104]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells> 
 [ 900.00006104  901.00006104  902.00006104 ...,  997.00006104  998.00006104
  999.00006104]
<mdtraj.Trajectory with 1 frames, 22 atoms, 3 residues, and unitcells> 
 [ 1000.00006104]

Now, we've calculated all of the rmsds chunk by chunk, and we can take a look at them.

In [5]:
rmsds = np.concatenate(rmsds)

print(rmsds)
print('max rmsd ', np.max(rmsds), 'at index', np.argmax(rmsds))
[ 0.          0.05957904  0.12331477 ...,  0.12567955  0.08629464
  0.14820947]
max rmsd  0.189723 at index 44

(iterload.ipynb; iterload_evaluated.ipynb; iterload.py)

Versions