Out-of-core calculations with md.iterload()

In [1]:
from __future__ import print_function
import numpy as np
import mdtraj as md
np.set_printoptions(threshold=50)

Sometimes your molecular dynamics trajectory files are too large to fit into memory. This makes analysis a burden, because you have to keep careful track of how large each object is. That is harder in Python than it sounds, because the language manages memory automatically: an object is freed only once nothing refers to it any more.

Fortunately, Python provides the iterator protocol, which can help here. We can "stream through" a trajectory without ever loading the entire thing into memory, processing it in chunks instead.
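As a rough illustration of the iterator protocol (plain Python and NumPy, not MDTraj itself), the sketch below uses a hypothetical generator, `iter_chunks`, that yields successive slices of any sliceable source. The in-memory array is only a stand-in for a trajectory; `md.iterload` instead reads each chunk from the file on demand.

```python
import numpy as np

def iter_chunks(source, chunk=100):
    """Hypothetical helper (not part of MDTraj): yield successive slices of
    `source`, each holding at most `chunk` frames."""
    for start in range(0, len(source), chunk):
        yield source[start:start + chunk]

# Synthetic stand-in for per-frame data: 250 frames with 3 values each.
data = np.arange(750, dtype=float).reshape(250, 3)

# A generator produces values lazily: nothing is sliced until next() is called.
it = iter_chunks(data, chunk=100)
first = next(it)             # the first 100 frames

# A for loop keeps calling next() until the generator is exhausted.
sizes = [len(first)]
total = first.sum()
for block in it:             # resumes with the second chunk
    sizes.append(len(block))
    total += block.sum()     # reduce the chunk to a number, then move on

print(sizes)                 # [100, 100, 50]
print(total == data.sum())   # True
```

When the source reads lazily (as a file-backed iterator does), each chunk is reduced to a small result before the next one is produced, so peak memory depends on the chunk size rather than on the trajectory length.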

For this example, we'll use a short trajectory that's included with MDTraj for testing purposes. When you use this recipe yourself, you will probably want to point your code at your own trajectory file.

In [2]:
import mdtraj.testing
# Path to a small test trajectory bundled with MDTraj; replace it with your own file.
traj_filename = mdtraj.testing.get_fn('frame0.h5')

First, if you only want a single frame of a trajectory, there's no reason to load the whole thing: md.load_frame loads a single frame by its index. Let's get the first one.
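The same "read only what you need" idea can be sketched in plain NumPy with a memory-mapped file. This is an analogy, not how `md.load_frame` is implemented: the `.npy` file and its synthetic coordinates are assumptions made purely for illustration.

```python
import os
import tempfile

import numpy as np

# Synthetic "trajectory": 1000 frames x 50 atoms x 3 coordinates (float32).
xyz = np.random.default_rng(0).random((1000, 50, 3), dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "xyz.npy")
np.save(path, xyz)

# mmap_mode="r" maps the file instead of reading it into memory up front.
mapped = np.load(path, mmap_mode="r")

# Copying a single frame only touches the bytes that back that frame.
frame0 = np.array(mapped[0])
print(frame0.shape)   # (50, 3)
```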

In [3]:
first_frame = md.load_frame(traj_filename, 0)
first_frame

Using md.iterload, you can iterate through the trajectory in chunks. If you don't retain a reference to each chunk after you've processed it, Python's garbage collector can reclaim its memory.
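Here is a self-contained sketch of that pattern without MDTraj. The hypothetical generator `synthetic_chunks` stands in for `md.iterload` by building each chunk of random coordinates on demand, and `rmsd_no_fit` is a simplified RMSD that, unlike `md.rmsd`, does not superpose the frames first. Only the small per-frame result of each chunk is kept.

```python
import numpy as np

def synthetic_chunks(n_frames, n_atoms, chunk=100, seed=0):
    """Hypothetical stand-in for md.iterload: builds each chunk on demand,
    so only one chunk of coordinates exists at a time."""
    rng = np.random.default_rng(seed)
    for start in range(0, n_frames, chunk):
        n = min(chunk, n_frames - start)
        yield rng.normal(size=(n, n_atoms, 3))

def rmsd_no_fit(chunk_xyz, ref_xyz):
    """Per-frame RMSD of chunk_xyz (n_frames, n_atoms, 3) against
    ref_xyz (n_atoms, 3), WITHOUT superposition (md.rmsd aligns first)."""
    diff = chunk_xyz - ref_xyz                # broadcasts over the frame axis
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))

ref = np.zeros((20, 3))                       # hypothetical reference structure

rmsds = []
for chunk in synthetic_chunks(250, 20, chunk=100):
    # Keep only the per-frame result; the chunk itself is replaced next iteration.
    rmsds.append(rmsd_no_fit(chunk, ref))
rmsds = np.concatenate(rmsds)
print(rmsds.shape)                            # (250,)
```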

In [4]:
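rmsds = []
# Load 100 frames at a time; each chunk is an md.Trajectory.
for chunk in md.iterload(traj_filename, chunk=100):
    # RMSD of every frame in this chunk relative to first_frame
    rmsds.append(md.rmsd(chunk, first_frame))
    print(chunk, '\n', chunk.time)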

Now we've calculated the RMSDs chunk by chunk. Concatenating the per-chunk arrays gives one RMSD value per frame, which we can inspect.

In [5]:
# Join the per-chunk arrays into a single array with one RMSD per frame
rmsds = np.concatenate(rmsds)

print(rmsds)
print('max rmsd ', np.max(rmsds), 'at index', np.argmax(rmsds))
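Concatenating is fine because the per-frame results are small. If you only need the maximum and its frame index, a running reduction avoids storing them at all; the key detail is adding each chunk's offset to its local index. The helper `running_max` below is a hypothetical sketch, not an MDTraj function.

```python
import numpy as np

def running_max(chunks):
    """Return (max_value, global_index) over an iterable of 1-D arrays
    without concatenating them. Ties resolve to the earliest frame,
    matching np.argmax on the concatenated array."""
    best_val, best_idx, offset = -np.inf, -1, 0
    for values in chunks:
        if len(values):
            i = int(np.argmax(values))          # index within this chunk
            if values[i] > best_val:            # strict '>' keeps the earlier tie
                best_val, best_idx = float(values[i]), offset + i
        offset += len(values)                   # frames seen so far
    return best_val, best_idx

chunks = [np.array([0.1, 0.4, 0.2]), np.array([0.3, 0.9]), np.array([0.5])]
print(running_max(chunks))   # (0.9, 4)
```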