
Methods for Macromolecular Modeling: Overview
The popularity of molecular dynamics simulations
Though a variety of simulation techniques
are available,
the molecular dynamics (MD)
approach tops the list in popularity because of its physical appeal and
biological connection. Namely, the
trajectories showing the evolution in time and space
of molecular conformations follow
classical physics; this allows kinetic processes
to be followed in detail, linking and expanding upon experimental observations.
MD's popularity would be overwhelming if the technique's computational demands,
and hence biological scope, were not so limited for large systems.
This limitation stems from the numerical stability requirement, which
restricts the timestep size used for integrating the equations of motion
to a relatively small value (e.g., ~1 fs).
This timestep size implies one million or more steps for simulating
a mere nanosecond in the life of a biomolecule;
this number already translates to days
or months of computing time,
even on state-of-the-art platforms [Board et al., 1994; Schlick et al., 1999],
since each step typically requires at least one expensive force evaluation.
Certainly, code parallelization on multiple-processor machines shaves off
computing clock time, as demonstrated by the longest simulation to date -- 1
One of the golden opportunities for work reduction is in the time
integration protocol.
Since the overall cost of MD simulations is dominated by the number
of force evaluations, reducing this number -- typically by increasing the effective
timestep size -- can lengthen the physical
timespan that can be simulated.
Yet, this "timestep problem" [Schlick et al., 1997; Schlick, 1998] has remained a `thorny nut'.
Much of this difficulty can be simply attributed to the dilemma
of balancing a larger stepsize with lower resolution accuracy. That is,
with decreased intervals of molecular observations (force update
frequency), the resolution of some fast processes might have to be
sacrificed.
Another obstacle is practical: the more sophisticated algorithms,
such as those described below, are not trivial
to incorporate in the context of large programs,
since evaluations of the total force depend on many program units.
This makes the application of new integrators for MD more complex than fast
schemes for electrostatics summations.
In the evolution of certain physical systems -- motion of planets
or flow of compound concentrations in chemical reactions -- the effect of the fast
processes on the global motion is negligible. In biomolecular
systems -- unfortunately for algorithm developers --
vibrational modes are intricately coupled:
thus, fast small-amplitude motions can trigger a cascade of events that
culminate in large-scale global rearrangements. This intricate coupling
limits the traditional mathematical machinery available for MD integration
and compels algorithm developers to seek inventive, tailored approaches.
Still, against these
challenges, mathematicians and other computational scientists
have labored in the past decade to understand the numerical
limitations of MD integration, devise clever schemes, and
design approaches departing from accurate motion-following that instead yield greater
overall information on the large thermally-accessible configuration
space of macromolecules [Elber, 1996; Schlick, 1998; Doniach & Eastman, 1999; Daggett, 2000].
Recent algorithmic work: general
Our group is developing new mathematical
tools to study biomolecular structures to push the modeling limits
of current computational methods. Examples include
developing new modeling approaches
for long DNAs with bound proteins
(e.g., [Beard & Schlick, 2001b; Beard & Schlick, 2001a; Huang et al., 2001; Schlick et al., 2000]),
efficient techniques
for nonlinear optimization by the truncated Newton method (package TNPACK)
[Schlick & Overton, 1987; Schlick & Fogelson, 1992a; Schlick & Fogelson, 1992b; Derreumaux et al., 1994; Xie & Schlick, 1999a; Xie & Schlick, 1999b],
and distance geometry and image processing techniques based on the singular
value decomposition for
combinatorial library design [Xie et al.,
2000; Xie & Schlick, 2000; Xie et al., 2001].
Recent algorithmic work: long timesteps and the LN algorithm
Much effort has focused on developing long-timestep methods
[Schlick et al., 1997; Schlick, 1998; Barth & Schlick, 1998a; Sandu & Schlick, 1999; Batcho et al., 2001].
Our extrapolative
multiple-timestep scheme called LN [Barth & Schlick, 1998a]
(for its origin in a Langevin/normal modes scheme [Zhang & Schlick, 1993; Zhang & Schlick, 1994])
combines two simple ingredients that work together
to alleviate resonances and allow larger timesteps in the LN scheme [Barth et al., 1997]:
extrapolative force splitting (which by itself leads to energy
drifts but is not as vulnerable to resonance artifacts as are impulse variants)
and stochastic dynamics (which eliminate the drifts as well
as dampen the mild resonances).
The stochastic framework, in the form of Langevin dynamics [Pastor, 1994],
represents a departure from Hamiltonian dynamics, but stochastic
simulations are appropriate for many thermodynamic and sampling
questions, since the same equilibrium states are approached in theory.
In practice, the bath coupling parameter (damping constant
Performance of MTS schemes can be assessed by comparing
energetic, geometric, and dynamic properties
to single-timestep (STS) simulations as well as experimental data where available.
In the case of LN trajectories, the reference integrator is
an extension of Verlet for Langevin
dynamics [Brunger et al., 1982].
The numerical stability and resonance alleviation were shown in
trajectories for the proteins
bovine pancreatic trypsin inhibitor (BPTI) and lysozyme, a large water system
[Barth & Schlick, 1998a; Sandu & Schlick, 1999], solvated DNA [Strahs & Schlick, 2000], and a large DNA/protein
system [Schlick & Yang, 2001; Yang et al., 2001], where
STS results were well-reproduced
for outer timesteps of 50 fs or more.
Results have shown that a good parameter choice for a 3-class LN scheme
is
The figure shown here illustrates LN's performance
for two solvated biomolecular systems containing
a polymerase
The added efficiency introduced by LN has allowed us to sample better
the minor-groove bending preferences in A-tract DNA [Strahs & Schlick, 2000],
study a large number of TATA variants (13) to assess sequence-dependent
deformability and flexibility patterns relative to the wildtype TATA element
[Qian et al., 2001], and sample many segments in the pathway of an enzyme opening
motion -- polymerase
LN Integration with Popular Software
The LN algorithm has been incorporated into the
widely used CHARMM program [Barth & Schlick, 1998b] in the context
of periodic models with cutoffs in the microcanonical ensemble.
Extensions to particle-mesh Ewald (PME) formulations
and other thermodynamic ensembles are underway.
We have recently completed the application
of a new multiple-timestep integrator combined to PME summation
in the AMBER software package [Batcho et al., 2001].
Speedup factors of 5 compared to traditional Verlet
schemes have been demonstrated.
Details of the theoretical and computational simulations can
be found in [Batcho & Schlick, 2001; Batcho & Schlick, 2001a], where
we discuss
performance speedup and the force balancing involved to maximize accuracy,
maintain long-time stability, and accelerate computational times.
Compared to prior MTS efforts in the context of the AMBER program,
advances are reported by optimizing PME parameters
for MTS applications and by using the
Position Verlet, rather than Velocity Verlet scheme, for the inner loop
[Batcho & Schlick, 2001a].
Moreover, ideas from the Langevin/MTS algorithm LN developed recently
are applied to Newtonian formulations here.
Our MTS/PME AMBER work suggested that modified Ewald
formulations, using tailored alternatives to the Gaussian screening
functions for the Coulombic terms, may allow larger timesteps and thus
further speedups
for both Newtonian and Langevin protocols. Such developments will be reported
separately [Batcho & Schlick, 2001b] (see "Pairwise Interaction Splitting").
The apparent difficulty in optimizing MTS/PME combinations is that the reciprocal
term -- which should isolate long-range slow forces --
contains significant force contributions from
near-field particle interactions [Heyes, 1981]. This requires
further mathematical transformations before adaptation to biomolecular MD.
Recall
that the summation for the electrostatic energy of a periodic
system can be expressed by a lattice sum over all pair interactions and over
all lattice vectors (excluding i = j in the primary box). Such a sum
is only conditionally convergent.
Ewald's trick was to convert this sum
into an expression involving a sum of an absolutely and rapidly convergent series
in direct and reciprocal space. This transformation is
accomplished by representing each point charge as a Gaussian charge density,
producing an exponentially decaying function.
This Gaussian transformation is counteracted by an analogous subtraction
to leave the net result of an effective point charge.
Thus, the electrostatic energy for a periodic system is expressed
as a real-space (direct) term (energy due to point charges
screened by oppositely-charged Gaussians) that
is short-range with a singularity at the origin,
and an associated canceling term (periodic sum of Gaussians)
that is smooth and long-range,
summed in reciprocal space using smooth interpolation of Fourier-series values
[Darden et al., 1999].
In practice, the width of the Gaussian distribution is chosen
to balance the work in the two terms: making the direct sum
rapidly decreasing and well-approximated for distances within a specified
cutoff distance, and the reciprocal term converging rapidly.
However, the rapid decay of the direct Ewald term implies the presence
of fast components in the reciprocal term. This in turn means that
the outer timestep associated with an MTS/PME code
cannot be too large, as found in practice [Figueirido et al., 1997; Cheng & Merz, 1999; Kawata & Mikami, 2000; Batcho et al., 2001].
Subsequently, truly efficient MTS/PME
protocols likely need to split the direct (and hence modify the reciprocal) term
so that the reciprocal term is truly long-range.
Such ideas require alternative mathematical formulations,
or a combination of distance and energy-component criteria
for defining the force classes. In addition, smoothing functions and possibly
separate treatments for the van der Waals and Coulomb terms may be needed.
Some such ideas based on non-Gaussian core functions optimized to isolate the near
and far-field contributions in the direct and reciprocal terms, respectively,
are presented in [Batcho & Schlick, 2001b] (see "Pairwise Interaction Splitting"), and another approach using an alternative
splitting strategy for standard Gaussian-based PME formulations
is currently being investigated [X. Qian and T. Schlick, work in progress].
These challenging extensions, as well as the efficient implementation
of MTS schemes to constant temperature and pressure
simulations [Martyna et al., 1996], will undoubtedly be forthcoming in the near future
and will help macromolecular models simulate a variety of
physical environments, as appropriate, for less and less computer time.
The ultimate bridging of the gap between experimental and simulation time frames
will depend on the relative pace of improvements in each domain
To be sure, important challenges remain in each arena, and the sum of
instrumentation and modeling will continue to be greater than each of its parts.
Macromolecular simulations are increasingly demonstrating their value as a
tool with which to study biomolecular structure and function.
The simulation field has made giant leaps through advances in computational techniques,
most notably for timestep integration for molecular dynamics,
configurational sampling, fast evaluation of
the long-range electrostatic interactions, and quantum-mechanical calculations.
Yet many fundamental and practical limitations face biomolecular modelers. They include
the approximate nature of the governing force fields as well as simulation
protocols, the limited range of configurational sampling and relatively short
trajectory times, the neglect of quantum effects in classical molecular dynamics,
and the enormous computational requirements
(e.g., a year of computational time on a lab workstation to simulate
a small solvated protein with simplified electrostatics).
New algorithmic approaches, hierarchical spatial representations,
and improved computing platforms are continuously needed to enhance the reliability
of macromolecular simulations and increase their appeal
and scope to experimentalists [Schlick, 1999].
The popularity of molecular dynamics simulations
Long-timestep algorithms
Recent algorithmic work: general
Recent algorithmic work: long timesteps and the LN algorithm
LN Applications
LN Integration with Popular Software
Ongoing Challenges
s,
for a villin headpiece, achieved in four months of dedicated CPU
time on 256 processors of a Cray T3D/E [Duan & Kollman, 1998],
or by IBM's ambitious announcement to fold proteins by 2005 on a unique petaflop
computer called Blue Gene with a massively parallel architecture
(one million or more processors).
Any algorithmic speedup will only help to further reduce MD computational time.
Back to Top
Back to Top
Back to Top
) can be
set as small as possible to minimize the Hamiltonian perturbation while
still ensuring numerical stability
[Barth & Schlick, 1998a; Sandu & Schlick, 1999; Schlick & Yang, 2001]. This stochastic coupling was also
adopted later by Skeel and coworkers, in the context of a mollified impulse method
[Izaguirre, 1999], but the impulse formulation restricts the outer timestep.
![]()
= 0.5 fs,
tm = 1 fs, and
t up to 150 fs.
If constrained dynamics (SHAKE [Rychaert et al., 1977]) for the light atom bonds are used,
the inner timestep can be increased to 1 fs and the medium timestep
to around 2 fs. The speedup factors depend on the reference system
but can be an order of magnitude (factor of 10 or more)
with respect to 0.5 fs inner timesteps
[Barth & Schlick, 1998a] or around 5 for 1-fs inner timesteps
(when SHAKE is used). Relative mean energy differences between MTS and STS
simulations are lower when SHAKE is applied [Schlick & Yang, 2001].
/DNA primer/template complex (43751 atoms) [Schlick & Yang, 2001; Yang et al., 2001],
and TATA-box DNA/TBP complex (41011 atoms) [D. Strahs, X. Qian, and
T. Schlick, work in progress].
The `Manhattan plots',
reporting the differences in mean energy components
as a function of the outer timestep
t relative to
STS Langevin simulations, show
low relative errors in all energy components (below 3%)
for outer timesteps up to
t = 120 fs.
In both cases, the inner and medium timesteps are fixed at 1 and 2 fs, respectively,
and the computational speedup factor for
t = 120 fs
is five or more with respect to STS simulations at 1 fs.
Back to Top
complexed to primer/template DNA --
with the open and closed crystallographic forms serving as experimental anchors
[Yang et al., 2001]. For the polymerase application, the accelerated sampling
led to the identification of a key step in the pathway (Arg258 rotation, together
with release of a catalytic magnesium ion), which is slow and maybe
rate-limiting. This produced the intriguing hypothesis that the Arg258 rotation, rather
than large subdomain motion per se, is a crucial aspect of pol
's DNA
synthesis fidelity.
Back to Top
Back to Top
Back to Top
