Protocols
The protocol is the object responsible for carrying out the
simulations and simulation extensions (if they are desired) for a
given system, grid point and initial configuration. gmak provides
two GROMACS-compatible protocol types that rely on the standard
simulation workflow of Energy Minimization \(\to\)
Equilibration(s) \(\to\) Production Run. One is for the
simulation of a single thermodynamical state and is described in
GROMACS-compatible General Protocol. The
other is for the simulation of an alchemical transformation involving
multiple thermodynamical states and is described in
GROMACS-compatible Alchemical Protocol
For flexibility, one can define custom protocol types by means of the customization API, allowing for e.g. the use of different software packages.
Protocol Output
Every protocol must save the relevant output of the simulations (typically, the path of files created by the production run, e.g. the trajectory) so that it is available for the computation of properties. This is done in the so-called protocol-output object. One protocol-output object is created and stored in memory for each protocol and simulated grid point.
GROMACS-compatible General Protocol
The GROMACS-compatible general protocol is designed to run a sequence
of simulations for a given system, grid point and initial
configuration, the last of which is the production run. It is
identified with the type gmx in the input
file. It is based on the input
parameters:
- system
The name of the system associated with the protocol.
- coords
The name of the coordinates associated with the protocol.
- mdps
The list of file paths of the input-parameter files (extension
.mdp) for each step of the simulation workflow.- maxsteps
The maximum number of steps allowed for the length of the production run.
- minfactor
(optional, default is 1.1) When multiplied by the lastly considered value of the length of the production run, this gives the minimum value allowed for the updated length if the production run is extended. This avoids that a very large number of extensions, each with very small length increments, are carried out by the program.
The method responsible for deploying each simulation workflow receives as parameters the topology file for the system, the initial configuration file associated with the coordinates, the attributes listed above and the length of the production run (in number of steps).
If the requested length is the same as the initial length, as inferred
from the .mdp file of the production run, a standard simulation
routine is deployed, starting at each step from the last state
(.cpt file) available in disk, the last frame of the previous step
(.gro file) or the initial configuration, in this order of
priority.
Tip
In particular, this means that for any two successive
gmak runs, the second one will not rerun simulations that
finished during the first one if the corresponding simulation files
have not been deleted. Also, if the first run has terminated
abruptly, the second run will restart incomplete simulations from
the last state saved to disk instead of from scratch.
If the requested length is larger than the initial length, a simulation extension is deployed. In this case, all steps but the production run are skipped, and the latter has its length extended to the requested length.
For this protocol type, the protocol-output object that is generated
after the simulations have completed is a dict with the
absolute paths of some of the files generated by, or involved in, the
production-run step. The keys are the file extensions, and the
following are included:
xtctprtrredrgrotop
Simulation Extensions
The simulation of a GROMACS-compatible general protocol for a given grid point is extended if the expected value estimated for the grid point of any composite property associated with the protocol is larger than the property tolerance (defined in the input file). In this case, the extended length (in number of steps) is calculated as follows.
First, an estimate for the length \(\ell'\) that equals the statistical error of the property with the tolerance \(t\) is calculated with the expression
where \(\sigma\) is the lastly obtained statistical error, based on a simulation with length \(\ell\). This equation assumes that the statistical error is inversely proportional to the square of the simulation length, an assumption that is valid for large lengths and single-component composite properties whose samples are normally-distributed in the simulation. Although not technically valid in general, the expression above still provides a reasonable heuristic for most applications.
The estimate \(\ell'\) is then compared to the lower and upper
bounds for the extended length of the simulation. These bounds are
given by \(\mathrm{min}\left\{\mathrm{int}(m\ell), L\right\}\) and
\(L\), respectively, where \(m\) and \(L\) are the
minfactor and maxsteps attributes, respectively. If the
estimate \(\ell'\) does not violate any of these bounds, it is
used as the extended length; otherwise, the bound closest to
\(\ell'\) is used.
If multiple composite properties associated with the protocol do not satisfy the statistical-error tolerance for the grid point, the extended lengths are calculated individually for each property, as described above, and the greatest among them is chosen as the extended length.
GROMACS-compatible Alchemical Protocol
The GROMACS-compatible alchemical protocol wraps a sequence of
GROMACS-compatible general protocols (referred to as subprotocols),
with one for each thermodynamical state in the alchemical
transformation. It is identified with the type gmx_alchemical in
the input file. It relies on the same
simulation strategies as the GROMACS-compatible general protocol. The
supported input parameters are also mostly the same, since they are
distributed to the subprotocols.
One notable difference is that, in the alchemical case, the supplied
input-parameter files (the .mdp files specified in the mdps
attribute) are not used in the actual simulations but instead as a
template—new files are generated for each subprotocol by replacing
the state-specific value of the coupling parameter \(\lambda\) for
the original one in the template. The number of \(\lambda\) points
is inferred from the template of the production run.
A similar remark applies to the initial configuration. The configuration implied by the coordinates associated with the protocol is used only for the state \(\lambda=0\). The initial configuration of any subsequent state is the last frame of the production run of the previous state.
For this protocol type, the protocol-output object that is generated
after the simulations have completed is a dict with the
absolute paths of some the files generated by, or involved in, the
production-run steps for each alchemical state. The keys are the file
extensions, and each one maps to a list of the
corresponding files for each state. The following keys are included:
xtctprtrredrgrotopdhdl
Exceptionally, the dhdl key maps to a list of the
dhdl.xvg files for each state (not to files with extension
dhdl). Those are the files that contain the derivatives of the
Hamiltonian with respect to the coupling parameter.
Simulation Extensions
For the GROMACS-compatible alchemical protocol, all subprotocols are identically extended (or not extended) according to the criteria used for GROMACS-compatible general protocols.