Protocols

The protocol is the object responsible for carrying out the simulations and simulation extensions (if they are desired) for a given system, grid point and initial configuration. gmak provides two GROMACS-compatible protocol types that rely on the standard simulation workflow of Energy Minimization \(\to\) Equilibration(s) \(\to\) Production Run. One is for the simulation of a single thermodynamical state and is described in GROMACS-compatible General Protocol. The other is for the simulation of an alchemical transformation involving multiple thermodynamical states and is described in GROMACS-compatible Alchemical Protocol

For flexibility, one can define custom protocol types by means of the customization API, allowing for e.g. the use of different software packages.

Protocol Output

Every protocol must save the relevant output of the simulations (typically, the path of files created by the production run, e.g. the trajectory) so that it is available for the computation of properties. This is done in the so-called protocol-output object. One protocol-output object is created and stored in memory for each protocol and simulated grid point.

GROMACS-compatible General Protocol

The GROMACS-compatible general protocol is designed to run a sequence of simulations for a given system, grid point and initial configuration, the last of which is the production run. It is identified with the type gmx in the input file. It is based on the input parameters:

system

The name of the system associated with the protocol.

coords

The name of the coordinates associated with the protocol.

mdps

The list of file paths of the input-parameter files (extension .mdp) for each step of the simulation workflow.

maxsteps

The maximum number of steps allowed for the length of the production run.

minfactor

(optional, default is 1.1) When multiplied by the lastly considered value of the length of the production run, this gives the minimum value allowed for the updated length if the production run is extended. This avoids that a very large number of extensions, each with very small length increments, are carried out by the program.

The method responsible for deploying each simulation workflow receives as parameters the topology file for the system, the initial configuration file associated with the coordinates, the attributes listed above and the length of the production run (in number of steps).

If the requested length is the same as the initial length, as inferred from the .mdp file of the production run, a standard simulation routine is deployed, starting at each step from the last state (.cpt file) available in disk, the last frame of the previous step (.gro file) or the initial configuration, in this order of priority.

Tip

In particular, this means that for any two successive gmak runs, the second one will not rerun simulations that finished during the first one if the corresponding simulation files have not been deleted. Also, if the first run has terminated abruptly, the second run will restart incomplete simulations from the last state saved to disk instead of from scratch.

If the requested length is larger than the initial length, a simulation extension is deployed. In this case, all steps but the production run are skipped, and the latter has its length extended to the requested length.

For this protocol type, the protocol-output object that is generated after the simulations have completed is a dict with the absolute paths of some of the files generated by, or involved in, the production-run step. The keys are the file extensions, and the following are included:

  • xtc

  • tpr

  • trr

  • edr

  • gro

  • top

Simulation Extensions

The simulation of a GROMACS-compatible general protocol for a given grid point is extended if the expected value estimated for the grid point of any composite property associated with the protocol is larger than the property tolerance (defined in the input file). In this case, the extended length (in number of steps) is calculated as follows.

First, an estimate for the length \(\ell'\) that equals the statistical error of the property with the tolerance \(t\) is calculated with the expression

\[\ell' = \mathrm{int}\left(\frac{\ell\sigma^2}{t^2}\right)\ ,\]

where \(\sigma\) is the lastly obtained statistical error, based on a simulation with length \(\ell\). This equation assumes that the statistical error is inversely proportional to the square of the simulation length, an assumption that is valid for large lengths and single-component composite properties whose samples are normally-distributed in the simulation. Although not technically valid in general, the expression above still provides a reasonable heuristic for most applications.

The estimate \(\ell'\) is then compared to the lower and upper bounds for the extended length of the simulation. These bounds are given by \(\mathrm{min}\left\{\mathrm{int}(m\ell), L\right\}\) and \(L\), respectively, where \(m\) and \(L\) are the minfactor and maxsteps attributes, respectively. If the estimate \(\ell'\) does not violate any of these bounds, it is used as the extended length; otherwise, the bound closest to \(\ell'\) is used.

If multiple composite properties associated with the protocol do not satisfy the statistical-error tolerance for the grid point, the extended lengths are calculated individually for each property, as described above, and the greatest among them is chosen as the extended length.

GROMACS-compatible Alchemical Protocol

The GROMACS-compatible alchemical protocol wraps a sequence of GROMACS-compatible general protocols (referred to as subprotocols), with one for each thermodynamical state in the alchemical transformation. It is identified with the type gmx_alchemical in the input file. It relies on the same simulation strategies as the GROMACS-compatible general protocol. The supported input parameters are also mostly the same, since they are distributed to the subprotocols.

One notable difference is that, in the alchemical case, the supplied input-parameter files (the .mdp files specified in the mdps attribute) are not used in the actual simulations but instead as a template—new files are generated for each subprotocol by replacing the state-specific value of the coupling parameter \(\lambda\) for the original one in the template. The number of \(\lambda\) points is inferred from the template of the production run.

A similar remark applies to the initial configuration. The configuration implied by the coordinates associated with the protocol is used only for the state \(\lambda=0\). The initial configuration of any subsequent state is the last frame of the production run of the previous state.

For this protocol type, the protocol-output object that is generated after the simulations have completed is a dict with the absolute paths of some the files generated by, or involved in, the production-run steps for each alchemical state. The keys are the file extensions, and each one maps to a list of the corresponding files for each state. The following keys are included:

  • xtc

  • tpr

  • trr

  • edr

  • gro

  • top

  • dhdl

Exceptionally, the dhdl key maps to a list of the dhdl.xvg files for each state (not to files with extension dhdl). Those are the files that contain the derivatives of the Hamiltonian with respect to the coupling parameter.

Simulation Extensions

For the GROMACS-compatible alchemical protocol, all subprotocols are identically extended (or not extended) according to the criteria used for GROMACS-compatible general protocols.