Post-processing

class gmak.post_processing.GmakOutput

Class that collects the output of gmak jobs.

Variables
  • data (pandas.DataFrame) – The output data of a gmak job. For every explored grid point, it contains the estimates (second column level mu), the uncertainties (second column level sigma) and the differences with respect to the reference values (second column level diff) of the composite properties that make up the score function, as well as the score itself. The data frame has two index levels: the grid-shift iteration and the linear index of the grid point.

  • X (pandas.DataFrame) – The main-variation parameters. The index is the same as that of data.
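As a rough illustration of the layout described above, the following sketch builds a stand-in for data with plain pandas. The property names (dens, dhvap), the index level names and the numbers are hypothetical, and it assumes the first column level holds the property name. The score columns are left out because their exact layout is not specified here.

```python
import numpy as np
import pandas as pd

# Hypothetical property names; a real gmak job defines its own.
columns = pd.MultiIndex.from_product(
    [["dens", "dhvap"], ["mu", "sigma", "diff"]]
)
# Two index levels: grid-shift iteration and linear index of the grid point.
# The level names used here are assumptions for illustration only.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0)], names=["grid", "gridpoint"]
)
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(3, 6)), index=index, columns=columns)

# Select the estimates of every property through the second column level.
mu = data.xs("mu", axis=1, level=1)
```

Here mu keeps the two-level index and has one column per property.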

compute_pareto(properties=None, groupby_X=None, **kwargs)

Returns the parameters that are Pareto optimal. The data are first grouped by calling the groupby_X method, unless the groupby_X parameter is passed, in which case the given data frame is used instead. The optimization problem considered is the minimization of the absolute differences between the estimated properties and their reference values.

Parameters
  • properties (list of str) – Properties considered in the optimization problem. By default, all properties are used.

  • groupby_X (pandas.DataFrame) – Data frame obtained from a previous call to the groupby_X method. If not given, the groupby_X method is called with the keyword arguments kwargs.

  • kwargs – Additional keyword arguments passed to the aggregation method (groupby_X) called before computing the Pareto front.

Returns

The list of force-field parameters (each one a tuple of float) that are Pareto optimal.

Return type

list
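The notion of Pareto optimality used here can be sketched independently of gmak. The helper pareto_optimal and the sample values below are hypothetical and only illustrate the criterion (minimizing absolute differences to the reference values); they are not the library's implementation.

```python
import numpy as np

def pareto_optimal(abs_diff):
    """Return a boolean mask of the non-dominated rows (minimization)."""
    abs_diff = np.asarray(abs_diff, dtype=float)
    n = abs_diff.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Row j dominates row i if it is no worse in every property
        # and strictly better in at least one.
        no_worse = np.all(abs_diff <= abs_diff[i], axis=1)
        better = np.any(abs_diff < abs_diff[i], axis=1)
        if np.any(no_worse & better):
            mask[i] = False
    return mask

# Hypothetical force-field parameters and signed differences for two properties.
params = [(0.30, 0.50), (0.31, 0.52), (0.32, 0.54), (0.33, 0.56)]
diff = np.array([[0.1, -2.0], [-0.5, 0.5], [1.0, 1.0], [2.0, -0.1]])

mask = pareto_optimal(np.abs(diff))
front = [p for p, keep in zip(params, mask) if keep]
```

In this example the third parameter set is dropped, because the second one is closer to the reference values for both properties.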

classmethod from_gmak_bin(binpath)

Creates a GmakOutput instance from a binary state file.

Parameters

binpath (str) – The path of the binary file.

Returns

A GmakOutput instance containing the data in the binary file.

Return type

GmakOutput

get_dataframe()

Returns a data frame containing both the main parameters and the properties and scores.

Returns

A data frame indexed by grid-shift iteration and grid point (linear index). The columns are the main parameters, the scores and the property estimates.

Return type

pandas.DataFrame

groupby_X(agg=True, agg_arg=None, mu_agg=None, sigma_agg=None)

Collects the data that correspond to the same force-field parameters and optionally aggregates these values using functions.

Parameters
  • agg (bool) – Indicates whether aggregation is desired. This has priority over agg_arg, mu_agg and sigma_agg (default is True).

  • agg_arg (function, str, list, dict or None) – The func argument of the aggregate() function used to aggregate values of a common group. This has priority over the parameters mu_agg and sigma_agg.

  • mu_agg (function, str or None) – The function used to aggregate expected values and their differences with respect to reference data. By default, uses mean.

  • sigma_agg (function, str or None) – The function used to aggregate uncertainties. By default, uses \(\sqrt{k_1^2 + \cdots + k_n^2}/n\), where the \(k_i\) are the values to be aggregated.

Returns

A data frame with the force-field parameters as index and the score and properties as columns. If aggregation is not requested, each entry in a column is either a float or a list of floats, depending on whether it was estimated for the force-field parameter once or more than once. Otherwise, each entry results from aggregating the values for the same force-field parameter based on the parameters agg_arg, mu_agg and sigma_agg.

Return type

pandas.DataFrame
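The default aggregation described above (mean for the estimates, \(\sqrt{k_1^2 + \cdots + k_n^2}/n\) for the uncertainties) can be sketched with plain pandas. The function propagate_sigma, the column names and the values are hypothetical; this mimics the documented defaults and is not the code used by groupby_X.

```python
import numpy as np
import pandas as pd

def propagate_sigma(values):
    """Combine uncertainties as sqrt(k_1^2 + ... + k_n^2) / n."""
    values = np.asarray(values, dtype=float)
    return np.sqrt(np.sum(values**2)) / values.size

# Hypothetical data: the parameter value 0.30 was estimated twice.
frame = pd.DataFrame({
    "param": [0.30, 0.30, 0.31],
    "mu": [1.0, 3.0, 5.0],
    "sigma": [0.3, 0.4, 0.2],
})

# Group rows sharing the same parameter value and aggregate each column.
grouped = frame.groupby("param").agg({"mu": "mean", "sigma": propagate_sigma})
```

For the repeated parameter, the estimates 1.0 and 3.0 average to 2.0, and the uncertainties 0.3 and 0.4 combine to 0.5 / 2 = 0.25. This formula is the uncertainty of the mean of n independent estimates.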