Post-processing

class gmak.post_processing.GmakOutput

Class that collects the output of gmak jobs.

Variables

data (pandas.DataFrame) – The output data of a gmak job. For every explored grid point, it contains the estimates (second column level mu), uncertainties (second column level sigma) and differences with respect to the reference values (second column level diff) of the composite properties that make up the score function, as well as the score itself. The data frame has two index levels: the grid-shift iteration and the linear index of the grid point.
X (pandas.DataFrame) – The main-variation parameters. The index is the same as that of data.
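The layout of data can be illustrated with a small hand-built frame. This is a sketch based on our reading of the description above: it assumes the first column level holds the property name, it omits the score column, and the property names density and dhvap, as well as all numbers, are made up.

```python
import pandas as pd

# Hypothetical property names; in a real job they come from the gmak input.
properties = ["density", "dhvap"]

# Columns: first level = property (assumed), second level = mu / sigma / diff.
columns = pd.MultiIndex.from_product(
    [properties, ["mu", "sigma", "diff"]], names=["property", "stat"]
)

# Rows: grid-shift iteration and linear index of the grid point.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0)], names=["iteration", "gridpoint"]
)

# Made-up values, for illustration only.
data = pd.DataFrame(
    [
        [998.0, 1.5, -2.0, 44.1, 0.3, 0.1],
        [1003.0, 1.4, 3.0, 43.8, 0.3, -0.2],
        [1000.5, 1.6, 0.5, 44.0, 0.2, 0.0],
    ],
    index=index,
    columns=columns,
)

mu = data.xs("mu", axis=1, level="stat")  # estimates, one column per property
density = data["density"]                  # mu, sigma and diff of one property
row = data.loc[(0, 1), :]                  # one grid point of iteration 0
```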

compute_pareto(properties=None, groupby_X=None, **kwargs)

Returns the parameters that are Pareto optimal. Unless a data frame is passed through the groupby_X parameter, the data are first grouped by calling the groupby_X method. The optimization problem considered is the simultaneous minimization of the absolute differences between the estimated properties and their reference values.

Parameters

properties (list of str) – Properties considered in the optimization problem. By default, all properties are used.
groupby_X (pandas.DataFrame) – Data frame obtained from a previous call to the groupby_X method. If not given, the groupby_X method is called with the keyword arguments kwargs.
kwargs – Additional keyword arguments passed to the aggregation method (groupby_X) called before computing the Pareto front.

Returns

The list of force-field parameters (each one a tuple of float) that are Pareto optimal.

Return type

list
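The Pareto criterion used by compute_pareto can be sketched in plain Python. A parameter set is kept if no other set has an absolute difference that is smaller or equal for every considered property and strictly smaller for at least one. The function pareto_front and its input (a dict mapping parameter tuples to already-aggregated per-property differences) are hypothetical and do not reproduce gmak's implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Params = Tuple[float, ...]


def pareto_front(
    diffs: Dict[Params, Dict[str, float]],
    properties: Optional[Sequence[str]] = None,
) -> List[Params]:
    """Return the parameter tuples whose |diff| vector is non-dominated."""
    if not diffs:
        return []
    if properties is None:
        # Consider every property by default.
        properties = sorted(next(iter(diffs.values())))
    # Objective vectors: absolute deviations from the reference values.
    costs = {p: [abs(d[name]) for name in properties] for p, d in diffs.items()}

    def dominates(a: List[float], b: List[float]) -> bool:
        # a dominates b: no worse for every property, strictly better for one.
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [
        p
        for p, c in costs.items()
        if not any(dominates(other, c) for q, other in costs.items() if q != p)
    ]


# Made-up differences for three parameter sets.
diffs = {
    (0.1, 0.2): {"density": -2.0, "dhvap": 0.1},
    (0.1, 0.3): {"density": 3.0, "dhvap": -0.2},
    (0.2, 0.2): {"density": 0.5, "dhvap": 0.4},
}
front = pareto_front(diffs)
```

Here (0.1, 0.3) is dropped because (0.1, 0.2) is closer to the reference for both properties, while the other two sets trade density accuracy against dhvap accuracy. The pairwise check is quadratic in the number of parameter sets, which is adequate for grids of moderate size.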

classmethod from_gmak_bin(binpath)

Creates a GmakOutput instance from a binary state file.

Parameters: binpath (str) – The path of the binary file
Returns: A GmakOutput instance containing the data in the binary file.
Return type: GmakOutput

get_dataframe()

Returns a data frame containing the main parameters together with the properties and scores.

Returns: A data frame indexed by grid-shift iteration and grid point (linear index). The columns are the main parameters and the scores and property estimates.
Return type: pandas.DataFrame
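A frame shaped like the one returned by get_dataframe can be assembled from X and data with pandas.concat. This is only a sketch: the top-level label params, used here to give the parameter columns a second column level, is an assumption, and gmak's actual column labels may differ. The parameter names c6 and c12 are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(0, 0), (0, 1)], names=["iteration", "gridpoint"])

# Main-variation parameters (hypothetical names c6 and c12).
X = pd.DataFrame({"c6": [0.1, 0.1], "c12": [0.30, 0.32]}, index=index)

# Property estimates with a two-level column index (property, stat).
data = pd.DataFrame(
    [[998.0, 1.5, -2.0], [1003.0, 1.4, 3.0]],
    index=index,
    columns=pd.MultiIndex.from_product([["density"], ["mu", "sigma", "diff"]]),
)

# Passing a dict to concat adds "params" as an outer column level for X,
# so both frames have two column levels; rows align on (iteration, gridpoint).
combined = pd.concat([pd.concat({"params": X}, axis=1), data], axis=1)
```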

groupby_X(agg=True, agg_arg=None, mu_agg=None, sigma_agg=None)

Collects the data that correspond to the same force-field parameters and optionally aggregates these values using the given functions.

Parameters

agg (bool) – Indicates whether aggregation is desired. This has priority over agg_arg, mu_agg and sigma_agg (default is True).
agg_arg (function, str, list, dict or None) – The func argument passed to the pandas aggregate() method to aggregate the values within each group. This has priority over the parameters mu_agg and sigma_agg.
mu_agg (function, str or None) – The function used to aggregate expected values and their differences with respect to the reference data. By default, the mean is used.
sigma_agg (function, str or None) – The function used to aggregate uncertainties. By default, \(\sqrt{k_1^2 + \cdots + k_n^2}/n\) is used, where the \(k_i\) are the \(n\) values to be aggregated; this is the uncertainty of the mean of \(n\) independent estimates with uncertainties \(k_i\).

Returns

A data frame with the force-field parameters as index and the score and properties as columns. If aggregation is not requested, each entry in a column is either a float or a list of floats, depending on whether the value was estimated once or more than once for the given force-field parameters. Otherwise, each entry is the result of aggregating the values for the same force-field parameters according to agg_arg, mu_agg and sigma_agg.

Return type

pandas.DataFrame
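The default aggregation of groupby_X (mean for estimates and differences, \(\sqrt{k_1^2 + \cdots + k_n^2}/n\) for uncertainties) can be reproduced with a pandas groupby. The sketch below uses flat, hypothetical column names (density_mu, density_sigma, density_diff) and parameter names (c6, c12) instead of gmak's two-level columns, so it illustrates the arithmetic rather than gmak's code.

```python
import pandas as pd


def sigma_of_mean(values: pd.Series) -> float:
    # sqrt(k_1^2 + ... + k_n^2) / n: the uncertainty of the mean of n
    # independent estimates with uncertainties k_i.
    return float((values ** 2).sum() ** 0.5 / len(values))


# The first two rows share the parameters (c6, c12) = (0.1, 0.3), e.g. the
# same point revisited in a later grid-shift iteration. Values are made up.
df = pd.DataFrame(
    {
        "c6": [0.1, 0.1, 0.2],
        "c12": [0.3, 0.3, 0.3],
        "density_mu": [998.0, 1002.0, 1010.0],
        "density_sigma": [3.0, 4.0, 2.0],
        "density_diff": [-2.0, 2.0, 10.0],
    }
)

# Column -> aggregation function, mirroring the mu_agg / sigma_agg defaults.
aggregations = {
    "density_mu": "mean",
    "density_sigma": sigma_of_mean,
    "density_diff": "mean",
}
grouped = df.groupby(["c6", "c12"]).agg(aggregations)
```

For the duplicated parameters (0.1, 0.3), the aggregated uncertainty is \(\sqrt{3^2 + 4^2}/2 = 2.5\), smaller than either individual uncertainty, as expected when averaging independent estimates.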