Post-processing
- class gmak.post_processing.GmakOutput
Class that collects the output of gmak jobs.
- Variables
data (pandas.DataFrame) – The output data of a gmak job. This contains the estimates (second column level mu), uncertainties (second column level sigma) and differences with respect to the reference value (second column level diff), for all explored grid points, of the composite properties that make up the score function, as well as the score itself. The data frame has two index levels: the grid-shift iteration and the linear index of the grid point.
X (pandas.DataFrame) – The main-variation parameters. The index is the same as that of data.
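The layout described above can be pictured with a small hand-built example. This is only a sketch that uses plain pandas: the property names (density, dhvap), the parameter names (c6, c12), the index level names and all numbers are assumptions made for illustration, not values produced by gmak.

```python
import numpy as np
import pandas as pd

# Two row-index levels: grid-shift iteration and linear grid-point index.
# The level names "grid" and "gridpoint" are hypothetical.
row_index = pd.MultiIndex.from_product(
    [[0, 1], [0, 1, 2]], names=["grid", "gridpoint"]
)

# Two column levels: property name, then mu / sigma / diff.
# "density" and "dhvap" are made-up property names.
columns = pd.MultiIndex.from_product(
    [["density", "dhvap"], ["mu", "sigma", "diff"]]
)

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((6, 6)), index=row_index, columns=columns)

# X shares the row index of data; "c6" and "c12" are made-up parameters.
X = pd.DataFrame(rng.random((6, 2)), index=row_index, columns=["c6", "c12"])

# Pick the second column level "mu" for every property.
estimates = data.xs("mu", axis=1, level=1)
```

Selecting on the second column level with xs is one way to pull out all estimates, uncertainties or differences at once.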
- compute_pareto(properties=None, groupby_X=None, **kwargs)
Returns the parameters that are Pareto optimal. It first groups the data using the groupby_X method, unless the parameter groupby_X is passed, in which case it is used instead. The optimization problem considered is minimizing the absolute value of the difference between estimated properties and their reference values.
- Parameters
properties (list of str) – Properties considered in the optimization problem. By default, all properties are used.
groupby_X (pandas.DataFrame) – Data frame obtained from a previous call of the groupby_X method. If not given, the groupby_X method will be called with the keyword arguments kwargs.
kwargs – Additional keyword arguments passed to the aggregation method (groupby_X) called before computing the Pareto front.
- Returns
The list of force-field parameters (each one a tuple of float) that are Pareto optimal.
- Return type
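The optimization problem stated above (minimizing the absolute differences of several properties at once) can be illustrated with a generic non-dominated filter. This is a minimal sketch, not the implementation used by compute_pareto: the helper pareto_mask, the parameter tuples and the difference values are all hypothetical.

```python
import numpy as np

def pareto_mask(costs):
    """Boolean mask of the non-dominated rows of a 2-D cost array (minimization)."""
    costs = np.asarray(costs, dtype=float)
    mask = np.ones(costs.shape[0], dtype=bool)
    for i in range(costs.shape[0]):
        # A row dominates row i if it is no worse in every column
        # and strictly better in at least one.
        no_worse = np.all(costs <= costs[i], axis=1)
        better = np.any(costs < costs[i], axis=1)
        if np.any(no_worse & better):
            mask[i] = False
    return mask

# Hypothetical signed differences (estimate minus reference) for two
# properties, one row per force-field parameter set.
diffs = np.array([
    [0.1, -0.5],
    [-0.3, 0.2],
    [0.4, 0.6],
    [0.2, 0.5],
])
params = [(1.0, 2.0), (1.5, 2.0), (1.0, 2.5), (1.5, 2.5)]

# The costs are the absolute values of the differences.
mask = pareto_mask(np.abs(diffs))
pareto_params = [p for p, keep in zip(params, mask) if keep]
```

Here the last two parameter sets are dominated by the first one, so only the first two remain on the front.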
- classmethod from_gmak_bin(binpath)
Creates a GmakOutput instance from a binary state file.
- Parameters
binpath (str) – The path of the binary file.
- Returns
A GmakOutput instance containing the data in the binary file.
- Return type
- get_dataframe()
Returns a data frame containing both the main parameters and the properties and scores.
- Returns
A data frame indexed by grid-shift iteration and grid point (linear index). The columns are the main parameters, the scores and the property estimates.
- Return type
- groupby_X(agg=True, agg_arg=None, mu_agg=None, sigma_agg=None)
Collects the data that correspond to the same force-field parameters and optionally aggregates these values using the given functions.
- Parameters
agg (bool) – Indicates whether aggregation is desired. This has priority over agg_arg, mu_agg and sigma_agg (default is True).
agg_arg (function, str, list, dict or None) – The func argument of the aggregate() function used to aggregate values of a common group. This has priority over the parameters mu_agg and sigma_agg.
mu_agg (function, str or None) – The function used to aggregate expected values and their differences with respect to reference data. By default, uses mean.
sigma_agg (function, str or None) – The function used to aggregate uncertainties. By default, uses \(\sqrt{k_1^2 + \cdots + k_n^2}/n\), where the \(k_i\) are the values to be aggregated.
- Returns
A data frame with the force-field parameters as index and the score and properties as columns. If aggregation is not requested, each entry in a column is either a float or a list of floats, depending on whether it was estimated for the force-field parameter once or more than once. Otherwise, each entry results from aggregating the values for the same force-field parameter based on the parameters agg_arg, mu_agg and sigma_agg.
- Return type
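The default aggregation described above (mean for expected values, \(\sqrt{k_1^2 + \cdots + k_n^2}/n\) for uncertainties) can be reproduced on a toy table with plain pandas. This is a sketch under stated assumptions, not the code behind groupby_X: the function default_sigma_agg, the flat column names and the numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def default_sigma_agg(values):
    """sqrt(k_1^2 + ... + k_n^2) / n for the values k_i of one group."""
    values = np.asarray(values, dtype=float)
    return np.sqrt(np.sum(values ** 2)) / len(values)

# Hypothetical flat table: the first two rows share the same
# force-field parameters (c6, c12), so they form one group.
df = pd.DataFrame({
    "c6": [1.0, 1.0, 2.0],
    "c12": [0.5, 0.5, 0.5],
    "mu": [10.0, 12.0, 9.0],
    "sigma": [0.3, 0.4, 0.2],
})

# Group by the parameters, then aggregate estimates with the mean
# and uncertainties with the formula above.
grouped = df.groupby(["c6", "c12"]).agg(
    {"mu": "mean", "sigma": default_sigma_agg}
)
```

The result is indexed by the parameter values, with one row per distinct parameter set, mirroring the shape described for the returned data frame.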