Stats API

Statistics required for RCMIP analysis

pyrcmip.stats.get_skewed_normal(median, lower, upper, conf, input_data)

Get a skewed normal distribution matching the inputs.

Parameters:

median (float) – Median of the output distribution
lower (float) – Lower bound of the confidence interval
upper (float) – Upper bound of the confidence interval
conf (float) – Confidence associated with the interval [lower, upper] (e.g. 0.66 means that [lower, upper] defines the 66% confidence range)
input_data (np.ndarray) – Cumulative probabilities at which to evaluate the derived distribution. For each value, Y, in input_data, we determine the value at which a cumulative probability of Y is achieved (i.e. the inverse of the cumulative distribution function evaluated at Y). As a result, all values in input_data must be in the range [0, 1]. Hence, to draw a random sample from the derived skewed normal, simply make input_data a random sample from the uniform distribution on [0, 1].

Returns:

Points sampled from the derived skewed normal distribution based on input_data

Return type:

np.ndarray
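A minimal sketch of how such quantiles can be computed, assuming the skewed normal definition quoted in the description of sample_multivariate_skewed_normal (\(\log(X + C)\) is normal) and assuming [lower, upper] is a central (equal-tailed) interval. The name skewed_normal_quantiles is hypothetical, and the closed-form shift C and the reflection used for left-skewed inputs are choices made for this sketch, not a description of pyrcmip's code.

```python
import numpy as np
from scipy.stats import norm


def skewed_normal_quantiles(median, lower, upper, conf, input_data):
    """Hypothetical sketch: quantiles of X where log(X + C) ~ N(mu, sigma^2)."""
    if not lower < median < upper:
        raise ValueError("require lower < median < upper")
    p = np.asarray(input_data, dtype=float)
    if np.any((p < 0) | (p > 1)):
        raise ValueError("input_data must lie in [0, 1]")

    # z-score bounding the central `conf` interval of a standard normal
    # (assumes the interval is equal-tailed)
    z = norm.ppf(0.5 + conf / 2)
    denom = lower + upper - 2 * median

    if np.isclose(denom, 0):
        # Symmetric interval: the limit C -> infinity is an ordinary normal
        sigma = (upper - median) / z
        return median + sigma * norm.ppf(p)

    if denom < 0:
        # Longer lower tail: mirror to a right-skewed problem and mirror back
        return -skewed_normal_quantiles(-median, -upper, -lower, conf, 1 - p)

    # (lower + C)(upper + C) = (median + C)^2 fixes the shift C
    shift = (median**2 - lower * upper) / denom
    mu = np.log(median + shift)
    sigma = np.log((upper + shift) / (median + shift)) / z
    return np.exp(mu + sigma * norm.ppf(p)) - shift
```

Passing a uniform sample such as np.random.default_rng(0).uniform(size=1000) as input_data turns this quantile function into a random sampler, as described for input_data above.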

pyrcmip.stats.sample_multivariate_skewed_normal(configuration, size, cor=None)

Sample from a multivariate skewed normal distribution.

Following [Meinshausen et al. (2009)](https://doi.org/10.1038/nature08017), a skewed normal is defined as follows: “A distribution X, is skewed normal, if \(\log(X + C)\), where \(C\) is a constant, is a normal distribution with variance \(\sigma^2\) and mean \(\mu\)”. The skewed normal allows us to create distributions which match an arbitrary median and likely range (with an associated confidence level), as the IPCC often reports. A multivariate skewed normal is constructed such that each marginal distribution is a skewed normal and the overall distribution can have a non-identity correlation matrix.
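Under this definition, and assuming [lower, upper] is a central (equal-tailed) interval, the three parameters \(C\), \(\mu\) and \(\sigma\) follow in closed form. This is a derivation from the quoted definition, not necessarily how pyrcmip solves for them; \(\Phi\) denotes the standard normal cumulative distribution function.

```latex
z = \Phi^{-1}\!\left(\tfrac{1 + \mathrm{conf}}{2}\right), \qquad
\begin{aligned}
\mathrm{median} + C &= e^{\mu}, \\
\mathrm{upper} + C &= e^{\mu + z\sigma}, \\
\mathrm{lower} + C &= e^{\mu - z\sigma}.
\end{aligned}

(\mathrm{lower} + C)(\mathrm{upper} + C) = (\mathrm{median} + C)^2
\;\Longrightarrow\;
C = \frac{\mathrm{median}^2 - \mathrm{lower}\cdot\mathrm{upper}}{\mathrm{lower} + \mathrm{upper} - 2\,\mathrm{median}},

\mu = \log(\mathrm{median} + C), \qquad
\sigma = \frac{1}{z}\log\frac{\mathrm{upper} + C}{\mathrm{median} + C}.
```

These formulas need lower + upper > 2 median (a longer upper tail) so that every logarithm is defined. A symmetric interval is the normal limit \(C \to \infty\), and a longer lower tail can be handled by applying the same formulas to \(-X\).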

Parameters:

configuration (pd.DataFrame) – Configuration for the sampling. Each column must represent a dimension to be sampled. The rows must be ["median", "upper", "lower", "conf"] and describe the marginal distribution of each dimension: "median" is the median of the marginal distribution, and "conf" is the confidence associated with the interval defined by "lower" and "upper" (e.g. 0.66 means that [lower, upper] defines the 66% confidence range).
size (int) – Number of points to sample from the multivariate skewed normal
cor (array[float]) – Correlation matrix between the different dimensions in configuration. cor must be a square, symmetric n x n matrix, where n is the number of columns in configuration. The element in row i and column j is the correlation between the dimension in column i of configuration and the dimension in column j of configuration. The correlations must be normalised, i.e. every element must lie in [-1, 1]. As a result of the normalisation, every diagonal element of cor must equal 1.
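As an illustration of the expected input layout, the snippet below builds a two-dimensional configuration and a matching cor. The column names x and y and all numbers are made up, and check_correlation_matrix is a hypothetical helper that only tests the constraints listed for cor; it is not part of pyrcmip.

```python
import numpy as np
import pandas as pd

# One column per dimension, rows in the documented order (illustrative numbers)
configuration = pd.DataFrame(
    {
        "x": [3.0, 4.5, 2.5, 0.66],  # right-skewed: upper tail is longer
        "y": [0.0, 1.0, -1.0, 0.9],  # symmetric interval
    },
    index=["median", "upper", "lower", "conf"],
)

# Correlation between x and y (assumed value for illustration)
cor = np.array([[1.0, 0.7], [0.7, 1.0]])


def check_correlation_matrix(cor, n_dims):
    """Hypothetical helper: check the constraints stated for ``cor``."""
    cor = np.asarray(cor, dtype=float)
    if cor.shape != (n_dims, n_dims):
        raise ValueError(f"cor must be {n_dims} x {n_dims}, got {cor.shape}")
    if not np.allclose(cor, cor.T):
        raise ValueError("cor must be symmetric")
    if not np.allclose(np.diag(cor), 1.0):
        raise ValueError("every diagonal element of cor must equal 1")
    if np.any(np.abs(cor) > 1):
        raise ValueError("every element of cor must lie in [-1, 1]")


# n is the number of columns (dimensions) in configuration
check_correlation_matrix(cor, configuration.shape[1])
```

With inputs like these, the documented call would be pyrcmip.stats.sample_multivariate_skewed_normal(configuration, 1000, cor=cor).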

Returns:

Points sampled from the derived multivariate skewed normal distribution. Each row is a sampled point, so the output has size rows. The columns match the columns in configuration.

Return type:

pd.DataFrame

Raises:

ValueError – configuration contains any NaNs, the distribution ranges are incorrectly specified (e.g. a median greater than the upper end of its interval) or the correlation matrix is incorrectly normalised.
KeyError – configuration is missing one of the rows: ["median", "upper", "lower", "conf"]
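One way to build a distribution whose marginals are skewed normals with a given correlation structure is a Gaussian copula: draw correlated standard normals with correlation matrix cor, map them to [0, 1] with the standard normal cumulative distribution function, and push each column through its marginal quantile function, in the same way that input_data feeds get_skewed_normal. The sketch below assumes this construction, central confidence intervals, and independent dimensions when cor is None; pyrcmip may do it differently, and sample_multivariate_skewed_normal_sketch and _marginal_quantiles are hypothetical names. Because each marginal transform is monotone, rank correlations are preserved, but cor is the correlation of the underlying standard normals and generally differs from the Pearson correlation of the returned samples.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

ROWS = ["median", "upper", "lower", "conf"]


def _marginal_quantiles(median, lower, upper, conf, p):
    """Quantiles of X where log(X + C) ~ N(mu, sigma^2); left skew via reflection."""
    z = norm.ppf(0.5 + conf / 2)  # assumes a central (equal-tailed) interval
    denom = lower + upper - 2 * median
    if np.isclose(denom, 0):  # symmetric interval: ordinary normal
        return median + (upper - median) / z * norm.ppf(p)
    if denom < 0:  # longer lower tail: solve the mirrored problem, mirror back
        return -_marginal_quantiles(-median, -upper, -lower, conf, 1 - p)
    shift = (median**2 - lower * upper) / denom  # the constant C
    sigma = np.log((upper + shift) / (median + shift)) / z
    return (median + shift) * np.exp(sigma * norm.ppf(p)) - shift


def sample_multivariate_skewed_normal_sketch(configuration, size, cor=None, seed=None):
    """Hypothetical Gaussian-copula sampler; not pyrcmip's implementation."""
    missing = [row for row in ROWS if row not in configuration.index]
    if missing:
        raise KeyError(f"configuration is missing rows: {missing}")
    cfg = configuration.loc[ROWS].astype(float)
    if cfg.isnull().values.any():
        raise ValueError("configuration contains nans")
    if ((cfg.loc["lower"] >= cfg.loc["median"]) | (cfg.loc["median"] >= cfg.loc["upper"])).any():
        raise ValueError("each column needs lower < median < upper")

    n = cfg.shape[1]
    # Assumption: cor=None means the dimensions are independent
    cor = np.eye(n) if cor is None else np.asarray(cor, dtype=float)
    if (
        cor.shape != (n, n)
        or not np.allclose(cor, cor.T)
        or not np.allclose(np.diag(cor), 1.0)
        or np.any(np.abs(cor) > 1)
    ):
        raise ValueError("cor is not a valid normalised correlation matrix")

    rng = np.random.default_rng(seed)
    normals = rng.multivariate_normal(np.zeros(n), cor, size=size)
    uniforms = norm.cdf(normals)  # correlated points in [0, 1]
    # Each uniform column goes through its own skewed normal quantile function
    samples = {
        col: _marginal_quantiles(
            cfg.at["median", col], cfg.at["lower", col],
            cfg.at["upper", col], cfg.at["conf", col], uniforms[:, i],
        )
        for i, col in enumerate(cfg.columns)
    }
    return pd.DataFrame(samples, columns=cfg.columns)
```

The copula keeps the marginal fitting (one column at a time) separate from the dependence structure (one draw from a multivariate normal), which is why the same quantile function can serve both the univariate and the multivariate case.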