Stats API
Statistics required for RCMIP analysis
- pyrcmip.stats.get_skewed_normal(median, lower, upper, conf, input_data)
Get skewed normal distribution matching the inputs
- Parameters:
median (float) – Median of the output distribution
lower (float) – Lower bound of the confidence interval
upper (float) – Upper bound of the confidence interval
conf (float) – Confidence associated with the interval [lower, upper] e.g. 0.66 would mean that [lower, upper] defines the 66% confidence range
input_data (np.ndarray) – Points from the derived distribution to return. For each point, Y, in input_data, we determine the value at which a cumulative probability of Y is achieved. As a result, all values in input_data must be in the range [0, 1]. Hence, if you want a random sample from the derived skewed normal, simply make input_data equal to a random sample of the uniform distribution on [0, 1].
- Returns:
Points sampled from the derived skewed normal distribution based on input_data
- Return type:
np.ndarray
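The mapping from cumulative probabilities to values described above is an inverse-CDF lookup. The sketch below illustrates that idea with the standard library only. It is not pyrcmip's implementation: the function name `skewed_normal_quantiles` is hypothetical, the closed-form fit of the shift, mean and standard deviation is an assumption derived from requiring that log(X + C) is normal, and it only handles right-skewed targets (upper - median > median - lower).

```python
import math
from statistics import NormalDist


def skewed_normal_quantiles(median, lower, upper, conf, input_data):
    """Hypothetical stand-in, NOT pyrcmip's implementation.

    Fits a shifted log-normal, log(X + shift) ~ N(mu, sigma^2), whose median
    is `median` and whose central `conf` interval is [lower, upper], then
    evaluates its inverse CDF at each probability in `input_data`.
    """
    if not lower < median < upper:
        raise ValueError("need lower < median < upper")
    if not 0 < conf < 1:
        raise ValueError("conf must be strictly between 0 and 1")

    # Assumption: right-skewed target, so that median + shift > 0.
    denom = lower + upper - 2 * median
    if denom <= 0:
        raise ValueError("this sketch only handles right-skewed targets")

    # Solve (lower + shift) * (upper + shift) == (median + shift) ** 2.
    shift = (median**2 - lower * upper) / denom

    std_normal = NormalDist()
    # z-score of the upper end of the central `conf` interval.
    z_conf = std_normal.inv_cdf(0.5 + conf / 2)
    mu = math.log(median + shift)
    sigma = math.log((upper + shift) / (median + shift)) / z_conf

    # inv_cdf raises StatisticsError for p <= 0 or p >= 1, so this sketch
    # needs probabilities strictly inside (0, 1).
    return [math.exp(mu + sigma * std_normal.inv_cdf(p)) - shift for p in input_data]


# Probabilities at the interval ends and the centre of a 66% range.
values = skewed_normal_quantiles(3.0, 2.0, 5.0, 0.66, [0.17, 0.5, 0.83])
```

With these inputs, `values` reproduces the lower bound, median and upper bound (approximately 2, 3 and 5). Passing uniform random numbers from the open interval (0, 1) instead would give a random sample from the fitted distribution.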
- pyrcmip.stats.sample_multivariate_skewed_normal(configuration, size, cor=None)
Sample a multivariate skewed normal distribution
Following [Meinshausen et al. (2009)](https://doi.org/10.1038/nature08017), a skewed normal is defined as follows: “A distribution X, is skewed normal, if \(\log(X + C)\), where \(C\) is a constant, is a normal distribution with variance \(\sigma^2\) and mean \(\mu\)”. The skewed normal allows us to create distributions which match arbitrary median and likely ranges (with associated confidence as a percentage), as is often used by the IPCC. A multivariate skewed normal is constructed such that each marginal distribution is a skewed normal and the overall distribution can have a non-identity correlation matrix.
- Parameters:
configuration (pd.DataFrame) – Configuration for the sampling. Each column must represent a dimension to be sampled. The rows must be ["median", "upper", "lower", "conf"]. The rows represent the characteristics of each marginal distribution. "median" is the median of each marginal distribution. "conf" represents the confidence associated with each of the intervals defined by "lower" and "upper" (e.g. 0.66 would mean that [lower, upper] defines the 66% confidence range).
size (int) – Number of points to sample from the multivariate skewed normal
cor (array[float]) – Correlation matrix between the different dimensions in configuration. cor must be a square, symmetric matrix of size n x n, where n is the number of columns in configuration. The element in row i and column j represents the correlation between the dimension in column i of configuration and the dimension in column j of configuration. The correlations must be normalised, i.e. every element must lie between -1 and 1. As a result of the normalisation, the diagonal elements of cor must all be equal to 1.
- Returns:
Points sampled from the derived multivariate skewed normal distribution. Each row is a sampled point (so there will be size rows in the output). The columns match the columns in configuration.
- Return type:
pd.DataFrame
- Raises:
ValueError – configuration contains any nans, distribution ranges are incorrectly specified (e.g. medians > upper ends of the intervals) or the correlation matrix is incorrectly normalised.
KeyError – configuration is missing one of the rows: ["median", "upper", "lower", "conf"]
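One common way to obtain skewed normal marginals with a prescribed correlation matrix is to draw correlated standard normals (via a Cholesky factor of cor) and then transform each dimension with exp(mu + sigma * z) - C. The sketch below shows that construction using the standard library only. It is an assumption about how such sampling can be done, not a description of pyrcmip's internals. The helper names (`check_correlation`, `cholesky`, `sample_shifted_lognormals`) are hypothetical, and each marginal is passed directly as a (mu, sigma, shift) tuple rather than as a configuration DataFrame.

```python
import math
import random


def check_correlation(cor):
    """Raise ValueError unless cor is square, symmetric, within [-1, 1]
    and has a unit diagonal (the conditions listed for `cor` above)."""
    n = len(cor)
    if any(len(row) != n for row in cor):
        raise ValueError("cor must be square")
    for i in range(n):
        if abs(cor[i][i] - 1.0) > 1e-12:
            raise ValueError("diagonal elements of cor must equal 1")
        for j in range(n):
            if not -1.0 <= cor[i][j] <= 1.0:
                raise ValueError("correlations must lie in [-1, 1]")
            if abs(cor[i][j] - cor[j][i]) > 1e-12:
                raise ValueError("cor must be symmetric")


def cholesky(cor):
    """Lower-triangular factor `low` such that low * low^T == cor."""
    n = len(cor)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                diag = cor[i][i] - partial
                if diag <= 0:
                    raise ValueError("cor must be positive definite")
                low[i][j] = math.sqrt(diag)
            else:
                low[i][j] = (cor[i][j] - partial) / low[j][j]
    return low


def sample_shifted_lognormals(marginals, cor, size, seed=None):
    """Hypothetical sketch: each marginal is (mu, sigma, shift) with
    log(X + shift) ~ N(mu, sigma^2); `cor` correlates the underlying normals."""
    if len(marginals) != len(cor):
        raise ValueError("cor must have one row per marginal")
    check_correlation(cor)
    low = cholesky(cor)
    rng = random.Random(seed)
    n = len(marginals)
    points = []
    for _ in range(size):
        independent = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Mix independent draws so the normals have correlation matrix `cor`.
        correlated = [
            sum(low[i][k] * independent[k] for k in range(i + 1)) for i in range(n)
        ]
        points.append(
            [
                math.exp(mu + sigma * z) - shift
                for (mu, sigma, shift), z in zip(marginals, correlated)
            ]
        )
    return points


# Two illustrative marginals with medians exp(mu) - shift of 3.0 and 1.8.
marginals = [(math.log(2.0), 0.7, -1.0), (math.log(2.4), 0.4, 0.6)]
points = sample_shifted_lognormals(marginals, [[1.0, 0.8], [0.8, 1.0]], size=1000, seed=0)
```

Note that in this construction cor is the correlation of the underlying normal variables log(X + C). The correlation between the transformed values themselves will generally differ somewhat.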