Stats API
Statistics required for RCMIP analysis
- pyrcmip.stats.get_skewed_normal(median, lower, upper, conf, input_data)
Get skewed normal distribution matching the inputs
- Parameters
median (float) – Median of the output distribution
lower (float) – Lower bound of the confidence interval
upper (float) – Upper bound of the confidence interval
conf (float) – Confidence associated with the interval [lower, upper], e.g. 0.66 would mean that [lower, upper] defines the 66% confidence range
input_data (np.ndarray) – Points from the derived distribution to return. For each point, Y, in input_data, we determine the value at which a cumulative probability of Y is achieved. As a result, all values in input_data must be in the range [0, 1]. Hence, if you want a random sample from the derived skewed normal, simply make input_data equal to a random sample from the uniform distribution on [0, 1].
- Returns
Points sampled from the derived skewed normal distribution based on input_data
- Return type
np.ndarray
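The mapping from cumulative probabilities to values described above can be sketched with the standard library alone. This is an illustration of the idea, not pyrcmip's implementation: the helper name `skewed_normal_quantiles` is hypothetical, and it assumes that [lower, upper] is a central (equal-tailed) interval, that the distribution is right-skewed (upper is further from the median than lower), and that a skewed normal X is one for which log(X + C) is normal.

```python
import math
from statistics import NormalDist


def skewed_normal_quantiles(median, lower, upper, conf, probs):
    """Return the value at each cumulative probability in ``probs``.

    Hypothetical helper: assumes an equal-tailed interval and a
    right-skewed distribution where log(X + shift) is normal.
    """
    if not lower < median < upper:
        raise ValueError("need lower < median < upper")
    denom = lower + upper - 2 * median
    if denom <= 0:
        raise ValueError("this sketch only handles right-skewed inputs")

    # Shift chosen so that lower and upper are symmetric about the
    # median in log space: (lower + C)(upper + C) = (median + C)^2.
    shift = (median**2 - lower * upper) / denom

    std_normal = NormalDist()
    # Number of standard deviations spanned by half of the interval.
    z = std_normal.inv_cdf(0.5 + conf / 2)
    mu = math.log(median + shift)
    sigma = (math.log(upper + shift) - mu) / z

    # inv_cdf requires 0 < p < 1, so exact 0 or 1 is rejected here.
    return [math.exp(mu + sigma * std_normal.inv_cdf(p)) - shift for p in probs]


# The 17th, 50th and 83rd percentiles recover lower, median and upper:
# approximately [2.0, 3.0, 5.0]
values = skewed_normal_quantiles(3.0, 2.0, 5.0, 0.66, [0.17, 0.5, 0.83])
```

Passing uniform random numbers as `probs` would turn this into a sampler, which mirrors the role of input_data above.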
- pyrcmip.stats.sample_multivariate_skewed_normal(configuration, size, cor=None)
Sample multi-variate skewed normal distribution
Following [Meinshausen et al. (2009)](https://doi.org/10.1038/nature08017), a skewed normal is defined as follows: “A distribution X, is skewed normal, if \(\log(X + C)\), where \(C\) is a constant, is a normal distribution with variance \(\sigma^2\) and mean \(\mu\)”. The skewed normal allows us to create distributions which match arbitrary median and likely ranges (with associated confidence as a percentage), as is often used by the IPCC. A multivariate skewed normal is constructed such that each marginal distribution is a skewed normal and the overall distribution can have a non-identity correlation matrix.
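As a worked illustration of how a median and a likely range can pin down such a distribution, assume (this is an assumption, not something stated here) that [lower, upper] is a central interval, so that lower and upper sit at cumulative probabilities \((1 - \mathrm{conf})/2\) and \((1 + \mathrm{conf})/2\). With \(z = \Phi^{-1}\big((1 + \mathrm{conf})/2\big)\), where \(\Phi\) is the standard normal CDF, the definition gives

\[
\log(\mathrm{median} + C) = \mu, \qquad
\log(\mathrm{upper} + C) = \mu + z\sigma, \qquad
\log(\mathrm{lower} + C) = \mu - z\sigma .
\]

Adding the last two equations yields \((\mathrm{lower} + C)(\mathrm{upper} + C) = (\mathrm{median} + C)^2\), and hence

\[
C = \frac{\mathrm{median}^2 - \mathrm{lower}\cdot\mathrm{upper}}{\mathrm{lower} + \mathrm{upper} - 2\,\mathrm{median}}, \qquad
\mu = \log(\mathrm{median} + C), \qquad
\sigma = \frac{\log(\mathrm{upper} + C) - \mu}{z} .
\]

Under this assumption the expressions are only defined when \(\mathrm{lower} + \mathrm{upper} \neq 2\,\mathrm{median}\), and \(\mathrm{median} + C > 0\) requires the upper end to be further from the median than the lower end.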
- Parameters
configuration (pd.DataFrame) – Configuration for the sampling. Each column must represent a dimension to be sampled. The rows must be ["median", "upper", "lower", "conf"]. The rows represent the characteristics of each marginal distribution. The median is the median of each marginal distribution. "conf" represents the confidence associated with each of the intervals defined by "lower" and "upper" (e.g. 0.66 would mean that [lower, upper] defines the 66% confidence range).
size (int) – Number of points to sample from the multivariate skewed normal
cor (array[float]) – Correlation matrix between different dimensions in configuration. cor must be a square, symmetric matrix with size n x n, where n is the number of columns in configuration. The element in row i and column j represents the correlation between the dimension in column i of configuration and the dimension in column j of configuration. The correlations must be normalised, i.e. the maximum value of each element is 1 and the minimum value is -1. As a result of the normalisation, the diagonals of cor must all be equal to 1.
- Returns
Points sampled from the derived multivariate skewed normal distribution. Each row is a sampled point (so there will be size rows in the output). The columns match the columns in configuration.
- Return type
pd.DataFrame
- Raises
ValueError – configuration contains any nans, distribution ranges are incorrectly specified (e.g. medians > upper ends of the intervals) or the correlation matrix is incorrectly normalised.
KeyError – configuration is missing one of the rows: ["median", "upper", "lower", "conf"]
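The constraints on cor, and one way correlated draws could be produced, can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not pyrcmip's code: the helper names are hypothetical, and the use of a Gaussian copula (correlated standard normals mapped to uniforms through the normal CDF) is an assumption about how such a sampler might work. Each column of the result lies in [0, 1], so it could serve as the cumulative probabilities for one marginal skewed normal.

```python
import math
import random
from statistics import NormalDist


def check_correlation_matrix(cor):
    """Raise ValueError unless cor is square, symmetric and normalised."""
    n = len(cor)
    if any(len(row) != n for row in cor):
        raise ValueError("cor must be square")
    for i in range(n):
        if not math.isclose(cor[i][i], 1.0):
            raise ValueError("diagonal elements of cor must equal 1")
        for j in range(n):
            if not -1.0 <= cor[i][j] <= 1.0:
                raise ValueError("correlations must lie in [-1, 1]")
            if not math.isclose(cor[i][j], cor[j][i]):
                raise ValueError("cor must be symmetric")


def cholesky(matrix):
    """Lower-triangular factor L with L L^T = matrix.

    Fails (ValueError or ZeroDivisionError) if the matrix is not
    positive definite.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_correlated_uniforms(cor, size, seed=None):
    """Draw ``size`` points whose columns are uniform on [0, 1] and
    correlated through a Gaussian copula with correlation ``cor``."""
    check_correlation_matrix(cor)
    rng = random.Random(seed)
    factor = cholesky(cor)
    std_normal = NormalDist()
    n = len(cor)
    samples = []
    for _ in range(size):
        independent = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Multiply by the Cholesky factor to impose the correlations.
        correlated = [
            sum(factor[i][k] * independent[k] for k in range(i + 1))
            for i in range(n)
        ]
        samples.append([std_normal.cdf(z) for z in correlated])
    return samples


# Two dimensions with correlation 0.8; each row is one sampled point.
uniforms = sample_correlated_uniforms([[1.0, 0.8], [0.8, 1.0]], size=1000, seed=0)
```

The correlation imposed here applies to the underlying normal variables; the correlation between the transformed skewed normal marginals would generally differ somewhat.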