SuanShu, a Java numerical and statistical library

com.numericalmethod.suanshu.stats.evt.evd.univariate.fitting.acer

Class ACERAnalysis

• java.lang.Object
• com.numericalmethod.suanshu.stats.evt.evd.univariate.fitting.acer.ACERAnalysis

• public class ACERAnalysis
extends Object
The Average Conditional Exceedance Rate (ACER) method estimates, from observations, the cdf of the maximum $$M = \max(X_1, X_2, \ldots, X_N)$$. $F_M(\eta) = \Pr(\max(X_1, X_2, \ldots, X_N) \le \eta)$ Under the assumption of k-step Markov-like dependence, the cdf $$F_M(\eta)$$ becomes: $\begin{eqnarray} F_M(\eta) \approx P_k(\eta) & = & \exp(- \sum_{j=k}^N \alpha_{kj}(\eta) - \alpha_{k-1,k-1}(\eta) - \cdots - \alpha_{11}(\eta)) \\ & \approx & \exp(- \sum_{j=k}^N \alpha_{kj}(\eta)) \\ & \approx & \exp(- \epsilon_k(\eta) N) \end{eqnarray}$ where $$\alpha_{kj}(\eta)$$ is the probability that the j-th element exceeds $$\eta$$, conditional on the previous (k-1) elements not exceeding it, and $$\epsilon_k(\eta)$$ is the mean of these conditional probabilities as $$N \to \infty$$. The result can be used to estimate an extreme value distribution.
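
The last approximation step can be made explicit. The following is a sketch only: it assumes the definition commonly used in the ACER literature, in which the rate is the average of the conditional exceedance probabilities over the N-k+1 positions that have (k-1) predecessors. That finite-N definition is an assumption and is not stated on this page. $\begin{eqnarray} \epsilon_k(\eta) & = & \frac{1}{N-k+1} \sum_{j=k}^N \alpha_{kj}(\eta) \\ \sum_{j=k}^N \alpha_{kj}(\eta) & = & (N-k+1) \, \epsilon_k(\eta) \approx N \epsilon_k(\eta), \quad N \gg k \end{eqnarray}$ Substituting the second line into $$\exp(- \sum_{j=k}^N \alpha_{kj}(\eta))$$ gives $$\exp(- \epsilon_k(\eta) N)$$ for large N.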

This algorithm works as follows:

1. Count the threshold exceedances in the observations that are preceded by (k-1) non-exceedances, and from these counts find the empirical values of $$P_k(\eta_i)$$, where $$\eta_i$$ are equally spaced barrier levels (or thresholds) greater than the given tail marker.
2. Using the empirical $$P_k(\eta_i)$$, fit the sub-asymptotic form of the Gumbel distribution, i.e., the ACER function: $\hat{\epsilon}_k(\eta) = q_k \exp(-a_k (\eta - b_k)^{c_k}), \quad \eta \ge \eta_1.$
3. With the fitted parameters of the ACER function, calculate the confidence interval of the fitted ACER function.
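
Step 1 can be illustrated with a small self-contained sketch. It is not SuanShu's implementation: the class EmpiricalAcerSketch and its methods are hypothetical names, and the estimator (exceedances preceded by k-1 non-exceedances, divided by the number of positions preceded by k-1 non-exceedances) is one common way to form the empirical rate, assumed here for illustration. The method acerFunction only evaluates the functional form from step 2 for given parameters; the fitting and the confidence interval of step 3 are not shown.

```java
final class EmpiricalAcerSketch {

    /**
     * Empirical conditional exceedance rate at level eta for k-step memory (k >= 1).
     * Hypothetical helper for illustration; not part of the SuanShu API.
     */
    public static double empiricalAcer(double[] x, int k, double eta) {
        int conditioning = 0; // positions whose previous (k-1) points are all <= eta
        int exceedances = 0;  // of those positions, the ones that exceed eta themselves
        for (int j = k - 1; j < x.length; j++) {
            boolean previousBelow = true;
            for (int i = j - k + 1; i < j; i++) {
                if (x[i] > eta) {
                    previousBelow = false;
                    break;
                }
            }
            if (!previousBelow) {
                continue;
            }
            conditioning++;
            if (x[j] > eta) {
                exceedances++;
            }
        }
        return conditioning == 0 ? 0.0 : (double) exceedances / conditioning;
    }

    /** Equally spaced levels starting at tailMarker (inclusive) and ending below max. */
    public static double[] levels(double tailMarker, double max, int nLevels) {
        double[] eta = new double[nLevels];
        double step = (max - tailMarker) / nLevels;
        for (int i = 0; i < nLevels; i++) {
            eta[i] = tailMarker + i * step;
        }
        return eta;
    }

    /** Evaluates q * exp(-a * (eta - b)^c); intended for eta >= b. */
    public static double acerFunction(double eta, double q, double a, double b, double c) {
        return q * Math.exp(-a * Math.pow(eta - b, c));
    }

    public static void main(String[] args) {
        // Made-up toy series, used only to exercise the methods above.
        double[] x = {0.1, 2.5, 2.7, 0.3, 0.2, 3.1, 0.4, 2.2, 2.9, 0.5};
        for (double eta : levels(2.0, 3.0, 4)) {
            System.out.printf("eta=%.2f  k=1: %.4f  k=2: %.4f%n",
                    eta, empiricalAcer(x, 1, eta), empiricalAcer(x, 2, eta));
        }
    }
}
```

With k = 1 the inner loop is empty, so the estimate reduces to the plain fraction of observations above the level, which matches the independence special case.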

Another well-known estimation method is Peaks Over Threshold (POT). The POT method assumes independence among extreme events, so it always requires declustering and discards the non-peak data, which wastes observations. The ACER method, in contrast, accounts for Markov-like dependence (i.e., k-step memory) in the time series, with k=1 as the special case of independent events. That is, a threshold exceedance is counted as an occurrence only if the previous (k-1) points are below the threshold. Experiments show that k=2 (i.e., conditioning on one previous non-exceedance) is accurate enough for a wide range of data.

The R equivalent function is acer::acer.analysis.

Naess, A. and Gaidai, O., "Estimation of extreme values from sampled time series," Structural Safety 31 (2009), pp. 325-334.
• Nested Class Summary

Nested Classes
Modifier and Type Class and Description
static class  ACERAnalysis.Result
• Constructor Summary

Constructors
Constructor and Description
ACERAnalysis()
Create an instance with the default values.
ACERAnalysis(int kStepMemory, int nLevels, double confidenceLevel, boolean usePeaksOnly, boolean weightedByPeakCount)
Create an instance with various options listed below.
• Method Summary

All Methods
Modifier and Type Method and Description
ACERAnalysis.Result run(double[][] observations, double tailMarker)
Run the analysis with multi-period observations.
ACERAnalysis.Result run(double[] observations, double tailMarker)
Run the analysis with single-period observations.
• Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
• Constructor Detail

• ACERAnalysis

public ACERAnalysis()
Create an instance with the default values. That is,
 
this(2, 300, 0.95, true, true);


• ACERAnalysis

public ACERAnalysis(int kStepMemory,
int nLevels,
double confidenceLevel,
boolean usePeaksOnly,
boolean weightedByPeakCount)
Create an instance with various options listed below.
• the value of k in the assumed k-step memory model (k = 1 means no dependency on previous observations; k = 2 is good enough for most cases)
• the number of barrier levels $$\eta_i$$ for estimating the ACER values at different levels
• the confidence level for computing the confidence interval of the estimated ACER function
• whether or not to use only peaks in the observations for estimation (a peak is a data point that is preceded and followed by smaller values)
• whether or not to put more emphasis on periods in which more events occur
Parameters:
kStepMemory - the value of k in the k-step memory model
nLevels - the number of barrier levels to be used for estimation
confidenceLevel - the confidence level for computing confidence interval
usePeaksOnly - true to use only the peaks in the observations for estimation
weightedByPeakCount - true to weight periods by their peak counts
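
The peak definition used by the usePeaksOnly option can be sketched as follows. This is a minimal illustration, not SuanShu's code: PeakSketch and peaks are hypothetical names, and the handling of end points and ties (both excluded here) is an assumption that the library may treat differently.

```java
import java.util.ArrayList;
import java.util.List;

final class PeakSketch {

    /**
     * Returns the values that are strictly greater than both neighbours.
     * The first and last points have only one neighbour and are skipped (assumption).
     */
    public static double[] peaks(double[] x) {
        List<Double> out = new ArrayList<>();
        for (int i = 1; i < x.length - 1; i++) {
            if (x[i - 1] < x[i] && x[i] > x[i + 1]) {
                out.add(x[i]);
            }
        }
        // Unbox the collected peaks into a primitive array.
        return out.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        // Made-up series for demonstration.
        double[] x = {1, 3, 2, 2, 5, 4, 6};
        System.out.println(java.util.Arrays.toString(peaks(x)));
    }
}
```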
• Method Detail

• run

public ACERAnalysis.Result run(double[] observations,
double tailMarker)
Run the analysis with single-period observations. The tail marker determines where the distribution tail, i.e., the extreme values, starts.
Parameters:
observations - the observations of a single period
tailMarker - the appropriately chosen tail level $$\eta_1$$
Returns:
the analysis result
• run

public ACERAnalysis.Result run(double[][] observations,
double tailMarker)
Run the analysis with multi-period observations. The tail marker determines where the distribution tail, i.e., the extreme values, starts.
Parameters:
observations - the observations (one row for each period; rows may have different lengths)
tailMarker - the appropriately chosen tail level $$\eta_1$$
Returns:
the analysis result