SuanShu, a Java numerical and statistical library

com.numericalmethod.suanshu.stats.test.distribution

Class AndersonDarlingPValue

• java.lang.Object
• com.numericalmethod.suanshu.stats.test.distribution.AndersonDarlingPValue

• public class AndersonDarlingPValue
extends Object
This algorithm calculates the p-value when the Anderson-Darling statistic and the number of samples are given. The p-value is calculated by the interpolation formula (section 4, p.920): $t_m\left ( \alpha \right ) = b_0 + \frac{b_1}{\sqrt m} + \frac{b_2}{m}$ where the coefficients for each α are calculated by OLS regression using data in Table 1. m is the total number of samples minus 1.
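As a minimal illustration, the interpolation formula above can be evaluated directly once the three coefficients for a given α are known. The class name, method name and coefficient values below are hypothetical placeholders chosen for easy arithmetic; they are not the SuanShu implementation and not the coefficients fitted from Table 1.

```java
public class InterpolationFormula {

    /**
     * Evaluates t_m(alpha) = b0 + b1 / sqrt(m) + b2 / m for one fixed alpha.
     * b0, b1, b2 are the regression coefficients for that alpha;
     * m is the total number of samples minus 1.
     */
    public static double criticalValue(double b0, double b1, double b2, int m) {
        if (m < 1) {
            throw new IllegalArgumentException("m must be at least 1");
        }
        return b0 + b1 / Math.sqrt(m) + b2 / m;
    }

    public static void main(String[] args) {
        // Placeholder coefficients (assumption, not values from the paper):
        // 1 + 2 / sqrt(4) + 4 / 4 = 3
        System.out.println(criticalValue(1.0, 2.0, 4.0, 4)); // prints 3.0
    }
}
```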

We use a two-step procedure to interpolate the data in Table 1. In the first step, the independent variables (regressors) are $1/\sqrt m$ and $1/m$, where m = 1, ..., 10, 1000000. The dependent variable is the statistic corresponding to each of the upper percentiles 0.25, 0.1, 0.05, 0.025, 0.01. For each percentile, the fitted model predicts the statistic at the actual m (the number of samples minus 1), and this prediction is stored. This step therefore runs 5 OLS regressions and produces 5 predicted values.

In the second step, the independent variables are the 5 predictions and their squares, and the dependent variable is the p-value, taking the values {0.25, 0.1, 0.05, 0.025, 0.01}. The p-value corresponding to the actual statistic tm is then predicted from this second-order regression. The 5 predictions themselves come from the first-step model $t_m\left ( \alpha \right ) = b_0 + \frac{b_1}{\sqrt m} + \frac{b_2}{m}$.

The details of the second step are not given in the paper. The process of calculating the p-value when the statistic is not in the table is documented by only one sentence (right column, paragraph 3, p. 920): "Similarly, one could interpolate and even extrapolate p-value for the observed Anderson-Darling statistic; see Section 7 for an example." The authors suggest using linear extrapolation. We use second-order extrapolation for two reasons:
1) Regressing the p-values against the statistics in Table 1, we found that the coefficient of the second-order term is significant in most cases, and that the R-squared value is higher than for the regression that includes only the first-order term. This indicates that including the second-order term makes the extrapolation more accurate. Take m = 1 as an example: the p-value of the second-order coefficient is 0.03352 and the corresponding R-squared is 0.9994, whereas the R-squared of the regression with only the first-order term is 0.9939.
2) The R program also includes the second-order term.
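The two-step procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the SuanShu source: the class and method names are hypothetical, the OLS fit uses plain normal equations, the second step regresses the raw p-value on the statistic and its square (the actual implementation may transform the variables differently), and the table used in main is synthetic data standing in for Table 1.

```java
public class TwoStepInterpolation {

    /** OLS via the normal equations (X'X) b = X'y, solved by Gaussian elimination with partial pivoting. */
    public static double[] ols(double[][] x, double[] y) {
        int n = x.length, k = x[0].length;
        double[][] a = new double[k][k + 1]; // augmented matrix [X'X | X'y]
        for (int i = 0; i < n; i++) {
            for (int r = 0; r < k; r++) {
                for (int c = 0; c < k; c++) a[r][c] += x[i][r] * x[i][c];
                a[r][k] += x[i][r] * y[i];
            }
        }
        for (int col = 0; col < k; col++) {
            int piv = col;
            for (int r = col + 1; r < k; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            }
            double[] tmp = a[col]; a[col] = a[piv]; a[piv] = tmp;
            for (int r = col + 1; r < k; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= k; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] b = new double[k];
        for (int r = k - 1; r >= 0; r--) { // back substitution
            double s = a[r][k];
            for (int c = r + 1; c < k; c++) s -= a[r][c] * b[c];
            b[r] = s / a[r][r];
        }
        return b;
    }

    /** Step 1 (one percentile): fit t = b0 + b1/sqrt(m) + b2/m over the table rows, then predict at m. */
    public static double predictStatistic(int[] ms, double[] stats, int m) {
        double[][] x = new double[ms.length][];
        for (int i = 0; i < ms.length; i++) {
            x[i] = new double[] {1.0, 1.0 / Math.sqrt(ms[i]), 1.0 / ms[i]};
        }
        double[] b = ols(x, stats);
        return b[0] + b[1] / Math.sqrt(m) + b[2] / m;
    }

    /** Step 2: fit p = c0 + c1*t + c2*t^2 over the predicted statistics, then evaluate at tm. */
    public static double predictPValue(double[] t, double[] p, double tm) {
        double[][] x = new double[t.length][];
        for (int i = 0; i < t.length; i++) {
            x[i] = new double[] {1.0, t[i], t[i] * t[i]};
        }
        double[] c = ols(x, p);
        return c[0] + c[1] * tm + c[2] * tm * tm;
    }

    public static void main(String[] args) {
        int[] ms = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000000};
        double[] alphas = {0.25, 0.1, 0.05, 0.025, 0.01};
        // Made-up generating coefficients per alpha: a synthetic stand-in for Table 1, NOT published values.
        double[][] coef = {{0.7, -0.2, -0.1}, {1.3, 0.3, -0.3}, {1.6, 0.7, -0.4}, {2.0, 1.1, -0.4}, {2.3, 1.8, -0.4}};
        int m = 3; // number of samples minus 1
        double[] predicted = new double[alphas.length];
        for (int j = 0; j < alphas.length; j++) {
            double[] column = new double[ms.length];
            for (int i = 0; i < ms.length; i++) {
                column[i] = coef[j][0] + coef[j][1] / Math.sqrt(ms[i]) + coef[j][2] / ms[i];
            }
            predicted[j] = predictStatistic(ms, column, m); // one OLS regression per percentile
        }
        System.out.println(predictPValue(predicted, alphas, 1.5));
    }
}
```

Because a second-order polynomial is not constrained to the interval [0, 1], a caller of such a sketch may want to clamp the extrapolated value; whether the library does so is not stated in this documentation.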

Scholz, F.W., and Stephens, M.A., "K-sample Anderson-Darling Tests", Journal of the American Statistical Association, Vol. 82, No. 399, 1987.
• Constructor Summary

Constructors
Constructor and Description
AndersonDarlingPValue(int m)
Construct the Anderson-Darling distribution for a particular number of samples.
• Method Summary

All Methods
Modifier and Type Method and Description
double alpha(double tm)
Gets the p-value for a test statistic.
• Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
• Constructor Detail

• AndersonDarlingPValue

public AndersonDarlingPValue(int m)
Construct the Anderson-Darling distribution for a particular number of samples.
Parameters:
m - the number of samples minus 1
• Method Detail

• alpha

public double alpha(double tm)
Gets the p-value for a test statistic.
Parameters:
tm - the test statistic
Returns:
the p-value corresponding to the test statistic