The chi-square test (Snedecor and Cochran, 1989) is used to test if a sample of data came from a population with a specific distribution.

An attractive feature of the chi-square goodness-of-fit test is that it can be applied to any univariate distribution for which you can calculate the cumulative distribution function. The chi-square goodness-of-fit test is applied to binned data (i.e., data put into classes). This is not really a restriction, since for non-binned data you can simply calculate a histogram or frequency table before applying the chi-square test. However, the value of the chi-square test statistic depends on how the data are binned. Another disadvantage of the chi-square test is that it requires a sufficient sample size for the chi-square approximation to be valid.

The chi-square test is an alternative to the Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests. The chi-square goodness-of-fit test can be applied to discrete distributions such as the binomial and the Poisson, whereas the Kolmogorov-Smirnov and Anderson-Darling tests are restricted to continuous distributions.
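As a brief illustration of the binning step, here is a minimal sketch assuming hypothetical normally distributed sample data and using NumPy's histogram routine (the sample itself is invented for the example):

```python
import numpy as np

# Hypothetical raw (non-binned) sample; in practice, use your own data.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)

# Bin into k classes: the observed counts and the class limits
# (bin edges) are exactly the inputs the chi-square test needs.
k = 10
observed, edges = np.histogram(data, bins=k)
print(observed)   # frequency per class
print(edges)      # lower/upper limit of each class
```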
H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.
Test Statistic: For the chi-square goodness-of-fit computation, the data are divided into k bins and the test statistic is defined as

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]

where \(O_i\) is the observed frequency for bin i and \(E_i\) is the expected frequency for bin i. The expected frequency is calculated by

\[ E_i = N \left( F(Y_u) - F(Y_l) \right) \]

where F is the cumulative distribution function for the distribution being tested, \(Y_u\) is the upper limit for class i, \(Y_l\) is the lower limit for class i, and N is the sample size.

This test is sensitive to the choice of bins. There is no optimal choice for the bin width (since the optimal bin width depends on the distribution). Most reasonable choices should produce similar, but not identical, results. For the chi-square approximation to be valid, the expected frequency in each bin should be at least 5. This test is not valid for small samples, and if some of the counts are less than five, you may need to combine some bins in the tails.
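A minimal sketch of the full computation, assuming SciPy and (purely for illustration) a standard normal as the hypothesized distribution, follows directly from the definitions above:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample, hypothesized to come from a standard normal.
rng = np.random.default_rng(0)
data = rng.normal(size=200)
N = len(data)

observed, edges = np.histogram(data, bins=10)

# Expected count per class: E_i = N * (F(Y_u) - F(Y_l)), with F the
# hypothesized CDF. The outer classes are treated as open-ended so
# that the expected counts sum to N.
cdf = norm.cdf(edges)
cdf[0], cdf[-1] = 0.0, 1.0
expected = N * np.diff(cdf)

# In practice, merge tail bins whose expected count falls below 5
# before computing the statistic.
chi2_stat = np.sum((observed - expected) ** 2 / expected)
print(chi2_stat)
```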
Significance Level: α
Critical Region: The test statistic follows, approximately, a chi-square distribution with (k - c) degrees of freedom, where k is the number of non-empty cells and c = the number of estimated parameters (including location, scale, and shape parameters) for the distribution + 1. For example, for a 3-parameter Weibull distribution, c = 4. Therefore, the hypothesis that the data are from a population with the specified distribution is rejected if

\[ \chi^2 > \chi^2_{1-\alpha,\, k-c} \]

where \(\chi^2_{1-\alpha,\, k-c}\) is the chi-square critical value with k - c degrees of freedom and significance level α.
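Continuing the illustration, the critical value and the reject decision follow directly from the chi-square quantile function; the degrees of freedom below assume a fully specified distribution, with no parameters estimated from the data:

```python
from scipy.stats import chi2

alpha = 0.05   # chosen significance level
k = 10         # number of non-empty classes from the binning step
c = 1          # 0 estimated parameters + 1 (fully specified distribution)
df = k - c     # degrees of freedom

critical_value = chi2.ppf(1.0 - alpha, df)
print(f"reject H0 if the test statistic exceeds {critical_value:.3f}")
```

Here the critical value is about 16.92 for 9 degrees of freedom; the test statistic from the earlier sketch would be compared against it.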
Importance:
Many statistical tests and procedures are based
on specific distributional assumptions. The assumption of normality is particularly common in classical
statistical tests. Much reliability modeling is based on the assumption that the data follow a Weibull distribution.
There
are many non-parametric and robust techniques that are not based on strong
distributional assumptions. By non-parametric, we mean a technique, such as the
sign test, that is not based on a specific distributional assumption. By
robust, we mean a statistical technique that performs well under a wide range
of distributional assumptions. However, techniques based on specific
distributional assumptions are in general more powerful than these
non-parametric and robust techniques. By power, we mean the ability to detect a
difference when that difference actually exists. Therefore, if the
distributional assumption can be confirmed, the parametric techniques are
generally preferred.
If you are using a technique that makes a
normality (or some other type of distributional) assumption, it is important to
confirm that this assumption is in fact justified. If it is, the more powerful
parametric techniques can be used. If the distributional assumption is not
justified, a non-parametric or robust technique may be required.
By: Lera Gay Bacay