Thursday, October 11, 2012

TEST FOR GOODNESS OF FIT

A new casino game involves rolling 3 dice. The winnings are directly proportional to the total number of sixes rolled. Suppose a gambler plays the game 100 times, with the following observed counts:
Number of Sixes         Number of Rolls  
        0                       48
        1                       35
        2                       15
        3                        3
The casino becomes suspicious of the gambler and wishes to determine whether the dice are fair. What should it conclude?

If a die is fair, the probability of rolling a six on any given toss is 1/6. Assuming the 3 dice are independent (the roll of one die should not affect the rolls of the others), the number of sixes in a roll of 3 dice should follow a Binomial(3, 1/6) distribution. To determine whether the gambler's dice are fair, we compare his results with the results expected under this distribution. The probabilities of 0, 1, 2, and 3 sixes under the Binomial(3, 1/6) distribution are the following:
Null Hypothesis:
p1 = P(roll 0 sixes) = P(X=0) = (5/6)³ = 125/216 ≈ 0.579
p2 = P(roll 1 six) = P(X=1) = 3(1/6)(5/6)² = 75/216 ≈ 0.347
p3 = P(roll 2 sixes) = P(X=2) = 3(1/6)²(5/6) = 15/216 ≈ 0.069
p4 = P(roll 3 sixes) = P(X=3) = (1/6)³ = 1/216 ≈ 0.005.
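These values are straightforward to verify in code; here is a minimal sketch using SciPy's binom.pmf:

    from scipy.stats import binom

    # Probabilities of 0-3 sixes in 3 rolls of a fair die: Binomial(3, 1/6)
    print(binom.pmf([0, 1, 2, 3], n=3, p=1/6).round(3))   # [0.579 0.347 0.069 0.005]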
Since the gambler plays 100 times, the expected counts are the following:
Number of Sixes         Expected Counts         Observed Counts  
        0                      57.9                    48
        1                      34.7                    35
        2                       6.9                    15
        3                       0.5                     3
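The expected counts are simply n = 100 times the null-hypothesis probabilities; another one-liner in the same style:

    from scipy.stats import binom

    # Expected counts for 100 plays under Binomial(3, 1/6)
    print((100 * binom.pmf([0, 1, 2, 3], n=3, p=1/6)).round(1))   # [57.9 34.7  6.9  0.5]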
Plotting the expected and observed counts side by side makes the differences between them difficult to distinguish by eye. A clearer visual representation of the differences is the chi-gram, which plots the standardized values (observed - expected)/sqrt(expected) for each category.

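A minimal sketch of such a chi-gram, using NumPy and Matplotlib with the counts from the tables above (the labels and styling are illustrative):

    import numpy as np
    import matplotlib.pyplot as plt

    observed = np.array([48, 35, 15, 3])
    expected = np.array([57.9, 34.7, 6.9, 0.5])   # 100 times the rounded probabilities

    # Standardized values: the bars of the chi-gram
    residuals = (observed - expected) / np.sqrt(expected)

    plt.bar(range(4), residuals)
    plt.xticks(range(4), ['0 sixes', '1 six', '2 sixes', '3 sixes'])
    plt.ylabel('(observed - expected) / sqrt(expected)')
    plt.title('Chi-gram')
    plt.show()

The tall bars at 2 and 3 sixes make the excess of high-six rolls easy to see.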
The chi-square statistic is the sum of the squares of the plotted values:

(48 - 57.9)²/57.9 + (35 - 34.7)²/34.7 + (15 - 6.9)²/6.9 + (3 - 0.5)²/0.5
= 1.69 + 0.003 + 9.51 + 12.5 ≈ 23.70.
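The same sum can be checked in code; a small sketch with NumPy, using the rounded expected counts from the table above:

    import numpy as np

    observed = np.array([48, 35, 15, 3])
    expected = np.array([57.9, 34.7, 6.9, 0.5])

    # Sum of squared standardized values: the chi-square statistic
    print(np.sum((observed - expected)**2 / expected))   # about 23.70

Using the exact binomial probabilities instead of the rounded expected counts gives a slightly larger value (about 24.9), but the conclusion below is the same either way.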

A random variable X is said to have a chi-square distribution with m degrees of freedom if it is the sum of the squares of m independent standard normal random variables (the square of a single standard normal random variable has a chi-square distribution with one degree of freedom). This distribution is denoted χ²(m), with associated probability values available in Table G of Moore and McCabe and in MINITAB.
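This definition is easy to check by simulation; a quick sketch, where the sample size and seed are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    m = 3
    # Sum the squares of m independent standard normals, many times over
    x = (rng.standard_normal((100_000, m)) ** 2).sum(axis=1)
    print(x.mean())   # close to m, the mean of the chi-square(m) distribution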
The standardized counts (observed - expected)/sqrt(expected) for k possibilities are approximately normal, but they are not independent, because one of the counts is entirely determined by the others (both the observed and the expected counts must sum to n). This results in the loss of one degree of freedom, so it turns out that the distribution of the chi-square test statistic based on k counts is approximately the chi-square distribution with m = k - 1 degrees of freedom, denoted χ²(k-1).
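For the dice example, k = 4 categories gives k - 1 = 3 degrees of freedom; SciPy's chi2.sf gives the tail probability of the statistic computed above directly:

    from scipy.stats import chi2

    # P(chi-square with 3 df exceeds the statistic computed above)
    print(chi2.sf(23.70, df=3))   # roughly 3e-5, far below common significance levels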

Hypothesis Testing

We use the chi-square test to assess the validity of a distribution assumed for a random phenomenon. The test evaluates the null hypothesis H0 (that the data are governed by the assumed distribution) against the alternative (that the data are not drawn from the assumed distribution).

Let p1, p2, ..., pk denote the probabilities hypothesized for the k possible outcomes. In n independent trials, let Y1, Y2, ..., Yk denote the observed counts of each outcome, which are to be compared to the expected counts np1, np2, ..., npk. The chi-square test statistic is

q_(k-1) = (Y1 - np1)²/np1 + (Y2 - np2)²/np2 + ... + (Yk - npk)²/npk.
Reject H0 if this value exceeds the upper α critical value of the χ²(k-1) distribution, where α is the desired level of significance.
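Putting the pieces together for the dice example, a minimal end-to-end sketch of the decision rule at α = 0.05, reusing the rounded expected counts from the worked example (scipy.stats.chi2.ppf supplies the critical value):

    import numpy as np
    from scipy.stats import chi2

    observed = np.array([48, 35, 15, 3])
    expected = np.array([57.9, 34.7, 6.9, 0.5])
    alpha = 0.05

    q = np.sum((observed - expected)**2 / expected)        # chi-square statistic
    critical = chi2.ppf(1 - alpha, df=len(observed) - 1)   # upper-alpha critical value of chi-square(3)

    # q is about 23.70 and the critical value is about 7.81, so H0 is rejected:
    # the observed rolls are very unlikely under fair dice.
    print(q, critical, q > critical)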
by: J-Lynn B. Ramos
