Here we consider only dichotomous data. Paired dichotomous data can be obtained in many ways. Some examples:
How does this situation differ from the one for which the paired t-test is appropriate? Just as in the discussion leading to the paired t-test, you make two measurements on the same experimental unit, but in this case the only possible outcomes for each measurement are 0 and 1 (representing "no" and "yes", or equivalently representing "failure" and "success"). Except for this, the two situations are the same.
Data: (X1,Y1),...,(Xn,Yn) a simple random sample of paired observations, where
the X's are dichotomous 0 or 1, and the Y's are dichotomous 0 or 1. Let p1 be
the proportion of 1's (successes) among the X's, and let p2 be the proportion of 1's
(successes) among the Y's.
Generally within a pair, Xi and Yi are dependent; we want to test whether
the distributions for X and for Y are the same. This is the same as testing the
null hypothesis that p1 and p2 are equal.
e.g. Cosmetic skin testing for hypoallergenicity. To show a new product is "hypoallergenic" you must prove it provokes fewer reactions than the current market leader. You can test subjects to see if they react to a cosmetic mix put on their skin. Suppose you use each subject in your experiment twice: once with the "Market Leader" and once with the "New Product". You recruit 40 subjects. Paint the backs of these 40 subjects with 2 patches: one patch using the Market Leader, the other using the New Product. Look for reactions: 0=none and 1=any reaction. Suppose that 45% (18/40) react to the Market Leader, 20% (8/40) react to the New Product, 7.5% (3/40) to both, 42.5% (17/40) to neither. Summarize the data:
Market Leader   New Product   # cases
   0 (no)         0 (no)         17
   0 (no)         1 (yes)         5
   1 (yes)        0 (no)         15
   1 (yes)        1 (yes)         3
                               ----
                                 40
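The four cell counts follow from the percentages quoted in the example by simple subtraction. A minimal Python sketch of that bookkeeping (the variable names are illustrative and not part of these notes):

```python
# Counts quoted in the skin-testing example (n = 40 subjects).
n = 40
react_leader = 18   # reacted to the Market Leader
react_new = 8       # reacted to the New Product
react_both = 3      # reacted to both patches

# Subjects who reacted to exactly one product.
leader_only = react_leader - react_both              # (1, 0) pairs
new_only = react_new - react_both                    # (0, 1) pairs
neither = n - react_both - leader_only - new_only    # (0, 0) pairs

print(leader_only, new_only, neither)
```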
Note that there are no other possibilities when the measurement is
dichotomous. The question: Is there a difference between the Market Leader and
the New Product in the probability of provoking a reaction (i.e. is p1 equal to p2)?
Can you use a paired t-test here? It's not justified, since the differences are not normally distributed: the only possible differences are -1, 0, or +1. The test we do get, however, uses only these differences, so the idea of a paired t-test is not off the wall.
What about ties, which show up as 0 differences? A no-no response and a yes-yes response tell you nothing about whether the two treatments are the same. So discard the ties (i.e. drop all 0 differences), leaving n'=20 untied observations: 15 where the New Product was better, and 5 where the Market Leader was better. Count each of these as a success if the new is "better" than the old, as a failure if not. Then under the null hypothesis of no difference, you are sampling from a dichotomous population with p=0.5 which leads us to a Binomial Distribution.
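To see the tie-dropping step concretely, the sketch below rebuilds the 40 (X, Y) pairs from the summary counts, forms the differences X - Y, and discards the zeros. The names `table`, `pairs`, `diffs` and `untied` are illustrative assumptions, not notation from these notes.

```python
from collections import Counter

# (X, Y) = (Market Leader, New Product) reaction, with counts from the example.
table = {(0, 0): 17, (0, 1): 5, (1, 0): 15, (1, 1): 3}
pairs = [pair for pair, count in table.items() for _ in range(count)]

# Difference X - Y for each subject: only -1, 0, or +1 can occur.
diffs = [x - y for x, y in pairs]
print(Counter(diffs))

# Drop the ties (0 differences). A difference of +1 means the subject
# reacted to the Market Leader only, i.e. the New Product did better.
untied = [d for d in diffs if d != 0]
new_better = sum(1 for d in untied if d == 1)
leader_better = sum(1 for d in untied if d == -1)
print(len(untied), new_better, leader_better)
```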
Let B = the number of pairs with Yi < Xi among the 20 untied cases; here B=15. Note that Yi < Xi means that for
person i, the New Product was "better" than the Market Leader.
With no ties, under the null hypothesis of no difference, it is just as likely for the New Product
to be better than the Market Leader, as it is for the Market Leader to be better than the
New Product. So if the null hypothesis is true, B ~ Bin(20,.5). Since 20(.5)=10
the sample size is large enough to justify using a normal approximation to test
the null hypothesis. Formally, we write
Ho: Probability of a reaction is the same for the Market Leader and the New Product.
Ha: Not the same
Test Statistic: z=(15-10)/sqrt(20(.5)(.5))=5/2.236=2.236
Critical Region: Reject Ho in favor of Ha at 5% if |z| >= 1.96
P-value=P(z<=-2.236)+P(z>=2.236)=.0125+.0125=.025
Conclusion: Reject Ho in favor of Ha at 5%.
There is evidence of a difference.
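The arithmetic in this test can be checked with a few lines of Python. This is only a sketch using the standard library: the normal tail is computed with `math.erfc` rather than looked up in a table, so the P-value differs slightly from the rounded table value. The exact binomial P-value is an addition for comparison; the notes themselves use only the normal approximation.

```python
import math

n_untied = 20   # untied pairs
b = 15          # pairs where the New Product was better

# Normal approximation to Bin(20, 0.5): mean n/2, sd sqrt(n * 0.5 * 0.5).
mean = n_untied * 0.5
sd = math.sqrt(n_untied * 0.5 * 0.5)
z = (b - mean) / sd

# Two-sided P-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
p_normal = math.erfc(abs(z) / math.sqrt(2))

# Exact two-sided binomial P-value. With p = 0.5 the distribution is
# symmetric, so double the tail at least as far from n/2 as b is.
k_hi = max(b, n_untied - b)
upper_tail = sum(math.comb(n_untied, k) for k in range(k_hi, n_untied + 1)) / 2 ** n_untied
p_exact = min(1.0, 2 * upper_tail)

print(f"z = {z:.3f}, normal P = {p_normal:.4f}, exact P = {p_exact:.4f}")
```

Both P-values fall below 0.05, so the conclusion does not depend on the approximation in this example.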
New Product Reaction
NO YES
---------------
Market NO | 17 | 5 |
Leader |-----|-------|
Reaction YES | 15 | 3 |
---------------
Ho: Probability of a reaction is the same for the Market Leader and the New
Product.
Test Statistic: M=(5-15)^2/(5+15)=5
P-value: Using Chi-square tables with df=1, P(M>=5) lies between .025 and .05 (it is actually almost precisely .025).
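The chi-square version of the test and the z test based on the binomial count are the same test: with off-diagonal counts b=5 and c=15, the statistic M=(b-c)^2/(b+c) equals the square of z=2.236. The sketch below checks this numerically. It relies on the fact that a chi-square variable with 1 df is a squared standard normal, so its upper tail is `erfc(sqrt(M/2))`; the variable names are illustrative.

```python
import math

b, c = 5, 15   # off-diagonal (untied) counts from the skin-testing table

# The statistic uses only the two off-diagonal cells.
M = (b - c) ** 2 / (b + c)

# Upper tail of chi-square with 1 df: P(chi2 >= M) = erfc(sqrt(M / 2)).
p_value = math.erfc(math.sqrt(M / 2))

# The z statistic from the binomial argument; its square matches M.
z = (c - (b + c) / 2) / math.sqrt((b + c) * 0.25)

print(M, z ** 2, round(p_value, 4))
```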
Some comments:
Treatment 1
NO YES
---------------
Treat- NO | a | b |
ment |-----|-------|
2 YES | c | d |
---------------
or written as
Treatment 1
YES NO
---------------
Treat- YES | a | b |
ment |-----|-------|
2 NO | c | d |
---------------
and we let
Ho: p1 = p2
Ha: p1 not equal to p2
Test Statistic: M=(b-c)^2/(b+c)
Critical Region: Reject Ho in favor of Ha at 5% if M >= 3.84
P-value: Using the right hand tail of the Chi-square tables with df=1, find where M lies.
In particular, note that we must write no & yes symmetrically in the rows and columns of the table (both no-yes, no-yes, or the other way around, both yes-no, yes-no). The test statistic M uses only b and c from the table, i.e. tied observations do not affect it, and it has the same value regardless of which way (indicated above) you write your data.
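Both claims in this comment (ties do not matter, and the orientation of the table does not matter) are easy to confirm numerically. The function `mcnemar` below is a hypothetical helper written for this illustration, not a library routine; it takes the table as `[[a, b], [c, d]]` and computes the P-value from the chi-square (df=1) upper tail via `math.erfc`.

```python
import math

def mcnemar(table):
    """Return (M, P-value) for a 2x2 table [[a, b], [c, d]].

    Rows and columns must list the outcomes in the same order.
    Illustrative helper, not part of any library.
    """
    (a, b), (c, d) = table
    if b + c == 0:
        raise ValueError("no untied pairs: the statistic is undefined")
    m = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(m / 2))   # chi-square upper tail, df = 1
    return m, p

no_yes = [[17, 5], [15, 3]]   # skin-testing data, rows and columns ordered NO, YES
yes_no = [[3, 15], [5, 17]]   # same data, rows and columns ordered YES, NO

print(mcnemar(no_yes))
print(mcnemar(yes_no))

# Changing only the tied counts (a and d) leaves M unchanged.
print(mcnemar([[170, 5], [15, 30]])[0])
```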
Give 500 subjects type G, observe 52 reactions or 10.4%
Give the SAME 500 subjects type BT, observe 68 reactions or 13.6%
Let p1 be the probability of a reaction using type G.
Let p2 be the probability of a reaction using type BT.
This is paired dichotomous data, and you cannot analyze it unless you know, in addition to the above, how many subjects react to both G and BT. But suppose you decide to IGNORE the pairing and analyze the data as two independent samples. This is WRONG, and the calculation below will show you how much!
Skin Reactions to Penicillin:
Type G
yes no
---------------
Type yes | 50 | 18 |
|------|------|
BT no | 2 | 430 |
---------------
Ho: p1 = p2
Ignoring pairing can be very costly.
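The comparison can be sketched in Python from the penicillin table. Two assumptions are made here: the "wrong" analysis is taken to be the usual pooled two-sample z test for proportions on two independent samples of 500, and the paired analysis is the statistic M=(b-c)^2/(b+c) with a chi-square (df=1) P-value. The variable names are illustrative.

```python
import math

n = 500
both, bt_only, g_only, neither = 50, 18, 2, 430   # cells of the table

g_yes = both + g_only      # reactions to type G
bt_yes = both + bt_only    # reactions to type BT

# WRONG: treat the data as two independent samples of 500
# (assumed here to mean the pooled two-sample z test).
p1, p2 = g_yes / n, bt_yes / n
pooled = (g_yes + bt_yes) / (2 * n)
se = math.sqrt(pooled * (1 - pooled) * (2 / n))
z_unpaired = (p2 - p1) / se
p_unpaired = math.erfc(abs(z_unpaired) / math.sqrt(2))

# RIGHT: use the pairing; only the 18 + 2 untied subjects matter.
M = (bt_only - g_only) ** 2 / (bt_only + g_only)
p_paired = math.erfc(math.sqrt(M / 2))

print(f"unpaired: z = {z_unpaired:.2f}, P = {p_unpaired:.3f}")
print(f"paired:   M = {M:.1f}, P = {p_paired:.5f}")
```

Under these assumptions the unpaired analysis gives z of about 1.56, which does not reach the 1.96 cutoff, while the paired analysis gives M=12.8, far beyond the 3.84 cutoff. The same data lead to opposite conclusions depending on whether the pairing is used.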