We now proceed to an obvious extension of the one-sample procedures: paired samples. This chapter is going to be much shorter as the procedures are very similar to the previously discussed one-sample tests.
Paired observations are common in applications because each unit acts as its own control: before treatment vs. after, left side vs. right side, identical twin A vs. identical twin B, etc. Pairing also arises through matching, a common experimental design technique: the researcher matches each unit with another unit that shares relevant (or maybe irrelevant) characteristics (age, education, height, etc.), and the two units differ on the treatment. We then use (some of) our one-sample methods on the differences between the paired observation values. By analogy, consider the paired t-test, which looks very much like a one-sample t-test.
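To make the analogy concrete, here is a small sketch (with made-up numbers, not data from this chapter) showing that the paired t-test is just the one-sample t-test applied to the within-pair differences:

```python
# Sketch: paired t-test vs. one-sample t-test on the differences.
# The "before"/"after" values are hypothetical, for illustration only.
import numpy as np
from scipy import stats

before = np.array([86, 71, 77, 68, 91, 72, 77, 91, 70, 71])
after_ = np.array([88, 77, 76, 64, 96, 72, 65, 90, 65, 80])

paired = stats.ttest_rel(before, after_)                        # paired t-test
one_sample = stats.ttest_1samp(before - after_, popmean=0.0)    # t-test on d_i

print(paired.statistic, one_sample.statistic)  # identical t statistics
```

The two calls return the same t statistic and p-value, which is exactly the sense in which the paired test "is" a one-sample test.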
Suppose 12 sets of identical twins were given a psychological test to measure the amount of aggressiveness in each person’s personality, where a higher score indicates higher aggressiveness. We’re interested in whether the firstborn tends to be more aggressive than the other. We hypothesize $H_0$: the firstborn does not tend to be more aggressive (no difference is also okay), versus $H_1$: the firstborn twin tends to be more aggressive.
Here’s the data we need for the signed rank test on the differences $d_i$:
We use mid-ranks for the ties, then use the Sprent and Smeeton “device” for handling a difference of 0. Our test statistic is:
Now we can get the normal approximation of the test statistic:
And the score representation:
For the score representation, we see a slight difference when we compare to the statistic and normal approximation above. The difference is caused by the rank assigned in the case where $d_i = 0$. In either case, there’s really no evidence that the firstborn twin is more aggressive.
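As an aside, the mid-ranks used for ties can be computed with scipy’s `rankdata`, which by default assigns tied values the average of the ranks they would occupy (the absolute differences below are illustrative, not the twin data):

```python
# Mid-ranks for ties: tied values get the mean of the ranks they occupy.
from scipy.stats import rankdata

abs_diffs = [1, 3, 3, 3, 5, 7, 7]
print(rankdata(abs_diffs, method="average"))
# the three 3's occupy ranks 2, 3, 4 -> each gets 3.0; the 7's get 6.5
```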
- Within a pair, the two observations may not be independent (obviously, since by construction they often won’t be), but the pairs themselves should be independent of one another.
- The typical $H_0$ here would be that the median of the differences is 0. If the differences are assumed to have a symmetric distribution (about 0), then the Wilcoxon approach is suitable. Note that the individual unit measurements then do not need to be assumed symmetric. You need to think about whether this is realistic for your given situation.
- The alternative therefore refers to a shift in centrality of the differences. You need to think about whether this is an interesting, relevant, or important question.
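Under these assumptions, a paired signed-rank test can be sketched in scipy as follows. The scores here are hypothetical, not the twin data above, and `zero_method="pratt"` is one common way to handle zero differences (scipy also offers `"wilcox"` and `"zsplit"`), not necessarily the book’s device:

```python
# Sketch of a one-sided paired signed-rank test with hypothetical scores.
import numpy as np
from scipy import stats

firstborn = np.array([86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87])
secondborn = np.array([88, 77, 76, 64, 96, 72, 65, 90, 65, 80, 81, 72])

# zero_method="pratt" ranks zero differences but drops them from the
# statistic; alternative="greater" matches H1: firstborn scores higher.
res = stats.wilcoxon(firstborn, secondborn,
                     zero_method="pratt", alternative="greater")
print(res.statistic, res.pvalue)
```

Note that with zeros or ties present, scipy falls back to a normal approximation rather than the exact null distribution.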
Our next test is a less obvious use of, or a modification to, the sign test. Here’s Example 5.4 in the book. We have records on all attempts of two rock climbs, successful or not. For the 108 who tried both:
|                     | Second climb success | Second climb failure |
|---------------------|----------------------|----------------------|
| First climb success |                      |                      |
| First climb failure |                      |                      |
Is there evidence that one climb is harder? The only climbers who carry information for this question are the ones that succeeded in one climb but failed in the other. The people who succeeded or failed in both are essentially “ties”.
We can frame this as a Binomial setup. Think of (success, failure) as a “+” for the first climb, and (failure, success) as a “-” for the first climb. This puts us in a sign test situation. Under $H_0$: the climbs are equally difficult, the probability of a “+” is the same as the probability of a “-”, so each discordant pair is a “+” with probability $1/2$. The p-value is 0.405, so there seems to be no difference in the difficulty of the climbs.
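A sketch of this Binomial calculation with scipy’s `binomtest`; the discordant counts below are hypothetical, not the book’s Example 5.4 data:

```python
# Sign test on the discordant pairs (hypothetical counts).
from scipy.stats import binomtest

n_plus = 6    # success on first climb, failure on second ("+")
n_minus = 14  # failure on first climb, success on second ("-")

# Under H0, the "+" count is Binomial(n_plus + n_minus, 1/2).
res = binomtest(n_plus, n=n_plus + n_minus, p=0.5)
print(res.pvalue)  # two-sided exact p-value, about 0.115 for these counts
```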
This version of the sign test is called McNemar’s test. In general, we have pairs of data $(X_i, Y_i)$ where $X_i$ and $Y_i$ take values 0 and 1 only. There are 4 patterns of outcomes: $(0,0), (0,1), (1,0), (1,1)$. The paired data are summarized in a $2 \times 2$ contingency table showing the classifications of $X$ and $Y$, with cell counts $a, b, c, d$:

|         | $Y = 1$ | $Y = 0$ |
|---------|---------|---------|
| $X = 1$ | $a$     | $b$     |
| $X = 0$ | $c$     | $d$     |
- Pairs are mutually independent.
- Two categories for each outcome ($X_i, Y_i \in \{0, 1\}$).
- The difference $D_i = X_i - Y_i$ is negative for all $(0,1)$ pairs, 0 for all $(0,0)$ and $(1,1)$ pairs, and positive for all $(1,0)$ pairs.
The null hypothesis can be formulated in various equivalent ways:

$$H_0: P(X_i = 1) = P(Y_i = 1) \quad \Longleftrightarrow \quad H_0: P(X_i = 1, Y_i = 0) = P(X_i = 0, Y_i = 1).$$

The first one, in terms of our example, is saying the probability of success on the first climb is equal to the probability of success on the second climb. The second one is saying $P(+) = P(-)$.
If $b$ and $c$ are reasonably small, take $T = b$, the number of $(1,0)$ pairs. Under $H_0$, conditional on $b + c$, we have $T \sim \text{Bin}(b + c, 1/2)$, which is our usual sign test. Or for larger $b$ and $c$:

$$T = \frac{(b - c)^2}{b + c}.$$
Note that it does not depend on $a$ and $d$. Under $H_0$, $T$ is approximately $\chi^2_1$. To derive this, let’s call $n = b + c$. If $n$ is large enough, we can use a normal approximation to the Binomial $b \sim \text{Bin}(n, 1/2)$:

$$Z = \frac{b - n/2}{\sqrt{n/4}} = \frac{b - c}{\sqrt{b + c}} \approx N(0, 1), \qquad Z^2 = \frac{(b - c)^2}{b + c} = T.$$
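A small sketch checking this numerically (the counts $b$ and $c$ are made up): the chi-square form of the statistic equals $Z^2$, and we can compare the $\chi^2_1$ approximation against the exact sign test:

```python
# McNemar's test from the discordant counts b and c (hypothetical numbers).
import numpy as np
from scipy.stats import binomtest, chi2

b, c = 25, 10          # discordant cell counts
n = b + c

T = (b - c) ** 2 / n               # chi-square form of the statistic
Z = (b - n / 2) / np.sqrt(n / 4)   # normal approximation to Bin(n, 1/2)
assert np.isclose(T, Z ** 2)       # same statistic, two forms

p_approx = chi2.sf(T, df=1)            # chi^2_1 approximation
p_exact = binomtest(b, n, 0.5).pvalue  # exact sign test on b
print(T, p_approx, p_exact)
```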
And that’s it! Paired-sample tests really aren’t that different from one-sample tests. Now we’re ready for some real problems.