We now proceed to an obvious extension of the one-sample procedures: paired samples. This chapter is going to be much shorter as the procedures are very similar to the previously discussed one-sample tests.

Paired observations are pretty common in applications, as each unit acts as its own control: before treatment vs. after, left side vs. right side, identical twin A vs. identical twin B, etc. Another way this comes about is via **matching**, a common experimental design technique: the researcher matches each unit with another unit that shares relevant (or maybe irrelevant) characteristics (age, education, height, etc.), and the two units differ on the treatment. We then use (some of) our one-sample methods on the **differences** between the paired observation values. By analogy, consider the paired t-test, which looks very much like a one-sample t-test.

### Wilcoxon signed rank test

Suppose 12 sets of identical twins were given a psychological test to measure the amount of aggressiveness in each person’s personality. We’re interested in whether the firstborn tends to be more aggressive than the other. A higher score is indicative of higher aggressiveness. We hypothesize $H_0$: the firstborn does not tend to be more aggressive (no difference is also okay), versus $H_1$: the firstborn twin tends to be more aggressive.

Here’s the data we need for the signed rank test on the differences $d_i$ (secondborn minus firstborn):

| Firstborn ($x_i$) | Secondborn ($y_i$) | Difference ($d_i$) | Rank | Signed rank |
|---|---|---|---|---|
| 86 | 88 | 2 | 4 | 4 |
| 71 | 77 | 6 | 8 | 8 |
| 77 | 76 | -1 | 2.5 | -2.5 |
| 68 | 64 | -4 | 5 | -5 |
| 91 | 96 | 5 | 6.5 | 6.5 |
| 72 | 72 | 0 | (1) | − |
| 77 | 65 | -12 | 11 | -11 |
| 91 | 90 | -1 | 2.5 | -2.5 |
| 70 | 65 | -5 | 6.5 | -6.5 |
| 71 | 80 | 9 | 10 | 10 |
| 88 | 81 | -7 | 9 | -9 |
| 87 | 72 | -15 | 12 | -12 |

We use mid-ranks for the ties, then use the Sprent and Smeeton “device” for handling the zero difference: it is ranked along with the others (getting the smallest rank, 1), but contributes nothing to the statistic. Our test statistic is the sum of the positive signed ranks:

$$S_+ = 4 + 8 + 6.5 + 10 = 28.5.$$

Now we can get the normal approximation of the test statistic:

$$Z = \frac{S_+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} = \frac{28.5 - 39}{\sqrt{162.5}} \approx -0.82, \quad n = 12.$$

And the score representation:

$$Z = \frac{\sum_i SR_i}{\sqrt{\sum_i SR_i^2}} = \frac{-20}{\sqrt{648}} \approx -0.79,$$

where the $SR_i$ are the signed ranks of the eleven nonzero differences.

For the score representation, we see a slight difference when we compare the two $Z$ values. The difference is caused by the “(1)” rank in the case where $d_i = 0$: the standard normal-approximation formula assumes all of the ranks $1, \dots, 12$ enter the statistic, while the score form uses only the eleven signed ranks that survive. In either case, there’s really no evidence that the firstborn twin is more aggressive.
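The hand computation above can be reproduced numerically. Here is a quick sketch (using numpy and scipy’s `rankdata`; the score-form normal approximation is used, and the variable names are our own):

```python
import numpy as np
from scipy.stats import rankdata, norm

# Twin aggressiveness scores from the table above.
first  = np.array([86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87])
second = np.array([88, 77, 76, 64, 96, 72, 65, 90, 65, 80, 81, 72])
d = second - first

# Sprent and Smeeton "device": rank |d| with the zero included
# (mid-ranks for ties), then drop the zero difference from the statistic.
r = rankdata(np.abs(d))          # the zero difference gets rank 1
keep = d != 0
signed = np.sign(d[keep]) * r[keep]

s_plus = signed[signed > 0].sum()              # sum of positive signed ranks
z = signed.sum() / np.sqrt((signed**2).sum())  # score-form normal approximation
p_one_sided = norm.cdf(z)  # H1: differences (secondborn - firstborn) tend negative
print(s_plus, round(z, 3), round(p_one_sided, 3))
```

The one-sided p-value is well above any conventional threshold, matching the conclusion above.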

#### Remarks

- Within a pair, observations may not be independent (obviously, as almost by construction sometimes they won’t be), but the **pairs** themselves should be.

- The typical $H_0$ here would be that the **median of the differences** is 0. If the differences are assumed to have a **symmetric distribution** (about 0), then the Wilcoxon approach is suitable. Note then that the individual unit measurements **do not** need to be assumed symmetric. You need to think about whether this is realistic for your given situation.

- The alternative therefore refers to a **shift in centrality** of the differences. You need to think about whether this is an interesting, relevant, or important question.
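As a cross-check on the worked example, scipy provides this test directly. A small sketch (note that scipy’s `zero_method="wilcox"` drops the zero difference outright before ranking, rather than applying the Sprent and Smeeton device, so its statistic and p-value differ slightly from the hand computation):

```python
from scipy.stats import wilcoxon

first  = [86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87]
second = [88, 77, 76, 64, 96, 72, 65, 90, 65, 80, 81, 72]
d = [y - x for x, y in zip(first, second)]

# alternative="less": under H1 the differences (secondborn - firstborn)
# tend to be negative, i.e. the firstborn is more aggressive.
stat, p = wilcoxon(d, alternative="less", zero_method="wilcox")
print(stat, round(p, 3))
```

Either way the conclusion is the same: no evidence against $H_0$.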

### McNemar’s test

The test is a less obvious use of, or a modification to, the sign test. Here’s Example 5.4 in the book: we have records on all attempts of two rock climbs, successful or not. For the 108 climbers who tried both:

|  | First climb success | First climb failure |
|---|---|---|
| Second climb success | 73 | 14* |
| Second climb failure | 9* | 12 |

Is there evidence that one climb is harder? The only climbers that carry information **for this question** are the ones that succeeded on one climb but failed the other. The people who succeeded (or failed) on both are essentially “ties”.

We can frame this as a Binomial setup. Think of success on the first climb paired with failure on the second as a “+” for the first climb, and failure on the first paired with success on the second as a “−” for the first climb. This puts us in a sign test situation. Under $H_0$: the climbs are equally difficult, and the probability of a “+” is the same as the probability of a “−”. We observed 9 “+”s among the $9 + 14 = 23$ informative pairs; the p-value is 0.405, so there seems to be no difference in the difficulty of the climbs.
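The sign-test p-value can be reproduced with an exact binomial test. A minimal sketch using scipy’s `binomtest`:

```python
from scipy.stats import binomtest

# Informative (discordant) pairs from the table: 9 succeeded on the
# first climb only ("+"), 14 succeeded on the second climb only ("-").
n_plus, n_minus = 9, 14
result = binomtest(n_plus, n=n_plus + n_minus, p=0.5)
print(round(result.pvalue, 3))  # 0.405
```

The 73 double-successes and 12 double-failures never enter the calculation, exactly as the “ties” discussion says.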

This version of the sign test is called **McNemar’s test**. In general, we have $n$ pairs of data $(x_i, y_i)$ where $x_i$ and $y_i$ take values 0 and 1 only. There are 4 patterns of outcomes: $(0,0), (0,1), (1,0), (1,1)$. The paired data are summarized in a $2 \times 2$ **contingency table** showing the classifications of $x_i$ and $y_i$:

#### Assumptions

- Pairs are mutually independent.

- Two categories for each outcome (0 and 1).

- The difference $P(Y_i = 1) - P(X_i = 1)$ is negative for all $i$, 0 for all $i$, or positive for all $i$ (internal consistency).

The null hypothesis can be formulated in various equivalent ways:

$$H_0: P(X_i = 0, Y_i = 1) = P(X_i = 1, Y_i = 0),$$

or

$$H_0: P(X_i = 1) = P(Y_i = 1),$$

or

$$H_0: P(X_i = 1 \mid X_i + Y_i = 1) = \frac{1}{2}.$$

The first one, in terms of our example, is saying the probability of failing the first climb but succeeding on the second is equal to the probability of succeeding on the first but failing the second. The second one is saying the probability of success is the same for the two climbs.

#### Test statistic

If the off-diagonal counts $n_{01}$ and $n_{10}$ (the numbers of $(0,1)$ and $(1,0)$ pairs) are reasonably small, take $T = n_{01}$. Under $H_0$, $T \sim \text{Binomial}(n_{01} + n_{10}, 1/2)$, which is our usual sign test. Or, for larger $n_{01}$ and $n_{10}$:

$$T_1 = \frac{(n_{01} - n_{10})^2}{n_{01} + n_{10}}.$$

Note that it does **not** depend on $n_{00}$ and $n_{11}$. Under $H_0$, $T_1$ is approximately $\chi^2_1$. To derive this, let’s call $n_d = n_{01} + n_{10}$. If $n_d$ is large enough, we can use a normal approximation to the Binomial:

$$Z = \frac{T - n_d/2}{\sqrt{n_d/4}} = \frac{n_{01} - n_{10}}{\sqrt{n_{01} + n_{10}}}, \qquad Z^2 = T_1.$$

And that’s it! Paired-sample tests really aren’t that different from one-sample tests. Now we’re ready for some real problems.
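As a coda, the large-sample version can be sketched for the rock-climb data, where $T_1 = (14 - 9)^2 / 23 \approx 1.09$ (using scipy’s `chi2`; with only 23 discordant pairs, this approximate p-value differs somewhat from the exact sign-test value of 0.405):

```python
from scipy.stats import chi2

# Discordant counts from the rock-climb table:
# 14 = (fail first, succeed second), 9 = (succeed first, fail second).
n01, n10 = 14, 9

# McNemar chi-square statistic; the concordant cells (73 and 12) drop out.
T1 = (n01 - n10) ** 2 / (n01 + n10)
p = chi2.sf(T1, df=1)
print(round(T1, 3), round(p, 3))
```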