We’ve been focusing on location inference for quite a while. There are of course other inferences in the field, and what we often want are measures that summarize the strength of relationships between variables, namely the strength of association or dependence.

Recall the “classical” **Pearson correlation coefficient** between two random variables $X$ and $Y$. It’s a measure of **linear association**. Inference for $\rho$ (the population parameter) based on $r$ (the sample value) rests on an assumption of **bivariate normality**, i.e. $X$ and $Y$ are jointly normally distributed.

Can we be more general and relax away the normality assumption? What about variables/measures that are **not** continuous (e.g. counts) and therefore can’t be normal? **Monotonicity** asks: do the two variables tend to increase together, or does $Y$ tend to decrease as $X$ increases?

In the parametric/bivariate normal/linearity context, we base inference about $\rho$ on $r$. In the nonparametric/monotonicity settings, we’d like analogous measures of monotone association, together with exact and asymptotic inference for them.

### Correlation in bivariate data

The key idea (again) is

**ranks**, and it requires a notion of ordering. **Exact tests** are based on simulation of the permutation type. A simple scheme would be two paired samples (measurements) with $n$ observations on each measurement. If we fix the order of one of the variables:

| V1 (ranks) | V2 |
| --- | --- |
| 1 | ? |
| 2 | ? |
| ⋮ | ⋮ |
| $n$ | ? |

and look at all possible orderings of the ranks for the second variable (there are $n!$ of them), we compute the measure of correlation for each of these to build the empirical distribution.
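As a minimal sketch of this permutation scheme (plain Python; the `pearson` helper is an illustrative stand-in for whichever correlation measure is used), we can enumerate all orderings of the second variable’s ranks for a small $n$ and record the correlation for each:

```python
from itertools import permutations

def pearson(x, y):
    # Plain Pearson correlation; here applied to rank vectors.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

n = 4
v1 = list(range(1, n + 1))  # ranks of the first variable, order fixed
# All n! orderings of the second variable's ranks give the exact
# permutation (null) distribution of the correlation measure.
dist = sorted(pearson(v1, list(p)) for p in permutations(v1))
```

With $n = 4$ there are $4! = 24$ equally likely orderings; a permutation p-value for an observed correlation is the proportion of `dist` at least as extreme as the observed value.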

#### Spearman rank correlation coefficient

A popular measure is the

**Spearman rank correlation coefficient**. It’s essentially **Pearson’s correlation** calculated on the **ranks** instead of the raw data. Some notation:

- $\rho_s$: population value.
- $r_s$: sample value.
- $(X_i, Y_i)$: paired observations, $i = 1, \dots, n$.
- $R_i$: ranks assigned to the values $X_i$.
- $S_i$: ranks assigned to the values $Y_i$.

If there are no ties,

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = R_i - S_i.$$

If $R_i = S_i$ for all $i$, i.e. the ranks on $X$ are equal to the ranks on $Y$, then $d_i = 0$ for all $i$ and $r_s = 1$. If the ranks are perfectly reversed:

| $x$ | 1 | 2 | ⋯ | $n$ |
| --- | --- | --- | --- | --- |
| $y$ | $n$ | $n - 1$ | ⋯ | 1 |

we have a perfect monotonically decreasing trend. Here $d_i = 2i - (n + 1)$ for all $i$, and we can show that in this case $r_s = -1$. Intermediate cases (not perfectly monotone decreasing/increasing) give an $r_s$ somewhere between $-1$ and $+1$.
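A quick numerical check of the no-ties formula $r_s = 1 - 6\sum_i d_i^2 / (n(n^2 - 1))$ on the two extreme cases, equal ranks and perfectly reversed ranks:

```python
def spearman_from_ranks(R, S):
    # No-ties formula: r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)).
    n = len(R)
    d2 = sum((r - s) ** 2 for r, s in zip(R, S))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

n = 5
up = list(range(1, n + 1))    # ranks 1..n
down = list(range(n, 0, -1))  # ranks n..1 (perfectly reversed)
r_equal = spearman_from_ranks(up, up)       # perfect agreement
r_reversed = spearman_from_ranks(up, down)  # perfect reversal
```

The first call returns exactly 1 and the second exactly −1, matching the extreme cases above.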

#### Kendall rank correlation coefficient

The other widely used measure is

**Kendall’s tau** ($\tau$), which is built on the **Mann–Whitney formulation** (hence also the **J-T test** for ordered alternatives). This is often used as a measure of agreement between judges (how well do two judges agree on their rankings?).

We first order the values of the first variable so that $x_1 < x_2 < \cdots < x_n$. If there’s a positive rank association, the ranks for the second variable, $S_i$, should also show an increasing trend; if there’s a negative rank association, the $S_i$ should show a decreasing trend.

With the order of the $x$ variable fixed, for the $y$’s we count **concordances** and **discordances**, which are pairs that **follow** the ordering and pairs that **reverse** the ordering, respectively. That is, for $i < j$, we count the pair as a **concordance** ($C$) if $y_i < y_j$ and as a **discordance** ($D$) if $y_i > y_j$. Our test statistic is

$$t = \frac{C - D}{\binom{n}{2}}.$$

As an example, we have scores on two exam questions for 12 students:

| Q1 score (/20) | Q1 rank | Q2 score (/60) | Q2 rank |
| --- | --- | --- | --- |
| 1 | 1 | 13 | 2 |
| 3 | 2 | 15 | 3 |
| 4 | 3 | 18 | 5 |
| 5 | 4 | 16 | 4 |
| 6 | 5 | 23 | 6 |
| 8 | 6 | 31 | 7 |
| 10 | 7 | 39 | 9 |
| 11 | 8 | 56 | 12 |
| 13 | 9 | 45 | 11 |
| 14 | 10 | 43 | 10 |
| 16 | 11 | 37 | 8 |
| 17 | 12 | 0 | 1 |

For the first pair of ranks, $(R_1, S_1) = (1, 2)$, we can get the number of concordant pairs by counting how many subsequent $S_j$ are bigger than 2: there are 10 concordant pairs and 1 discordant pair. Similarly, for $(R_3, S_3) = (3, 5)$ there are 7 concordant pairs and 2 discordant pairs.

In total, $C = 47$ and $D = 19$.

Note that we have $C + D = \binom{n}{2}$. If all pairs are concordant, $C = \binom{n}{2}$, $D = 0$ and $t = 1$; if all pairs are discordant, $D = \binom{n}{2}$, $C = 0$ and $t = -1$. If there is a mix of concordant and discordant pairs, $t$ will range between $-1$ and $+1$. In the example, $t = (47 - 19)/66 = 28/66 \approx 0.42$.
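The concordance/discordance count for the exam example can be reproduced in a few lines of Python. The list below is the second question’s ranks, read off the table in increasing order of the first question’s ranks:

```python
# Ranks on question 2, listed in increasing order of the question-1 ranks.
S = [2, 3, 5, 4, 6, 7, 9, 12, 11, 10, 8, 1]
n = len(S)

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
C = sum(1 for i, j in pairs if S[i] < S[j])  # concordant pairs
D = sum(1 for i, j in pairs if S[i] > S[j])  # discordant pairs
t = (C - D) / (n * (n - 1) / 2)              # t = (C - D) / C(n, 2)
```

This recovers $C = 47$, $D = 19$, and $t = 28/66 \approx 0.42$.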

#### Asymptotic results

The number of concordances is essentially the

**J-T statistic**. For large enough $n$ (opinion differs on how large is large), we can use an asymptotic normal distribution instead of the exact permutation-based distributions. There have been numerous approaches suggested in the literature; for example, under $H_0$ of no association, $r_s \sqrt{n - 1}$ is approximately $N(0, 1)$, and $t$ is approximately normal with mean 0 and variance $2(2n + 5) / (9n(n - 1))$.

**A few remarks:**

- These approximations are for $H_0$: the correlation is 0. Testing other values of potential interest is harder in general. A simulation-based approach may help.

- Confidence intervals for the population correlation are also “tricky” using asymptotic results, because the limits may fall outside the interval $[-1, 1]$.

- If the data really are from a bivariate normal distribution (**BVN**), both $r_s$ and $t$ have pretty high efficiency relative to the Pearson coefficient: 0.912 in both cases. The Kendall coefficient tends to be more powerful than Pearson’s when the data are from long-tailed, symmetric distributions.

### Ranked data for several variables

We can also consider ranked data for several variables, e.g. more than two judges, and see how well they agree with each other. We often want to test for evidence of

**concordance** between rankings of the units. Concordance, whether measured by Kendall’s $W$ statistic or the Friedman modification, will be one-sided: if $H_0$ of no association is rejected, we’ll be in the direction of positive association.

#### Kendall’s W

Kendall’s W is a normalization of the statistic of the Friedman test. Let’s consider the following example for concordance for multiple judges. Suppose four judges rank five items as follows:

| | A | B | C | D | Sum |
| --- | --- | --- | --- | --- | --- |
| I | 1 | 5 | 1 | 5 | 12 |
| II | 2 | 4 | 2 | 4 | 12 |
| III | 3 | 3 | 3 | 3 | 12 |
| IV | 4 | 2 | 4 | 2 | 12 |
| V | 5 | 1 | 5 | 1 | 12 |

Judges A and C have perfect concordance with each other, and so do judges B and D. However, the pairs ($A$, $C$) and ($B$, $D$) have perfect discordance with each other, and all items end up with the **same** total rank sum. A measure based directly on, say, the **Friedman test** (or anything that just considers the total rank sums) can’t detect a pattern like this.

The test statistic is based on a comparison of the rank sums for each unit, as in the Friedman test, but

**scaled** by the maximum value attainable (which happens when the judges are in perfect agreement with each other). This maximum value depends both on the number of items and the number of judges:

$$W = \frac{S}{S_{\max}},$$

where $S$ is the sum of squares of the deviations of the rank sums from their mean under random allocation. The scaling allows us to calibrate and interpret the statistic in a meaningful way. $W$ takes the value 1 when there is complete agreement (because $S = S_{\max}$ by definition), and takes the value 0 when there is no agreement **or** contrary opinions are held by pairs of judges (as in the example above). This is **Kendall’s W**, also known as **Kendall’s coefficient of concordance**.

If there are no ties,

$$S_{\max} = \frac{m^2 n (n^2 - 1)}{12}, \qquad W = \frac{12 S}{m^2 n (n^2 - 1)},$$

where $m$ is the number of judges and $n$ is the number of items being judged.
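A small sketch computing the no-ties statistic $W = 12S / (m^2 n (n^2 - 1))$ for the four-judge example above, confirming that the contrary-opinion pattern yields $W = 0$:

```python
# Ranks of items I..V assigned by each judge (from the table above).
ranks = {
    "A": [1, 2, 3, 4, 5],
    "B": [5, 4, 3, 2, 1],
    "C": [1, 2, 3, 4, 5],
    "D": [5, 4, 3, 2, 1],
}
m, n = len(ranks), 5
rank_sums = [sum(r[i] for r in ranks.values()) for i in range(n)]
mean_sum = m * (n + 1) / 2                # expected rank sum per item
S = sum((rs - mean_sum) ** 2 for rs in rank_sums)
W = 12 * S / (m ** 2 * n * (n ** 2 - 1))  # no-ties formula
```

Every item’s rank sum equals 12, so $S = 0$ and $W = 0$ despite the perfect pairwise (dis)agreements.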

#### Another example

There are six contestants in a diving competition. Three judges each independently ranked their performances in order of merit (1 for best, 6 for worst):

| | A | B | C | Rank total |
| --- | --- | --- | --- | --- |
| I | 2 | 2 | 4 | 8 |
| II | 4 | 3 | 3 | 10 |
| III | 1 | 1 | 2 | 4 |
| IV | 6 | 5 | 5 | 16 |
| V | 3 | 6 | 1 | 10 |
| VI | 5 | 4 | 6 | 15 |

The average rank sum in this case is $m(n + 1)/2 = 3 \times 7 / 2 = 10.5$, so

$$S = (8 - 10.5)^2 + (10 - 10.5)^2 + (4 - 10.5)^2 + (16 - 10.5)^2 + (10 - 10.5)^2 + (15 - 10.5)^2 = 99.5.$$

To interpret this value, we need to scale it by $S_{\max}$:

$$S_{\max} = \frac{m^2 n (n^2 - 1)}{12} = \frac{9 \times 6 \times 35}{12} = 157.5, \qquad W = \frac{99.5}{157.5} \approx 0.63,$$

which is a fair amount of agreement. If we want a p-value for $W$, there are some tables for it. We can also use the **Friedman test** for p-value calibration: we can show that

$$Q = m(n - 1)W,$$

where $Q$ is the Friedman test statistic, approximately $\chi^2_{n-1}$ under $H_0$. In this example, the p-value is 0.062 for $H_0$: no concordance, so we may conclude there is agreement, though we’re unsure how solid it is.
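The diving example can be checked the same way; the snippet below recomputes $S$, $W$, and the Friedman statistic $Q = m(n - 1)W$ from the table:

```python
# Judges' rankings of contestants I..VI (read column-wise from the table).
judge_ranks = [
    [2, 4, 1, 6, 3, 5],   # judge A
    [2, 3, 1, 5, 6, 4],   # judge B
    [4, 3, 2, 5, 1, 6],   # judge C
]
m, n = len(judge_ranks), 6
totals = [sum(j[i] for j in judge_ranks) for i in range(n)]  # rank totals per contestant
mean_sum = m * (n + 1) / 2                                   # 10.5
S = sum((t - mean_sum) ** 2 for t in totals)                 # 99.5
W = 12 * S / (m ** 2 * n * (n ** 2 - 1))                     # about 0.63
Q = m * (n - 1) * W                                          # Friedman statistic
```

Referring $Q \approx 9.48$ to the $\chi^2_5$ distribution gives the asymptotic calibration mentioned above.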

Another possibility for an overall measure of agreement is to look at

**all pairwise** measures of correlation/agreement. With $m$ judges, there are $\binom{m}{2}$ pairs. With Spearman’s rank correlation coefficient, for instance, if we take the mean $\bar{r}_s$ of the Spearman rank correlations across the pairs,

$$\bar{r}_s = \frac{mW - 1}{m - 1}.$$

To interpret/calibrate this, we can think of the extreme cases. If all pairwise rankings are in perfect agreement, the pairwise $r_s$ are all 1, which makes $\bar{r}_s = 1$ and thus $W = 1$. When $W = 0$ (no agreement / contrary opinions among pairs), $\bar{r}_s = -1/(m - 1)$, which can be shown to be the least possible value of $\bar{r}_s$.
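To check the identity $\bar{r}_s = (mW - 1)/(m - 1)$ numerically, the sketch below computes the three pairwise Spearman correlations for the diving judges and compares their mean with the value implied by $W$:

```python
from itertools import combinations

def spearman(R, S):
    # No-ties Spearman formula for two rank vectors.
    n = len(R)
    d2 = sum((r - s) ** 2 for r, s in zip(R, S))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

judges = [
    [2, 4, 1, 6, 3, 5],   # judge A
    [2, 3, 1, 5, 6, 4],   # judge B
    [4, 3, 2, 5, 1, 6],   # judge C
]
m, n = len(judges), 6

# Kendall's W for these rankings (no-ties formula).
totals = [sum(j[i] for j in judges) for i in range(n)]
S = sum((t - m * (n + 1) / 2) ** 2 for t in totals)
W = 12 * S / (m ** 2 * n * (n ** 2 - 1))

pairwise = [spearman(a, b) for a, b in combinations(judges, 2)]
mean_rs = sum(pairwise) / len(pairwise)
# mean_rs agrees with (m * W - 1) / (m - 1)
```

Here $\bar{r}_s \approx 0.448$, exactly $(3W - 1)/2$ for this example’s $W \approx 0.63$.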