In the previous two chapters, we focused on studying a single random variable. However, we can define more than one random variable on the same sample space.

Let's consider the following motivating example. Suppose we roll two six-sided fair dice. The sample space contains 36 possible sample points. Define $X$ to be the value shown on the first die and $Y$ to be the sum of the two dice.

Sometimes, we may want to assign a probability to an event that involves two or more random variables. For example, what is the probability that the first die shows 1 and the sum of the two dice is 4, i.e. $P(X = 1, Y = 4)$?

This is called the **joint probability** of $X$ and $Y$.

### Two discrete random variables

Let $X$ and $Y$ be two discrete random variables defined on the same sample space. The **joint probability mass function** for $X$ and $Y$ is given by

$$p(x, y) = P(X = x, Y = y).$$

The joint probability mass function has the following properties:

- $p(x, y) \geq 0$ for all $x$ and $y$.

- $\sum_{x} \sum_{y} p(x, y) = 1$.
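These two properties can be checked by exact enumeration. The sketch below (taking $X$ to be the value of the first die and $Y$ the sum of two fair dice, as one concrete setup) builds a joint PMF from the 36 equally likely sample points:

```python
from fractions import Fraction

# Joint PMF of X = value of the first die and Y = sum of the two dice,
# built by enumerating the 36 equally likely sample points.
p = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        key = (d1, d1 + d2)
        p[key] = p.get(key, Fraction(0)) + Fraction(1, 36)

# Property 1: p(x, y) >= 0 for all (x, y).
assert all(v >= 0 for v in p.values())
# Property 2: the probabilities sum to 1.
assert sum(p.values()) == 1

print(p[(1, 4)])  # P(X = 1, Y = 4) = 1/36
```

Using exact `Fraction` arithmetic avoids any floating-point tolerance when verifying that the probabilities sum to one.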

Given the joint PMF, we can also define the **joint cumulative distribution function** (also called the joint probability distribution function) as

$$F(x, y) = P(X \leq x, Y \leq y) = \sum_{u \leq x} \sum_{v \leq y} p(u, v).$$

To distinguish them from the joint probability mass function and the joint probability distribution function, we call the mass and distribution functions of a single random variable **marginal** mass and distribution functions. Suppose we have two discrete random variables $X$ and $Y$.

The **marginal probability mass function** of $X$ can be obtained by summing the joint probability mass function over the possible values of $Y$:

$$p_X(x) = P(X = x) = \sum_{y} p(x, y).$$

The **marginal distribution function** can be derived from the joint distribution function:

$$F_X(x) = P(X \leq x) = \lim_{y \to \infty} F(x, y) = F(x, \infty).$$

Suppose we have a box that contains 3 red balls and 4 blue balls. Let $X$ be the number of red balls drawn and $Y$ the number of blue balls drawn.

If we randomly draw 3 balls out of the 7 balls, find the joint probability mass function $p(i, j) = P(X = i, Y = j)$.

We can list out all possible configurations of this random experiment:

| $i \backslash j$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| **0** | 0 | 0 | 0 |  |
| **1** | 0 | 0 |  | 0 |
| **2** | 0 |  | 0 | 0 |
| **3** |  | 0 | 0 | 0 |

The blanks in the table are the possible events. We have $X + Y = 3$, so the possible outcomes are

$$(0, 3),\ (1, 2),\ (2, 1),\ (3, 0).$$

We can find their probabilities by counting:

$$p(0, 3) = \frac{\binom{3}{0}\binom{4}{3}}{\binom{7}{3}} = \frac{4}{35}.$$

Similarly, we can find $p(1, 2) = \frac{18}{35}$, $p(2, 1) = \frac{12}{35}$ and $p(3, 0) = \frac{1}{35}$. We can also find the marginal probability of $X$ at, say, $x = 1$:

$$p_X(1) = \sum_{j} p(1, j) = p(1, 2) = \frac{18}{35}.$$
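The counting argument can be verified with a short enumeration. This is a sketch assuming $X$ counts red balls and $Y$ counts blue balls among the 3 drawn:

```python
from fractions import Fraction
from math import comb

# Joint PMF p(i, j) = P(X = i red, Y = j blue) when drawing 3 balls
# from a box of 3 red and 4 blue balls, computed by counting.
total = comb(7, 3)
p = {(i, j): Fraction(comb(3, i) * comb(4, j), total)
     for i in range(4) for j in range(4) if i + j == 3}

# Marginal PMF of X, obtained by summing over the values of Y.
pX = {i: sum((v for (a, b), v in p.items() if a == i), Fraction(0))
      for i in range(4)}

print(p[(1, 2)])  # 18/35
print(pX[1])      # 18/35
```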

### Two continuous random variables

Let $X$ and $Y$ be two continuous random variables. We say $X$ and $Y$ are **jointly continuous** if there exists a function $f(x, y)$ such that, for every set $C$ of pairs of real numbers,

$$P((X, Y) \in C) = \iint_{C} f(x, y)\,dx\,dy.$$

The function $f(x, y)$ is called the **joint probability density function** of $X$ and $Y$. We can decompose the event $C = \{(x, y) : x \in A,\ y \in B\}$, so that

$$P(X \in A, Y \in B) = \int_{B} \int_{A} f(x, y)\,dx\,dy.$$

So we can also rewrite the joint density function as

$$f(x, y) = \frac{\partial^2}{\partial x\,\partial y} F(x, y).$$

The **joint cumulative distribution function** can be defined using integrals of the joint density function:

$$F(a, b) = P(X \leq a, Y \leq b) = \int_{-\infty}^{b} \int_{-\infty}^{a} f(x, y)\,dx\,dy.$$

Also, we can find the relationship between the joint and

**marginal density functions**:

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx.$$

Now, suppose the joint density function of continuous random variables $X$ and $Y$ is given by

$$f(x, y) = \begin{cases} 2e^{-x}e^{-2y}, & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise,} \end{cases}$$

and we want to compute

- $P(X > 1, Y < 1)$,

- $P(X < Y)$, and

- $P(X < a)$ for some $a > 0$.

For case 1,

$$P(X > 1, Y < 1) = \int_{0}^{1} \int_{1}^{\infty} 2e^{-x}e^{-2y}\,dx\,dy = e^{-1} \int_{0}^{1} 2e^{-2y}\,dy = e^{-1}(1 - e^{-2}).$$

For case 2, integrating over the region where $x < y$,

$$P(X < Y) = \iint_{x < y} 2e^{-x}e^{-2y}\,dx\,dy = \int_{0}^{\infty} \int_{0}^{y} 2e^{-x}e^{-2y}\,dx\,dy = \int_{0}^{\infty} 2e^{-2y}(1 - e^{-y})\,dy = 1 - \frac{2}{3} = \frac{1}{3}.$$

And for case 3,

$$P(X < a) = \int_{0}^{a} \int_{0}^{\infty} 2e^{-x}e^{-2y}\,dy\,dx = \int_{0}^{a} e^{-x}\,dx = 1 - e^{-a}.$$
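Double integrals like these can be sanity-checked numerically. The sketch below uses a midpoint Riemann sum for the illustrative density $f(x, y) = 2e^{-x}e^{-2y}$ on $x, y > 0$, truncating the infinite range at 20 (grid size and cutoff are arbitrary choices):

```python
import math

# Riemann-sum check of P(X > 1, Y < 1) for the joint density
# f(x, y) = 2 e^{-x} e^{-2y} on x, y > 0; the analytic answer
# is e^{-1} (1 - e^{-2}).
def f(x, y):
    return 2 * math.exp(-x) * math.exp(-2 * y)

n, cap = 400, 20.0  # grid resolution and truncation of the infinite range
dx = (cap - 1.0) / n
dy = 1.0 / n
prob = sum(f(1.0 + (i + 0.5) * dx, (j + 0.5) * dy) * dx * dy
           for i in range(n) for j in range(n))

exact = math.exp(-1) * (1 - math.exp(-2))
print(prob, exact)  # the two agree to about three decimal places
```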

### Conditional probability distribution

We can easily extend this idea to two random variables. Suppose we now have random variables $X$ and $Y$, and we define events $E = \{X = x\}$ and $F = \{Y = y\}$; then

$$P(X = x \mid Y = y) = \frac{P(E \cap F)}{P(F)} = \frac{P(X = x, Y = y)}{P(Y = y)}.$$

#### The discrete case

If $X$ and $Y$ are two discrete random variables, we can define the **conditional probability mass function** of $X$ given $Y = y$ as

$$p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \frac{p(x, y)}{p_Y(y)}.$$

Similarly, the **conditional distribution function** of $X$ given $Y = y$ can be defined as

$$F_{X \mid Y}(x \mid y) = P(X \leq x \mid Y = y) = \sum_{u \leq x} p_{X \mid Y}(u \mid y).$$

We can also check that the conditional mass function satisfies the properties of a probability mass function: $p_{X \mid Y}(x \mid y) \geq 0$, and

$$\sum_{x} p_{X \mid Y}(x \mid y) = \frac{\sum_{x} p(x, y)}{p_Y(y)} = \frac{p_Y(y)}{p_Y(y)} = 1.$$

As an example, suppose that the joint PMF $p(x, y)$ of two random variables $X$ and $Y$, each taking values in $\{0, 1\}$, is given by

$$p(0, 0) = 0.4, \quad p(0, 1) = 0.2, \quad p(1, 0) = 0.1, \quad p(1, 1) = 0.3.$$

We want to calculate the conditional probability mass function of $X$ given that $Y = 1$.

Using the notations above,

$$p_{X \mid Y}(x \mid 1) = \frac{p(x, 1)}{p_Y(1)}.$$

We have

$$p(0, 1) = 0.2, \quad p(1, 1) = 0.3,$$

and the marginal probability function of $Y$ at $1$ is

$$p_Y(1) = p(0, 1) + p(1, 1) = 0.5.$$

The conditional PMF is given by

$$p_{X \mid Y}(0 \mid 1) = \frac{0.2}{0.5} = \frac{2}{5}, \qquad p_{X \mid Y}(1 \mid 1) = \frac{0.3}{0.5} = \frac{3}{5}.$$
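Computing a conditional PMF from a joint PMF is mechanical. Here is a sketch using a small hypothetical joint PMF on $\{0, 1\} \times \{0, 1\}$ (the values are chosen for illustration):

```python
from fractions import Fraction

# Conditional PMF p_{X|Y}(x | y) = p(x, y) / p_Y(y), computed from a
# hypothetical joint PMF on {0, 1} x {0, 1}.
p = {(0, 0): Fraction(2, 5), (0, 1): Fraction(1, 5),
     (1, 0): Fraction(1, 10), (1, 1): Fraction(3, 10)}

def p_Y(y):
    # Marginal PMF of Y: sum the joint PMF over x.
    return sum(v for (x, yy), v in p.items() if yy == y)

def cond(x, y):
    return p[(x, y)] / p_Y(y)

print(cond(0, 1), cond(1, 1))  # 2/5 and 3/5
```

Note that the two conditional probabilities sum to 1, as a conditional PMF must.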

#### The continuous case

If $X$ and $Y$ are two continuous random variables, we can define the **conditional density function** of $X$ given $Y = y$, for all values of $y$ such that $f_Y(y) > 0$, by

$$f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}.$$

The

**conditional distribution function** of $X$ given $Y = y$ can be calculated as

$$F_{X \mid Y}(a \mid y) = P(X \leq a \mid Y = y) = \int_{-\infty}^{a} f_{X \mid Y}(x \mid y)\,dx.$$

Again we'll explain through an example. Suppose the joint density function of continuous random variables $X$ and $Y$ is given by

$$f(x, y) = \begin{cases} \dfrac{12}{5}x(2 - x - y), & 0 < x < 1,\ 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases}$$

We want to compute the conditional density of $X$ given $Y = y$ when $0 < y < 1$. The marginal density is

$$f_Y(y) = \int_{0}^{1} \frac{12}{5}x(2 - x - y)\,dx = \frac{12}{5}\left(\frac{2 - y}{2} - \frac{1}{3}\right) = \frac{2(4 - 3y)}{5},$$

so

$$f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{6x(2 - x - y)}{4 - 3y}, \qquad 0 < x < 1.$$

### Independent random variables

The concept of independence between events can also be extended to that between two random variables. If $X$ and $Y$ are two random variables, we say $X$ and $Y$ are **independent** of each other, i.e. $X \perp Y$, if for all $A$ and $B$, the events $\{X \in A\}$ and $\{Y \in B\}$ are independent of each other:

$$P(X \in A, Y \in B) = P(X \in A)P(Y \in B).$$

If we take $A = (-\infty, a]$ and $B = (-\infty, b]$ for a pair of real numbers $a$ and $b$,

$$P(X \leq a, Y \leq b) = P(X \leq a)P(Y \leq b).$$

Hence, in terms of the joint distribution function of $X$ and $Y$, independence means

$$F(a, b) = F_X(a)F_Y(b) \quad \text{for all } a, b.$$

If $X$ and $Y$ are **discrete random variables**, the condition of independence is equivalent to

$$p(x, y) = p_X(x)p_Y(y) \quad \text{for all } x, y.$$

In the **continuous case**, the condition is

$$f(x, y) = f_X(x)f_Y(y) \quad \text{for all } x, y.$$

So loosely speaking, $X$ and $Y$ are independent if knowing the value of one doesn't change the distribution of the other.

As an example, suppose a couple has decided to meet at a restaurant for dinner. Suppose each of them independently arrives at a time uniformly distributed between 6 p.m. and 7 p.m. Find the probability that the first to arrive has to wait longer than 10 minutes.

Let $X$ and $Y$ denote the number of minutes past 6 p.m. that the man and the woman arrive, respectively. We have $X, Y \sim \text{Uniform}(0, 60)$, so

$$f_X(x) = \frac{1}{60} \text{ for } 0 < x < 60,$$

and $f_Y(y) = \frac{1}{60}$ for $0 < y < 60$. Define events

$$A = \{X + 10 < Y\}, \qquad B = \{Y + 10 < X\}.$$

What we want to find is $P(A \cup B)$, that is, the man ($A$) or the woman ($B$) waits for more than 10 minutes. Since $A$ and $B$ are disjoint, $P(A \cup B) = P(A) + P(B)$. By symmetry, $P(A) = P(B)$, so $P(A \cup B) = 2P(A)$.

We may express $P(A)$ as

$$P(A) = P(X + 10 < Y) = \iint_{x + 10 < y} f(x, y)\,dx\,dy.$$

To find its probability,

$$P(X + 10 < Y) = \int_{10}^{60} \int_{0}^{y - 10} \left(\frac{1}{60}\right)^2 dx\,dy = \frac{1}{3600} \int_{10}^{60} (y - 10)\,dy = \frac{1}{3600} \cdot \frac{50^2}{2} = \frac{25}{72},$$

where the first equality holds because of independence: $f(x, y) = f_X(x)f_Y(y) = \left(\frac{1}{60}\right)^2$.

So the final answer is $P(A \cup B) = 2 \times \frac{25}{72} = \frac{25}{36}$.
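A quick Monte Carlo simulation agrees with this answer (the seed and sample size are arbitrary implementation choices):

```python
import random

# Monte Carlo check of the waiting example: X, Y ~ Uniform(0, 60)
# independent; estimate P(|X - Y| > 10), which should be near 25/36.
random.seed(0)
n = 200_000
hits = sum(1 for _ in range(n)
           if abs(random.uniform(0, 60) - random.uniform(0, 60)) > 10)
est = hits / n
print(est, 25 / 36)  # the estimate is close to 25/36 ~ 0.694
```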

### Expectation

Recall that for a random variable *X*, its expectation is given by

$$E[X] = \sum_{x} x\,p(x) \quad \text{(discrete)} \qquad \text{or} \qquad E[X] = \int_{-\infty}^{\infty} x f(x)\,dx \quad \text{(continuous)},$$

and both can be considered as special cases of $E[g(X)]$:

$$E[g(X)] = \sum_{x} g(x)\,p(x) \quad \text{or} \quad E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$$

If $X$ and $Y$ are two **discrete random variables**, we can extend the definition using the joint probability mass function:

$$E[g(X, Y)] = \sum_{y} \sum_{x} g(x, y)\,p(x, y),$$

and similarly using the joint probability density function for

**the continuous case**:

$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy.$$

Suppose $X$ and $Y$ have the joint density function

$$f(x, y) = \begin{cases} e^{-(x + y)}, & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise.} \end{cases}$$

The expectation $E[XY]$ is given by

$$E[XY] = \int_{0}^{\infty} \int_{0}^{\infty} xy\,e^{-(x + y)}\,dx\,dy = \left(\int_{0}^{\infty} x e^{-x}\,dx\right)\left(\int_{0}^{\infty} y e^{-y}\,dy\right) = 1 \times 1 = 1.$$

#### Properties

- Similar to the linearity property of the expectation of a single random variable, $E[X + Y] = E[X] + E[Y]$, and more generally $E[aX + bY] = aE[X] + bE[Y]$.

- If $X \leq Y$, then $E[X] \leq E[Y]$.

- If $X$ and $Y$ are independent, then $E[g(X)h(Y)] = E[g(X)]E[h(Y)]$.

If we can decompose the joint density function as $f(x, y) = f_X(x)f_Y(y)$, i.e. the product of the two marginal density functions, property (3) follows by splitting the double integral. A special case of (3) is $E[XY] = E[X]E[Y]$.
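The product property for independent random variables can be checked exactly in small discrete cases. This sketch uses two independent fair dice and the illustrative functions $g(x) = x^2$ and $h(y) = y + 1$:

```python
from fractions import Fraction

# Exact check of E[g(X) h(Y)] = E[g(X)] E[h(Y)] for independent discrete
# X and Y (two fair dice here).
vals = range(1, 7)
pr = Fraction(1, 6)  # each face has probability 1/6

def g(x): return x * x
def h(y): return y + 1

# Left side: double sum over the joint PMF p(x, y) = (1/6)(1/6).
lhs = sum(g(x) * h(y) * pr * pr for x in vals for y in vals)
# Right side: product of the two marginal expectations.
rhs = sum(g(x) * pr for x in vals) * sum(h(y) * pr for y in vals)
print(lhs, rhs)  # both equal 273/4
```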

These properties can make a lot of calculations much easier. Suppose *X* and *Y* are two standard normal random variables, independent of each other, and we want to calculate $E[XY]$. With the properties shown above, we no longer need to find the joint density: $E[XY] = E[X]E[Y] = 0$.

### Covariance

As shown in the example above, we have defined the expected value and variance for a single random variable. For two or more random variables, we introduce a new quantity that gives us information about the relationship between them.

The **covariance** of $X$ and $Y$, denoted as $\operatorname{Cov}(X, Y)$, is defined as

$$\operatorname{Cov}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right].$$

Upon expanding the right-hand side of the definition above,

$$\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y].$$

#### Properties

- The order doesn't matter: $\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$.

- The relationship between the covariance and the variance of a single random variable: $\operatorname{Cov}(X, X) = \operatorname{Var}(X)$.

- Similar to the property of the variance, $\operatorname{Cov}(aX, Y) = a\operatorname{Cov}(X, Y)$.

- For sums of random variables, $\operatorname{Cov}\left(\sum_{i} X_i, \sum_{j} Y_j\right) = \sum_{i} \sum_{j} \operatorname{Cov}(X_i, Y_j)$.

- For independent random variables, $\operatorname{Cov}(X, Y) = 0$, since $E[XY] = E[X]E[Y]$.

Note that the arrow doesn't go both ways! Having zero covariance (**uncorrelated**) doesn't imply independence. An example is $X \sim N(0, 1)$ and $Y = X^2$: they are clearly dependent, yet $\operatorname{Cov}(X, Y) = E[X^3] - E[X]E[X^2] = 0$.

As an example, suppose *X* and *Y* are two continuous random variables with joint density function

$$f(x, y) = \begin{cases} x + y, & 0 < x < 1,\ 0 < y < 1, \\ 0, & \text{otherwise,} \end{cases}$$

and we want to find $\operatorname{Cov}(X, Y)$.

We can view the support of the joint density as the unit square $0 < x < 1$, $0 < y < 1$. To find $\operatorname{Cov}(X, Y)$, we need to find $E[X]$, $E[Y]$ and $E[XY]$.

To calculate $E[X]$, we first need to find the marginal density function of $X$:

$$f_X(x) = \int_{0}^{1} (x + y)\,dy = x + \frac{1}{2}, \qquad E[X] = \int_{0}^{1} x\left(x + \frac{1}{2}\right)dx = \frac{1}{3} + \frac{1}{4} = \frac{7}{12}.$$

And similarly for $Y$: by symmetry, $f_Y(y) = y + \frac{1}{2}$ and $E[Y] = \frac{7}{12}$.

Now we can show the covariance between $X$ and $Y$ to be

$$E[XY] = \int_{0}^{1} \int_{0}^{1} xy(x + y)\,dx\,dy = \frac{1}{3}, \qquad \operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y] = \frac{1}{3} - \left(\frac{7}{12}\right)^2 = -\frac{1}{144}.$$
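A covariance computed by hand can be double-checked with a numeric double integral. The sketch below assumes the illustrative density $f(x, y) = x + y$ on the unit square, for which the exact covariance is $-1/144$:

```python
# Riemann-sum computation of Cov(X, Y) = E[XY] - E[X]E[Y] for the
# density f(x, y) = x + y on the unit square.
n = 500
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]  # midpoints of each cell

def integrate(g):
    # Midpoint-rule approximation of the double integral of g(x, y) f(x, y).
    return sum(g(x, y) * (x + y) * h * h for x in pts for y in pts)

ex = integrate(lambda x, y: x)
ey = integrate(lambda x, y: y)
exy = integrate(lambda x, y: x * y)
cov = exy - ex * ey
print(cov, -1 / 144)  # the numeric value is close to -1/144
```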

### Conditional expectation

Recall that we have the conditional probability mass function defined as

$$p_{X \mid Y}(x \mid y) = \frac{p(x, y)}{p_Y(y)}.$$

In the discrete case, we can define the **conditional expectation** of $X$ given $Y = y$, for all values of $y$ such that $p_Y(y) > 0$, by

$$E[X \mid Y = y] = \sum_{x} x\,p_{X \mid Y}(x \mid y).$$

In the continuous case,

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\,f_{X \mid Y}(x \mid y)\,dx.$$

#### Discrete example

Suppose $X$ and $Y$ are independent binomial random variables with parameters $n$ and $p$. Find the conditional expected value of $X$ given that $X + Y = m$.

Since $0 \leq X \leq n$ and $X \leq X + Y = m$, the values $X$ can take would be limited to $0, 1, \ldots, \min(n, m)$. For such a value $k$,

$$P(X = k \mid X + Y = m) = \frac{P(X = k, Y = m - k)}{P(X + Y = m)} = \frac{\binom{n}{k}p^k q^{n-k}\binom{n}{m-k}p^{m-k}q^{n-m+k}}{\binom{2n}{m}p^m q^{2n-m}} = \frac{\binom{n}{k}\binom{n}{m-k}}{\binom{2n}{m}},$$

where $q = 1 - p$. Here, because $X$ and $Y$ are independent, $X + Y \sim \text{Binomial}(2n, p)$: we can consider $X + Y$ as $2n$ independent trials, each with a success probability $p$. Now we can plug the term back into the conditional expectation equation:

$$E[X \mid X + Y = m] = \sum_{k} k\,\frac{\binom{n}{k}\binom{n}{m-k}}{\binom{2n}{m}}.$$

Note that this conditional probability mass function is that of a hypergeometric distribution: it is the distribution of the number of white balls obtained when $m$ balls are drawn from an urn containing $n$ white and $n$ black balls.

Now we need to rewrite the binomial terms in the equation to remove $k$:

$$k\binom{n}{k} = n\binom{n-1}{k-1}, \qquad \sum_{k} \binom{n-1}{k-1}\binom{n}{m-k} = \binom{2n-1}{m-1},$$

where the second identity is Vandermonde's identity.

Plugging these terms back in,

$$E[X \mid X + Y = m] = \frac{n\binom{2n-1}{m-1}}{\binom{2n}{m}} = n \cdot \frac{m}{2n} = \frac{m}{2}.$$
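The result $E[X \mid X + Y = m] = m/2$ can be verified exactly by enumeration; the sketch below uses illustrative values $n = 5$, $m = 4$ (note that $p$ cancels out of the conditional weights):

```python
from fractions import Fraction
from math import comb

# Exact check that E[X | X + Y = m] = m / 2 when X, Y are independent
# Binomial(n, p); the conditional weights are hypergeometric,
# C(n, k) C(n, m - k) / C(2n, m), and do not depend on p.
n, m = 5, 4
weights = {k: Fraction(comb(n, k) * comb(n, m - k), comb(2 * n, m))
           for k in range(max(0, m - n), min(n, m) + 1)}
assert sum(weights.values()) == 1  # Vandermonde's identity

e = sum(k * w for k, w in weights.items())
print(e)  # 2, i.e. m / 2
```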

#### Continuous example

$X$ and $Y$ are continuous random variables with joint density function

$$f(x, y) = \begin{cases} \dfrac{e^{-x/y}e^{-y}}{y}, & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise.} \end{cases}$$

Find $E[X \mid Y = y]$ where $y > 0$.

Note that in the equation of the conditional density

$$f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)},$$

the joint density function is given, so we only need to figure out the marginal density function of $Y$:

$$f_Y(y) = \int_{0}^{\infty} \frac{e^{-x/y}e^{-y}}{y}\,dx = \frac{e^{-y}}{y}\left[-y e^{-x/y}\right]_{0}^{\infty} = e^{-y}.$$

So the conditional density function is

$$f_{X \mid Y}(x \mid y) = \frac{e^{-x/y}e^{-y}/y}{e^{-y}} = \frac{1}{y}e^{-x/y},$$

which follows an exponential density function with parameter $\lambda = 1/y$. Finally, we can find the conditional expectation

$$E[X \mid Y = y] = \int_{0}^{\infty} x \cdot \frac{1}{y}e^{-x/y}\,dx = y,$$

because for an exponential random variable with rate $\lambda$, the expectation is $1/\lambda$, and here $\lambda = 1/y$.

#### Properties

- Summation comes outside: $E\left[\sum_{i} X_i \,\middle|\, Y = y\right] = \sum_{i} E[X_i \mid Y = y]$.

- For any constant $c$, $E[c \mid Y = y] = c$.

- Combining 1 and 2, $E[aX + b \mid Y = y] = aE[X \mid Y = y] + b$.

- Generalizing 2, $E[g(Y) \mid Y = y] = g(y)$ for any function $g$.

#### Making predictions

A very common problem in statistics is to make predictions. Suppose the value of a random variable $X$ is observed, and an attempt is made to predict the value of another random variable $Y$ based on this observed value of $X$.

Let $g(X)$ denote the predictor. Clearly, we would want a function $g$ such that $g(X)$ tends to be close to $Y$. A popular criterion for closeness is the error sum of squares. The model is given by

$$\min_{g}\ E\left[(Y - g(X))^2\right].$$

We can think of this as $Y = g(X) + e$, where $e$ is the estimation error. The equation above is minimizing the variance of $e$. We can show that under this criterion, the best possible predictor of $Y$ is $g(X) = E[Y \mid X]$.

**Theorem**

$$E\left[(Y - E[Y \mid X])^2\right] \leq E\left[(Y - g(X))^2\right] \quad \text{for any function } g.$$

Now suppose the height of a man is $x$ inches and the height of his son is $Y$ inches. Assume that, given a father of height $x$, the son's height is normally distributed with mean $x + 1$ and variance 4. If we know a father is 6 feet tall, what is the expected value of the height of his son?

The model can be written as

$$Y = X + 1 + e,$$

where $e \sim N(0, 4)$ and is independent of $X$. The best prediction of $Y$ given $X = 72$ is $E[Y \mid X = 72] = 72 + 1 = 73$ inches.

### Total expectation

For any value of $y$, we can think of the conditional expectation $E[X \mid Y = y]$ as a real-valued function of $y$,

$$g(y) = E[X \mid Y = y],$$

and $g(Y) = E[X \mid Y]$ can be considered as another random variable.

The equation

$$E[X] = E\left[E[X \mid Y]\right]$$

is called the law of **total expectation**. If $X$ and $Y$ are two **discrete random variables**,

$$E[X] = \sum_{y} E[X \mid Y = y]\,P(Y = y).$$

In the **continuous case**,

$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\,f_Y(y)\,dy.$$

The proof for the discrete case is given below; the proof for the continuous case is similar.

$$\sum_{y} E[X \mid Y = y]\,P(Y = y) = \sum_{y} \sum_{x} x\,p_{X \mid Y}(x \mid y)\,p_Y(y) = \sum_{x} x \sum_{y} p(x, y) = \sum_{x} x\,p_X(x) = E[X].$$

Suppose Joe is in a room with doors A, B and C. Door A leads outside and it takes 3 hours to go through the tunnel behind it. Doors B and C both lead back to the room, and the tunnels take 5 and 7 hours to finish, respectively. The three doors are shuffled each time one door is chosen, so a random guess is needed each time. What is the expected value for the number of hours to leave the room?

Let $X$ be the number of hours to escape. The possible values of $X$ go from 3 to infinity (if Joe's really unlucky). Directly calculating $E[X]$ is therefore not straightforward, so we define

$$Y = \text{the door Joe chooses first}, \qquad Y \in \{A, B, C\},$$

and calculate $E[X]$ using total expectation:

$$E[X] = \frac{1}{3}E[X \mid Y = A] + \frac{1}{3}E[X \mid Y = B] + \frac{1}{3}E[X \mid Y = C] = \frac{1}{3}(3) + \frac{1}{3}(5 + E[X]) + \frac{1}{3}(7 + E[X]),$$

which gives us $E[X] = 15$. To understand the conditional terms, we explain $E[X \mid Y = B]$ as an example. If Joe chose the second door, he would spend 5 hours in the tunnel and then return to the room. Since the doors are shuffled, the problem is the same as before. Thus, his expected additional time until leaving the room is just $E[X]$, and $E[X \mid Y = B] = 5 + E[X]$.
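A simulation of Joe's escape time agrees with the total-expectation answer of 15 hours (the door labels, seed, and sample size are implementation choices):

```python
import random

# Simulation of the three-door example: door A (3 h) leads outside,
# doors B (5 h) and C (7 h) lead back to the room; the doors are
# reshuffled after each choice, so every pick is a fresh 1/3 guess.
random.seed(1)
n = 100_000
total = 0.0
for _ in range(n):
    hours = 0
    while True:
        door = random.choice(["A", "B", "C"])
        if door == "A":
            hours += 3
            break
        hours += 5 if door == "B" else 7
    total += hours
print(total / n)  # close to the analytic answer, 15
```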