# Multivariate Probability Distributions

Tags: Statistics, Distribution

Date: Nov 8, 2019

Description: Joint probability distributions of two or more random variables defined on the same sample space. Also covers independence, conditional expectation and total expectation.

Slug: mathematical-statistics-multivariate-probability-distributions

In the previous two chapters, we focused on studying a single random variable. However, we can define more than one random variable on the same sample space.
Let’s consider the following motivating example. Suppose we roll two six-sided fair dice. The sample space contains 36 possible sample points. Define
Sometimes, we may want to assign a probability to an event that involves two or more random variables. For example,
This is called the joint probability of the two random variables.

### Two discrete random variables

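Let $X$ and $Y$ be two discrete random variables defined on the same sample space. The joint probability mass function (PMF) for $X$ and $Y$ is given by

$$
p(x, y) = P(X = x, Y = y).
$$

The joint probability mass function has the following properties:
1. $p(x, y) \geq 0$ for all $x$ and $y$.
1. $\sum_x \sum_y p(x, y) = 1$.

Given the joint PMF, we can also define the joint cumulative distribution function (also called the joint probability distribution function) as

$$
F(a, b) = P(X \leq a, Y \leq b) = \sum_{x \leq a} \sum_{y \leq b} p(x, y).
$$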
To distinguish them from the joint probability mass function and the joint probability distribution function, we call the mass and distribution functions of a single random variable the marginal mass and distribution functions.
Suppose we have two discrete random variables $X$ and $Y$ with joint PMF $p(x, y)$ and joint distribution function $F(a, b)$. The marginal probability mass function of $X$ can be obtained by summing the joint probability mass function over the possible values of $Y$:

$$
p_X(x) = P(X = x) = \sum_y p(x, y).
$$

The marginal distribution function can be derived from the joint distribution function:

$$
F_X(a) = P(X \leq a) = P(X \leq a, Y < \infty) = \lim_{b \to \infty} F(a, b).
$$
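Suppose we have a box that contains 3 red balls and 4 blue balls. Let

$$
X = \text{the number of red balls drawn}, \qquad Y = \text{the number of blue balls drawn}.
$$

If we randomly draw 3 balls out of the 7 balls, find the joint probability mass function $p(x, y)$.
We can list out all possible configurations of this random experiment:

| $x \backslash y$ | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 |   |
| 1 | 0 | 0 |   | 0 |
| 2 | 0 |   | 0 | 0 |
| 3 |   | 0 | 0 | 0 |

The blanks in the table are the possible events. We have $X + Y = 3$, so the possible outcomes are

$$
(x, y) \in \{(0, 3), (1, 2), (2, 1), (3, 0)\}.
$$

We can find their probabilities by counting the ways to choose the balls. For example,

$$
p(0, 3) = \frac{\binom{3}{0}\binom{4}{3}}{\binom{7}{3}} = \frac{4}{35}.
$$

Similarly we can find $p(1, 2) = 18/35$, $p(2, 1) = 12/35$ and $p(3, 0) = 1/35$. We can also find the marginal probability of $X$ at, say, $x = 1$:

$$
p_X(1) = \sum_y p(1, y) = p(1, 2) = \frac{18}{35}.
$$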
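The counting argument for the ball-drawing example can be checked with a short script. This is only a sketch: it assumes $X$ counts the red balls and $Y$ the blue balls among the 3 drawn, and the helper names `joint_pmf` and `marginal_x` are made up for illustration.

```python
from fractions import Fraction
from math import comb

RED, BLUE, DRAWS = 3, 4, 3  # box contents and sample size from the example

def joint_pmf(x, y):
    """P(X = x, Y = y): x red and y blue balls among the DRAWS balls drawn."""
    if x < 0 or y < 0 or x + y != DRAWS:
        return Fraction(0)  # impossible configurations get probability zero
    return Fraction(comb(RED, x) * comb(BLUE, y), comb(RED + BLUE, DRAWS))

def marginal_x(x):
    """P(X = x), obtained by summing the joint PMF over all values of y."""
    return sum(joint_pmf(x, y) for y in range(DRAWS + 1))

# Full 4 x 4 table of the joint PMF, keyed by (x, y).
table = {(x, y): joint_pmf(x, y) for x in range(4) for y in range(4)}

print(table[(1, 2)])   # probability of 1 red and 2 blue balls
print(marginal_x(1))   # marginal probability of exactly 1 red ball
```

Exact fractions are used instead of floats so that the check that the table sums to 1 is not affected by rounding.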

### Two continuous random variables

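Let $X$ and $Y$ be two continuous random variables. We say $X$ and $Y$ are jointly continuous if there exists a function $f(x, y)$ such that, for every set $C$ of pairs of real numbers,

$$
P\{(X, Y) \in C\} = \iint_{(x, y) \in C} f(x, y) \, dx \, dy.
$$

The function $f(x, y)$ is called the joint probability density function of $X$ and $Y$. When $C = \{(x, y): x \in A, y \in B\}$ for sets of real numbers $A$ and $B$, we can decompose the event as

$$
\{(X, Y) \in C\} = \{X \in A, Y \in B\}.
$$

So we can also rewrite the definition as

$$
P(X \in A, Y \in B) = \int_B \int_A f(x, y) \, dx \, dy.
$$

The joint cumulative distribution function can be defined using integrals of the joint density function:

$$
F(a, b) = P(X \leq a, Y \leq b) = \int_{-\infty}^{b} \int_{-\infty}^{a} f(x, y) \, dx \, dy.
$$

Also, we can find the relationship between the joint and marginal density functions:

$$
f_X(x) = \int_{-\infty}^{\infty} f(x, y) \, dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y) \, dx.
$$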
Now, suppose the joint density function of continuous random variables $X$ and $Y$ is given by
and we want to compute
1. , and
1. .
For case 1,
For case 2,
And for case 3,

### Conditional probability distribution

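If we have events $A$ and $B$ on a sample space $S$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is defined as

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
$$

We can easily extend this idea to two random variables. Suppose we now have random variables $X$ and $Y$, and we define events $A = \{X = x\}$ and $B = \{Y = y\}$, then

$$
P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}.
$$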

#### The discrete case

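If $X$ and $Y$ are two discrete random variables, we can define the conditional probability mass function of $X$ given $Y = y$, for all values of $y$ such that $p_Y(y) > 0$, as

$$
p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \frac{p(x, y)}{p_Y(y)}.
$$

Similarly, the conditional distribution function of $X$ given $Y = y$ can be defined as

$$
F_{X \mid Y}(x \mid y) = P(X \leq x \mid Y = y) = \sum_{a \leq x} p_{X \mid Y}(a \mid y).
$$

We can also check that the conditional mass function satisfies the properties of a probability mass function. It is nonnegative, and it sums to one:

$$
\sum_x p_{X \mid Y}(x \mid y) = \frac{\sum_x p(x, y)}{p_Y(y)} = \frac{p_Y(y)}{p_Y(y)} = 1.
$$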
As an example, suppose that is given by
We want to calculate the conditional probability mass function of given that .
Using the notations above,
We have
and the marginal probability function of
The conditional PMF is given by
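The discrete definitions can also be sketched in code. The joint PMF below is a hypothetical table chosen only for illustration (it is not taken from the example above), and `marginal_y` and `conditional_x_given_y` are made-up helper names.

```python
from fractions import Fraction

# Hypothetical joint PMF p(x, y); these numbers are illustrative assumptions.
joint = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(2, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(3, 10),
}

def marginal_y(joint, y):
    """p_Y(y): sum the joint PMF over all x for the given y."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def conditional_x_given_y(joint, y):
    """p_{X|Y}(x | y) = p(x, y) / p_Y(y), defined only when p_Y(y) > 0."""
    p_y = marginal_y(joint, y)
    if p_y == 0:
        raise ValueError("conditioning event has probability zero")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

cond = conditional_x_given_y(joint, 1)
print(cond)                # conditional PMF of X given Y = 1
print(sum(cond.values()))  # a valid PMF sums to 1
```

Dividing each entry of the selected column by its column total is all the definition asks for, which is why the conditional PMF automatically sums to one.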

#### The continuous case

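If $X$ and $Y$ are two continuous random variables, we can define the conditional probability density function of $X$ given $Y = y$, for all values of $y$ such that $f_Y(y) > 0$, by

$$
f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}.
$$

The conditional distribution function of $X$ given $Y = y$ can be calculated as

$$
F_{X \mid Y}(a \mid y) = P(X \leq a \mid Y = y) = \int_{-\infty}^{a} f_{X \mid Y}(x \mid y) \, dx.
$$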
Again we’ll explain through an example. Suppose the joint density function of continuous random variables and is given by
We want to compute the conditional density of given when .

### Independent random variables

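The concept of independence between events can also be extended to two random variables. If $X$ and $Y$ are two random variables, we say $X$ and $Y$ are independent of each other, i.e. $X \perp Y$, if for all sets of real numbers $A$ and $B$, the events $\{X \in A\}$ and $\{Y \in B\}$ are independent:

$$
P(X \in A, Y \in B) = P(X \in A) P(Y \in B).
$$

If we take $A = (-\infty, a]$ and $B = (-\infty, b]$ for a pair of real numbers $a$ and $b$,

$$
P(X \leq a, Y \leq b) = P(X \leq a) P(Y \leq b).
$$

Hence, in terms of the joint distribution function $F$ of $X$ and $Y$,

$$
F(a, b) = F_X(a) F_Y(b).
$$

If $X$ and $Y$ are discrete random variables, the condition of independence is equivalent to

$$
p(x, y) = p_X(x) p_Y(y) \quad \text{for all } x, y.
$$

In the continuous case,

$$
f(x, y) = f_X(x) f_Y(y) \quad \text{for all } x, y.
$$

So loosely speaking, $X$ and $Y$ are independent if knowing the value of one doesn’t change the distribution of the other.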
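As an example, suppose a couple decides to meet at a restaurant for dinner. Suppose each of them independently arrives at a time uniformly distributed between 6 p.m. and 7 p.m. Find the probability that the first to arrive has to wait longer than 10 minutes.
Let $X$ and $Y$ denote the number of minutes past 6 that the man and the woman arrive, respectively. We have

$$
X \sim \text{Uniform}(0, 60)
$$

and $Y \sim \text{Uniform}(0, 60)$. Define events

$$
A = \{X + 10 < Y\}, \qquad B = \{Y + 10 < X\}.
$$

What we want to find is $P(A \cup B)$, that is the man ($A$) or the woman ($B$) waits for more than 10 minutes. Since $A$ and $B$ are disjoint, $P(A \cup B) = P(A) + P(B)$. By symmetry,

$$
P(A) = P(B), \quad \text{so} \quad P(A \cup B) = 2 P(A).
$$

We may express $A$ as

$$
A = \{(x, y): 10 < y < 60, \; 0 < x < y - 10\}.
$$

To find its probability,

$$
\begin{aligned}
P(A) &= \iint_{x + 10 < y} f(x, y) \, dx \, dy = \iint_{x + 10 < y} f_X(x) f_Y(y) \, dx \, dy \\
&= \int_{10}^{60} \int_{0}^{y - 10} \left(\frac{1}{60}\right)^2 dx \, dy = \frac{1}{3600} \int_{10}^{60} (y - 10) \, dy = \frac{1250}{3600} = \frac{25}{72},
\end{aligned}
$$

where the second equality holds because of independence.
So the final answer is $2 \times \frac{25}{72} = \frac{25}{36}$.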
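As a sanity check on the restaurant example, a Monte Carlo sketch can estimate the same probability. Working the double integral by hand gives $25/36 \approx 0.694$ for this setup; the function name, sample size and seed below are arbitrary choices, not part of the original example.

```python
import random

def estimate_wait_probability(n_trials=200_000, wait=10.0, seed=0):
    """Estimate P(|X - Y| > wait) for independent X, Y ~ Uniform(0, 60)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_trials):
        x = rng.uniform(0.0, 60.0)  # man's arrival time, minutes past 6
        y = rng.uniform(0.0, 60.0)  # woman's arrival time, minutes past 6
        if abs(x - y) > wait:       # whoever arrives first waits > 10 minutes
            hits += 1
    return hits / n_trials

estimate = estimate_wait_probability()
print(estimate, 25 / 36)
```

The event "the first to arrive waits more than 10 minutes" is the same as $|X - Y| > 10$, which is why a single comparison covers both the man-waits and woman-waits cases.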

### Expectation

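Recall that for a random variable $X$, its expectation is given by

$$
E[X] = \sum_x x \, p(x) \quad \text{(discrete)}, \qquad E[X] = \int_{-\infty}^{\infty} x f(x) \, dx \quad \text{(continuous)},
$$

and both can be considered as special cases of $E[g(X)]$ with $g(x) = x$:

$$
E[g(X)] = \sum_x g(x) \, p(x) \quad \text{or} \quad E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) \, dx.
$$

If $X$ and $Y$ are two discrete random variables, we can extend the definition using the joint probability mass function:

$$
E[g(X, Y)] = \sum_x \sum_y g(x, y) \, p(x, y),
$$

and similarly using the joint probability density function for the continuous case:

$$
E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y) \, dx \, dy.
$$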
Suppose and have the joint density function
The expectation is given by

#### Properties

1. Similar to the property of the expectation of a single random variable, expectation is linear: $E[aX + bY] = aE[X] + bE[Y]$ for constants $a$ and $b$.
1. If , then .
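1. If $X \perp Y$ and $g$, $h$ are functions of a single variable, then $E[g(X)h(Y)] = E[g(X)]E[h(Y)]$.

If we can decompose the joint density function as $f(x, y) = g(x)h(y)$ for all $x$ and $y$, we can show that it’s the product of the two marginal density functions, so $X$ and $Y$ are independent. A special case of (3) is $E[XY] = E[X]E[Y]$.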
These properties can make many calculations much easier. Suppose $X$ and $Y$ are two independent standard normal random variables and we want to calculate . With the properties shown above, we no longer need to find .

### Covariance

As shown in the example above, we have defined the expected value and variance for a single random variable. For two or more random variables, we introduce a new quantity to give us the information about the relationship between the random variables.
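The covariance of $X$ and $Y$, denoted as $\operatorname{Cov}(X, Y)$, is defined as

$$
\operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big].
$$

Upon expanding the right-hand side of the definition above,

$$
\operatorname{Cov}(X, Y) = E\big[XY - X E[Y] - Y E[X] + E[X]E[Y]\big] = E[XY] - E[X]E[Y].
$$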

#### Properties

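1. The order doesn’t matter: $\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$.
1. The relationship between the covariance and the variance of a single random variable: $\operatorname{Cov}(X, X) = \operatorname{Var}(X)$.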
1. Similar to the property of the variance, constants factor out: $\operatorname{Cov}(aX, bY) = ab \operatorname{Cov}(X, Y)$.
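1. For random variables $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$, $\operatorname{Cov}\left(\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} \operatorname{Cov}(X_i, Y_j)$.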
1. For independent random variables, $X \perp Y \Rightarrow \operatorname{Cov}(X, Y) = 0$.
1. Note that the arrow doesn’t go both ways! Having zero covariance (uncorrelated) doesn’t imply independence. An example is and .
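A small exact computation can illustrate that zero covariance does not imply independence. The pair used here is an assumption made for illustration: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The helper names `expect` and `marginal` are likewise made up.

```python
from fractions import Fraction

# Illustrative assumption: X is uniform on {-1, 0, 1} and Y = X**2.
support_x = [-1, 0, 1]
joint = {(x, x * x): Fraction(1, 3) for x in support_x}

def expect(g):
    """E[g(X, Y)] computed from the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def marginal(index, value):
    """Marginal PMF of X (index 0) or Y (index 1) at the given value."""
    return sum(p for point, p in joint.items() if point[index] == value)

# Cov(X, Y) = E[XY] - E[X]E[Y]
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require p(x, y) = p_X(x) * p_Y(y) at every point.
factorizes = all(
    joint.get((x, y), Fraction(0)) == marginal(0, x) * marginal(1, y)
    for x in support_x
    for y in (0, 1)
)

print(cov, factorizes)
```

Here $Y$ is completely determined by $X$, yet the covariance vanishes because the positive and negative values of $X$ cancel in $E[XY] = E[X^3]$.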
As an example, suppose X and Y are two continuous random variables with joint density function
and we want to find $\operatorname{Cov}(X, Y)$.
We can view the joint density function as , and , . To find , we need to find , ] and .
To calculate $E[X]$, we first need to find the marginal probability density function of $X$.
And similarly for $Y$
Now we can show the covariance between $X$ and $Y$ to be

### Conditional expectation

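Recall that we have the conditional probability mass function defined as

$$
p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \frac{p(x, y)}{p_Y(y)}.
$$

In the discrete case, we can define the conditional expectation of $X$ given $Y = y$, for all values of $y$ such that $p_Y(y) > 0$, by

$$
E[X \mid Y = y] = \sum_x x \, P(X = x \mid Y = y) = \sum_x x \, p_{X \mid Y}(x \mid y).
$$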

#### Discrete example

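Suppose $X$ and $Y$ are independent binomial random variables with parameters $n$ and $p$. Find the conditional expected value of $X$ given that $X + Y = m$.
Since $X \leq n$ and $Y \leq n$, if $X + Y = m$, the values $X$ can take would be limited to $\max(0, m - n) \leq k \leq \min(n, m)$. For such $k$,

$$
\begin{aligned}
P(X = k \mid X + Y = m) &= \frac{P(X = k, Y = m - k)}{P(X + Y = m)} = \frac{P(X = k) P(Y = m - k)}{P(X + Y = m)} \\
&= \frac{\binom{n}{k} p^k (1 - p)^{n - k} \binom{n}{m - k} p^{m - k} (1 - p)^{n - m + k}}{\binom{2n}{m} p^m (1 - p)^{2n - m}} = \frac{\binom{n}{k} \binom{n}{m - k}}{\binom{2n}{m}}.
\end{aligned}
$$

Here $P(X = k, Y = m - k) = P(X = k) P(Y = m - k)$ because $X$ and $Y$ are independent, and we can consider $X + Y$ as $2n$ independent trials, each with a success probability $p$, so $X + Y \sim \text{Binomial}(2n, p)$. Now we can plug the term back into the conditional expectation equation:

$$
E[X \mid X + Y = m] = \sum_k k \, \frac{\binom{n}{k} \binom{n}{m - k}}{\binom{2n}{m}}.
$$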
Note that another way of writing out the probability mass function of is
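Now we need to rewrite the binomial terms in the equation to remove the factor $k$:

$$
k \binom{n}{k} = n \binom{n - 1}{k - 1}, \qquad \binom{2n}{m} = \frac{2n}{m} \binom{2n - 1}{m - 1}.
$$

Plugging these terms back in,

$$
E[X \mid X + Y = m] = \sum_k \frac{n \binom{n - 1}{k - 1} \binom{n}{m - k}}{\frac{2n}{m} \binom{2n - 1}{m - 1}} = \frac{m}{2} \sum_k \frac{\binom{n - 1}{k - 1} \binom{n}{m - k}}{\binom{2n - 1}{m - 1}} = \frac{m}{2},
$$

where the last sum equals 1 because its terms form a hypergeometric probability mass function.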
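The binomial example can be verified numerically. The sketch below assumes the setup is two independent $\text{Binomial}(n, p)$ variables conditioned on their sum being $m$, in which case symmetry gives $m/2$; the function names and the particular values of $n$, $p$ and $m$ are arbitrary illustrations.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p, k):
    """P(K = k) for K ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return Fraction(0)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def cond_expectation_given_sum(n, p, m):
    """E[X | X + Y = m] for independent X, Y ~ Binomial(n, p)."""
    # Unnormalized weights P(X = k) * P(Y = m - k) for each candidate k.
    weights = {k: binom_pmf(n, p, k) * binom_pmf(n, p, m - k) for k in range(n + 1)}
    total = sum(weights.values())  # this equals P(X + Y = m)
    return sum(k * w for k, w in weights.items()) / total

result = cond_expectation_given_sum(n=5, p=Fraction(1, 3), m=4)
print(result)  # 2
```

Passing `p` as a `Fraction` keeps the arithmetic exact, so the result can be compared with $m/2$ directly rather than up to a tolerance.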

#### Continuous example

and are continuous random variables with joint density function
Find where .
Note that in the equation of the conditional density
the joint density function is given, so we only need to figure out the marginal density function of .
So the conditional density function is
which follows an exponential density function with parameter . Finally we can find the conditional expectation
because for , and .

#### Properties

1. Summation comes outside:
1. .
1. Combining 1 and 2,
1. Generalizing 2,

#### Making predictions

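A very common problem in statistics is making predictions. Suppose the value of a random variable $X$ is observed, and an attempt is made to predict the value of another random variable $Y$ based on this observed value of $X$.
Let $g(X)$ denote the predictor. Clearly, we would want a function $g$ such that $g(X)$ tends to be close to $Y$. A popular criterion for closeness is the expected squared error. The model is given by

$$
\min_g E\big[(Y - g(X))^2\big].
$$

We can think of this as $Y = g(X) + \epsilon$ where $\epsilon$ is the estimation error. The equation above minimizes the mean squared error $E[\epsilon^2]$, which equals the variance of $\epsilon$ when the error has mean zero. We can show that under this criterion, the best possible predictor of $Y$ is $g(X) = E[Y \mid X]$.
Theorem. For any function $g$,

$$
E\big[(Y - g(X))^2\big] \geq E\big[(Y - E[Y \mid X])^2\big].
$$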
Now suppose the height of a man is $X$ inches and the height of his son is $Y$ inches. Assume . If we know a father is 6 feet tall, what is the expected value of the height of his son?
The model can be written as
where and is independent of . The best prediction of given is .

### Total expectation

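For any value of $y$, we can think of the conditional expectation $E[X \mid Y = y]$ as a real-valued function of $y$,

$$
h(y) = E[X \mid Y = y],
$$

and $E[X \mid Y] = h(Y)$ can be considered as another random variable.
The equation $E[X] = E\big[E[X \mid Y]\big]$ is called total expectation. If $X$ and $Y$ are two discrete random variables,

$$
E[X] = \sum_y E[X \mid Y = y] \, P(Y = y).
$$

In the continuous case,

$$
E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y] \, f_Y(y) \, dy.
$$

The proof for the discrete case is given below. The proof for the continuous case is similar.

$$
\begin{aligned}
\sum_y E[X \mid Y = y] \, P(Y = y) &= \sum_y \sum_x x \, P(X = x \mid Y = y) \, P(Y = y) \\
&= \sum_y \sum_x x \, P(X = x, Y = y) \\
&= \sum_x x \sum_y P(X = x, Y = y) \\
&= \sum_x x \, P(X = x) = E[X].
\end{aligned}
$$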
Suppose Joe is in a room with doors A, B and C. Door A leads outside and it takes 3 hours to go through the tunnel behind it. Doors B and C both lead back to the room, and the tunnels take 5 and 7 hours to finish, respectively. The three doors are shuffled each time one door is chosen, so a random guess is needed each time. What is the expected value for the number of hours to leave the room?
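Let $X$ be the number of hours to escape. The possible values of $X$ go from 3 to infinity (if Joe’s really unlucky). Directly calculating $E[X]$ is therefore not straightforward, so we define

$$
Y = \text{the door Joe chooses first}, \qquad Y \in \{1, 2, 3\} \text{ for doors A, B and C},
$$

and calculate $E[X]$ using total expectation.

$$
E[X] = \sum_{i=1}^{3} E[X \mid Y = i] \, P(Y = i) = \frac{1}{3}(3) + \frac{1}{3}(5 + E[X]) + \frac{1}{3}(7 + E[X]),
$$

which gives us $E[X] = 15$. To understand the conditions, we explain $E[X \mid Y = 2]$ as an example. If Joe chose the second door, he would spend 5 hours in the tunnel and then return to the room. Since the doors are shuffled, the problem is the same as before. Thus, his expected additional time until leaving the room is just $E[X]$, and $E[X \mid Y = 2] = 5 + E[X]$.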
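The total-expectation argument for the door problem can be sketched in code in two ways: solving the linear equation for the expected time exactly, and simulating Joe's random choices. Both functions and the constant names are illustrative assumptions; the tunnel times come from the problem statement.

```python
import random
from fractions import Fraction

# Tunnel times in hours; only door A (3 hours) leads outside.
EXIT_TIME = 3
LOOP_TIMES = [5, 7]            # doors B and C lead back to the room
N_DOORS = 1 + len(LOOP_TIMES)

def expected_escape_time():
    """Solve E = (1/3) * 3 + (1/3) * (5 + E) + (1/3) * (7 + E) for E."""
    p = Fraction(1, N_DOORS)                      # each door is equally likely
    constant = p * (EXIT_TIME + sum(LOOP_TIMES))  # terms without E
    return constant / (1 - p * len(LOOP_TIMES))   # move the E terms to the left

def simulate_escape_time(rng):
    """One random run: keep picking doors until the exit door is chosen."""
    hours = 0
    while True:
        door = rng.randrange(N_DOORS)  # 0 is the exit, 1 and 2 loop back
        if door == 0:
            return hours + EXIT_TIME
        hours += LOOP_TIMES[door - 1]

exact = expected_escape_time()
rng = random.Random(0)  # fixed seed for reproducibility
n_runs = 100_000
average = sum(simulate_escape_time(rng) for _ in range(n_runs)) / n_runs
print(exact, average)
```

The simulated average should land close to the exact value, which gives some reassurance that conditioning on the first door chosen sets up the right equation.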