Common Distributions of Discrete Random Variables

Oct 7, 2019
We introduce the binomial (Bernoulli), geometric and Poisson probability distributions and their properties. The properties include their expectations, variances and moment generating functions.
In this section, we present some specific types of discrete random variables and derive their probability distributions, expectations and variances.

The binomial probability distribution

Suppose a random experiment has a binary outcome (1 or 0, success or failure, etc.). Let $X$ be a random variable that indicates the result of this random experiment. The PMF of $X$ can be written as
$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \end{cases}$$
where $0 \leq p \leq 1$. Then we say $X \sim \text{Bern}(p)$, meaning $X$ is a Bernoulli random variable with parameter $p$, or $X$ is drawn from a Bernoulli distribution with parameter $p$.
A binomial experiment is a random experiment that contains $n$ independent and identical Bernoulli experiments, e.g. tossing a coin $n$ times. Denote $X$ as the number of successes observed in the $n$ trials. Then $X \sim \text{Bin}(n, p)$ where $p$ is the probability of success in each trial. $X$ can take any integer value from $\{0, 1, \cdots, n\}$. The PMF is
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \cdots, n$$
Binomial distribution with $n = 10$ and different $p$ parameters.
Above is a visualization of the binomial distribution with different values for $p$. We can also see that the Bernoulli distribution is a special case of the binomial distribution where $n = 1$.
R code for animation.
library(tidyverse)
library(ggpubr)
library(gganimate)

# PMF of Bin(10, p) on x = 0, ..., 10 for five values of p
dat <- data.frame(
  x = rep(0:10, times = 5),
  Param = rep(c(0.1, 0.3, 0.5, 0.7, 0.9), each = 11)
) %>%
  mutate(
    # map2_dbl() returns a numeric vector; map2() would give a list column
    freq = map2_dbl(x, Param, ~ dbinom(.x, 10, .y)),
    x = factor(x)
  )

# One bar chart per value of p, animated across the states
ggbarplot(dat, "x", "freq") +
  transition_states(Param) +
  ease_aes('cubic-in-out') +
  ggtitle("p = {closest_state}") +
  labs(x = "X ~ Bin(10, {closest_state})", y = "Freq")


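We first want to check that $\sum_x P(X = x) = 1$, as required of any probability distribution. Then we derive the expectation and variance of the two distributions. Doing so for the Bernoulli distribution is straightforward:
$$\sum_{x \in \{0, 1\}} P(X = x) = (1 - p) + p = 1$$
$$E[X] = 0 \cdot (1 - p) + 1 \cdot p = p$$
$$\text{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)$$
For the binomial distribution, we know that
$$\sum_{k=0}^n P(X = k) = \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k}$$
By using the binomial expansion, we have
$$\sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} = \left(p + (1 - p)\right)^n = 1$$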
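The expected value of a binomial random variable is
$$E[X] = \sum_{k=0}^n k \binom{n}{k} p^k (1-p)^{n-k} = np$$
To evaluate this sum, recall that the binomial expansion is given by
$$(a + b)^n = \sum_{k=0}^n \binom{n}{k} a^k b^{n-k}$$
Lemma: Suppose we have $X \sim \text{Bin}(n, p)$ and an integer $m$ with $1 \leq m \leq n$. We can show that
$$E[X(X-1)\cdots(X-m+1)] = \sum_{k=m}^n \frac{k!}{(k-m)!} \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{(n-m)!} p^m$$
The sum starts at $k = m$ because the product $k(k-1)\cdots(k-m+1)$ vanishes for $k < m$. If this equation holds, we can get $E[X] = np$ by setting $m = 1$.
Let $j = k - m$, we have
$$\sum_{k=m}^n \frac{k!}{(k-m)!} \cdot \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k} = \frac{n!}{(n-m)!} p^m \sum_{j=0}^{n-m} \binom{n-m}{j} p^j (1-p)^{n-m-j} = \frac{n!}{(n-m)!} p^m \left(p + (1-p)\right)^{n-m} = \frac{n!}{(n-m)!} p^m$$
For $\text{Var}(X)$, we just need to figure out $E[X^2]$. Setting $m = 2$ in the Lemma gives $E[X(X-1)] = n(n-1)p^2$, and we have
$$E[X^2] = E[X(X-1)] + E[X] = n(n-1)p^2 + np$$
because $X^2 = X(X-1) + X$. So the variance of $X$ is
$$\text{Var}(X) = E[X^2] - (E[X])^2 = n(n-1)p^2 + np - n^2p^2 = np(1-p)$$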
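As a quick numerical sanity check of the binomial results ($\sum_k P(X = k) = 1$, $E[X] = np$, $E[X(X-1)] = n(n-1)p^2$ and $\text{Var}(X) = np(1-p)$), here is a minimal sketch in base R. The values `n = 10` and `p = 0.3` are arbitrary choices for illustration, not values taken from the derivation.

```r
# Assumed illustrative parameters (any n >= 2 and 0 < p < 1 would do)
n <- 10
p <- 0.3
k <- 0:n

pmf <- dbinom(k, size = n, prob = p)   # P(X = k) for k = 0, ..., n

total  <- sum(pmf)                     # total probability
mean_x <- sum(k * pmf)                 # E[X]
fact2  <- sum(k * (k - 1) * pmf)       # E[X(X - 1)], the Lemma with m = 2
var_x  <- fact2 + mean_x - mean_x^2    # Var(X) = E[X(X - 1)] + E[X] - E[X]^2

# Numerical values next to the closed forms
print(c(total = total, mean = mean_x, fact2 = fact2, variance = var_x))
print(c(total = 1, mean = n * p, fact2 = n * (n - 1) * p^2,
        variance = n * p * (1 - p)))
```

The two printed vectors should agree up to floating-point error.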
The binomial distribution has many applications, such as modeling defectives in quality control, or anything else that can be put into a success-failure setting.

The geometric probability distribution

Suppose we have a sequence of independent and identical Bernoulli experiments with probability of success $p$. We can define $Y$ as a random variable that gives the number of the trial on which the first success occurs. For example, if we toss a coin repeatedly, $Y$ will be the number of the toss on which a head first appears.
$Y$ is said to have a geometric probability distribution if and only if
$$P(Y = y) = (1-p)^{y-1} p$$
where $y = 1, 2, 3, \cdots$ and $0 < p \leq 1$.


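As required for any valid discrete probability distribution, the probabilities should add up to 1.
We know the geometric series $\sum_{k=0}^\infty x^k = \frac{1}{1-x}$ for $|x| < 1$, so
$$\sum_{y=1}^\infty (1-p)^{y-1} p = p \sum_{k=0}^\infty (1-p)^k = p \cdot \frac{1}{1 - (1-p)} = 1$$
For the expectation of $Y$, we have
$$E[Y] = \sum_{y=1}^\infty y (1-p)^{y-1} p = \sum_{y=1}^\infty (1-p)^{y-1} p + \sum_{y=2}^\infty (y-1)(1-p)^{y-1} p = 1 + (1-p) E[Y]$$
Rearranging terms would give us
$$E[Y] = \frac{1}{p}$$
With the expectation of $Y$ known, it's easy to get the variance of $Y$ once we find $E[Y^2]$. Writing $y^2 = (y-1)^2 + 2(y-1) + 1$ and using the same shift of index,
$$E[Y^2] = (1-p)E[Y^2] + 2(1-p)E[Y] + 1 \quad \Rightarrow \quad E[Y^2] = \frac{2-p}{p^2}$$
The variance is thus
$$\text{Var}(Y) = E[Y^2] - (E[Y])^2 = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2}$$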
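The geometric results can be checked numerically with a truncated sum. The sketch below assumes `p = 0.2` and cuts the infinite sum off at 2000 terms, where the remaining tail is negligible. Note that R's `dgeom()` counts the number of failures before the first success, so the trial number $y$ used here corresponds to `y - 1` in `dgeom()`.

```r
p <- 0.2        # assumed success probability, for illustration only
y <- 1:2000     # truncation of the infinite support {1, 2, 3, ...}

# PMF written out by hand: P(Y = y) = (1 - p)^(y - 1) * p
pmf <- (1 - p)^(y - 1) * p

# dgeom() is parameterized by the number of failures, hence the shift by one
pmf_builtin <- dgeom(y - 1, prob = p)

total  <- sum(pmf)                     # total probability
mean_y <- sum(y * pmf)                 # E[Y]
var_y  <- sum(y^2 * pmf) - mean_y^2    # Var(Y) = E[Y^2] - E[Y]^2

# Numerical values next to the closed forms 1, 1/p and (1 - p)/p^2
print(c(total = total, mean = mean_y, variance = var_y))
print(c(total = 1, mean = 1 / p, variance = (1 - p) / p^2))
```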
The geometric probability distribution is often used to model waiting times. For example, if the probability of an engine malfunction during any randomly observed time interval is $p$, then the number of intervals until the first malfunction can be modeled using the geometric distribution.

The Poisson probability distribution

The Poisson random variable has a range of infinite size. It takes values in the set $\{0, 1, 2, \cdots\}$. We'll learn about this random variable through an example.
Suppose we're performing quality control for a mobile phone company. Each phone made has a small chance of being defective. The average number of defective phones produced per day is $\lambda$. Find the probability of producing $k$ defective phones on a usual day.
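First, we may assume that the production of each phone is a random variable with two outcomes, 0 or 1, i.e. $X_i \sim \text{Bern}(p)$ where $X_i$ indicates whether the $i$-th phone is defective, which happens with probability $p$. In addition, we assume that there are $n$ phones produced, and that the productions of the $n$ phones are independent and share the same defect probability $p$. Define
$$X = \sum_{i=1}^n X_i \sim \text{Bin}(n, p)$$
Then the probability of producing $k$ defective phones can be calculated using the probability mass function of $X$ at $k$:
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
Our problem now is that we don't know $n$ and $p$ explicitly. But what we do know is that on average there are $\lambda$ defective phones produced per day, which gives
$$E[X] = np = \lambda$$
Now we can replace the variable $p$ in the equation with $\lambda / n$, and write out the probability mass function as
$$P(X = k) = \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}$$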
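Though we don't know $n$ exactly, it's reasonable to assume that it's a very large number, so we can let $n \to \infty$ to study it in an asymptotic way.
$$\lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^k}{k!} \lim_{n \to \infty} \frac{n(n-1)\cdots(n-k+1)}{n^k} \left(1 - \frac{\lambda}{n}\right)^n \left(1 - \frac{\lambda}{n}\right)^{-k}$$
where the limit for each factor is found using the following limits
$$\lim_{n \to \infty} \frac{n(n-1)\cdots(n-k+1)}{n^k} = 1, \quad \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}, \quad \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{-k} = 1$$
So as $n \to \infty$,
$$P(X = k) \to \frac{\lambda^k}{k!} e^{-\lambda}$$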
Formally speaking, for a discrete random variable $X$ whose probability mass function satisfies
$$P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \cdots$$
we say $X$ is a Poisson random variable with parameter $\lambda > 0$, or $X \sim \text{Pois}(\lambda)$. The Poisson distribution provides a good model for the probability distribution of the number of rare events that occur in space, time or any other dimension, where $\lambda$ is the average number of events.
Poisson distribution with different $\lambda$ values.
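The limiting argument can also be seen numerically: with $p = \lambda / n$, the binomial PMF should move closer to the Poisson PMF as $n$ grows. The following base R sketch assumes $\lambda = 3$ and looks at $k = 0, \cdots, 10$; the helper name `max_gap` is made up for this illustration.

```r
lambda <- 3     # assumed average number of defective phones per day
k <- 0:10       # counts at which the two PMFs are compared

# Largest absolute difference between Bin(n, lambda / n) and Pois(lambda) on k
max_gap <- function(n) {
  max(abs(dbinom(k, size = n, prob = lambda / n) - dpois(k, lambda)))
}

n_values <- c(10, 100, 1000, 10000)
gaps <- sapply(n_values, max_gap)

# The gap should shrink as n increases
print(data.frame(n = n_values, max_gap = gaps))
```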


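As always, we first check if the total probability is 1.
$$\sum_{k=0}^\infty \frac{\lambda^k}{k!} e^{-\lambda} = e^{-\lambda} \sum_{k=0}^\infty \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1$$
where we've used the Taylor series
$$e^x = \sum_{k=0}^\infty \frac{x^k}{k!}$$
Next, we prove that the expectation of $X$ is $\lambda$.
$$E[X] = \sum_{k=0}^\infty k \frac{\lambda^k}{k!} e^{-\lambda} = \lambda e^{-\lambda} \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$
The variance is also $\lambda$. A mean equal to the variance is unusual, as the expectation and the variance generally have different units, but it's not a problem for the Poisson distribution because it is used to model counts, which are unitless.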
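A short numerical check of these three Poisson facts (total probability 1, mean $\lambda$, variance $\lambda$), assuming $\lambda = 4$ and truncating the infinite sum at $k = 100$, where the tail is negligible for this choice:

```r
lambda <- 4     # assumed rate, for illustration only
k <- 0:100      # truncation of the infinite support {0, 1, 2, ...}

# PMF written out by hand: lambda^k / k! * exp(-lambda)
pmf <- lambda^k / factorial(k) * exp(-lambda)
pmf_builtin <- dpois(k, lambda)        # R's built-in Poisson PMF

total  <- sum(pmf)                     # total probability
mean_x <- sum(k * pmf)                 # E[X]
var_x  <- sum(k^2 * pmf) - mean_x^2    # Var(X) = E[X^2] - E[X]^2

# All three values should be close to 1, lambda and lambda respectively
print(c(total = total, mean = mean_x, variance = var_x))
```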

The negative binomial distribution

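Similar to the geometric distribution, suppose we have a sequence of i.i.d. Bernoulli trials with the same probability of success $p$. We're interested in the number of the trial on which the $r$-th success occurs ($r = 2, 3, 4, \cdots$).
Let $r$ and $y$ be fixed values, and consider events $A$: {the first $y - 1$ trials contain $r - 1$ successes} and $B$: {trial $y$ results in a success}. We've assumed that $P(B) = p$ and that $A$ and $B$ are independent, so
$$P(Y = y) = P(A \cap B) = P(A) P(B)$$
Using results from the binomial distribution, we can easily find
$$P(A) = \binom{y-1}{r-1} p^{r-1} (1-p)^{y-r}$$
A random variable $Y$ is said to have a negative binomial probability distribution if and only if
$$P(Y = y) = \binom{y-1}{r-1} p^r (1-p)^{y-r}$$
where $y = r, r+1, r+2, \cdots$, and we can denote it as $Y \sim \text{NB}(r, p)$. Here $Y - r$ is the random number of failures. Its name originates from the fact that
$$\binom{y-1}{r-1} = \binom{y-1}{y-r} = (-1)^{y-r} \binom{-r}{y-r}, \quad \text{so} \quad P(Y = y) = \binom{-r}{y-r} p^r \left(-(1-p)\right)^{y-r}$$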
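To make the parameterization concrete, the sketch below compares the PMF $\binom{y-1}{r-1} p^r (1-p)^{y-r}$ written out by hand with R's `dnbinom()`, and with the form that uses a binomial coefficient with a negative upper index. The values `r = 3` and `p = 0.4` are assumptions for illustration. `dnbinom()` counts failures before the `size`-th success, so the trial number $y$ corresponds to `y - r` failures.

```r
r <- 3             # assumed target number of successes
p <- 0.4           # assumed success probability
y <- r:(r + 10)    # a few trial numbers on which the r-th success can occur

# PMF in terms of the trial number y
pmf_manual <- choose(y - 1, r - 1) * p^r * (1 - p)^(y - r)

# dnbinom() is parameterized by the number of failures, here y - r
pmf_builtin <- dnbinom(y - r, size = r, prob = p)

# "Negative binomial" form: choose() accepts a negative upper index
pmf_negative <- choose(-r, y - r) * p^r * (-(1 - p))^(y - r)

# The three columns should agree up to floating-point error
print(cbind(y, pmf_manual, pmf_builtin, pmf_negative))
```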
In the field of bioinformatics, the NB distribution is very frequently used to model RNA-Seq data. Simply put, in an RNA-Seq experiment we map the sequencing reads to a reference genome and count the number of reads within each gene. There tend to be millions of reads in total, but the number on each gene is usually within the thousands, with great variability.
The Poisson distribution was used to model this in the beginning, but it assumes that the mean and variance are the same, which is not the case in RNA-Seq. The variance in the counts is in general much greater than the mean, especially for highly expressed genes. Another formulation of the negative binomial distribution, the gamma-Poisson mixture distribution, has a dispersion parameter that accommodates this extra variance.
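The difference in the mean-variance relationship can be illustrated with R's built-in PMFs. This is only a sketch with made-up numbers (`mu = 100`, `size = 2`), not real RNA-Seq data. It relies on the mean parameterization of `dnbinom()`, in which `size` plays the role of the dispersion parameter and the variance is `mu + mu^2 / size`.

```r
mu   <- 100     # assumed mean read count for one gene (hypothetical)
size <- 2       # assumed dispersion parameter (hypothetical)
x <- 0:5000     # truncation of the support; the tail beyond it is negligible

# Mean and variance of a PMF given on the grid x
mean_var <- function(pmf) {
  m <- sum(x * pmf)
  c(mean = m, variance = sum((x - m)^2 * pmf))
}

pois_stats <- mean_var(dpois(x, lambda = mu))
nb_stats   <- mean_var(dnbinom(x, size = size, mu = mu))

# Same mean, but the negative binomial variance is far larger
print(rbind(poisson = pois_stats, negative_binomial = nb_stats))
```

With these numbers both distributions have mean 100, while the negative binomial variance is $100 + 100^2 / 2 = 5100$ against 100 for the Poisson.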


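First we want to show that the probabilities add up to 1. We first play with the binomial part:
$$\binom{y-1}{r-1} = \frac{(y-1)!}{(r-1)!(y-r)!} = \frac{(y-1)(y-2)\cdots(y-r+1)}{(r-1)!}$$
By constructing the function
$$f(x) = \sum_{y=1}^\infty x^{y-1} = \frac{1}{1-x}, \quad |x| < 1$$
and differentiating both sides $r - 1$ times, we can decompose this into
$$f^{(r-1)}(x) = \sum_{y=r}^\infty (y-1)(y-2)\cdots(y-r+1) x^{y-r} = \frac{(r-1)!}{(1-x)^r}$$
When $r = 1$, the equation is reduced to the geometric series behind the sum of probabilities of a geometric distribution
$$\sum_{y=1}^\infty x^{y-1} = \frac{1}{1-x}$$
So we can show that
$$\sum_{y=r}^\infty \binom{y-1}{r-1} x^{y-r} = \frac{1}{(1-x)^r}$$
If we set $x = 1 - p$, and multiply both sides by $p^r$, we have
$$\sum_{y=r}^\infty \binom{y-1}{r-1} p^r (1-p)^{y-r} = 1$$
We can now use this property to calculate the expectation. Since $y \binom{y-1}{r-1} = r \binom{y}{r}$,
$$E[Y] = \sum_{y=r}^\infty y \binom{y-1}{r-1} p^r (1-p)^{y-r} = r \sum_{y=r}^\infty \binom{y}{r} p^r (1-p)^{y-r}$$
Let $y' = y + 1$ and $r' = r + 1$, we have
$$E[Y] = \frac{r}{p} \sum_{y'=r'}^\infty \binom{y'-1}{r'-1} p^{r'} (1-p)^{y'-r'} = \frac{r}{p}$$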
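Both the normalization and the expectation $E[Y] = r / p$ of the negative binomial distribution can be checked with a truncated sum, along with the series identity $\sum_{y \geq r} \binom{y-1}{r-1} x^{y-r} = (1-x)^{-r}$ that underlies them. The sketch assumes `r = 3` and `p = 0.4`, and truncates at $y = 2000$, where the tail is negligible.

```r
r <- 3          # assumed number of successes
p <- 0.4        # assumed success probability
y <- r:2000     # truncation of the infinite support {r, r + 1, ...}

# Series identity evaluated at x = 1 - p
x0 <- 1 - p
series_lhs <- sum(choose(y - 1, r - 1) * x0^(y - r))
series_rhs <- 1 / (1 - x0)^r

# PMF of the trial number on which the r-th success occurs
pmf <- choose(y - 1, r - 1) * p^r * (1 - p)^(y - r)

total  <- sum(pmf)       # total probability
mean_y <- sum(y * pmf)   # E[Y]

print(c(series_lhs = series_lhs, series_rhs = series_rhs))
print(c(total = total, mean = mean_y, r_over_p = r / p))
```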

Moments and moment generating functions

In the previous sections, we've shown the expected values and variances for multiple random variables. In the calculations, we often have to calculate the expected values of some power functions of the random variable, such as $E[X^2]$ for the variances.
In general, it would be of interest to calculate $E[X^k]$ for some positive integer $k$. This expectation is called the $k$-th moment of $X$.

Moment generating function

The moment generating function can be used to systematically calculate the moments of a random variable. For a random variable $X$, its moment generating function is defined as
$$M_X(t) = E[e^{tX}]$$
where $t$ is a parameter. Note that $M_X(t)$ is not random! We call $M_X(t)$ the moment generating function of $X$ because all the moments of $X$ can be obtained by successively differentiating $M_X(t)$ and then evaluating the result at $t = 0$. For example, consider the first order derivative of $M_X(t)$.
$$M_X'(t) = \frac{d}{dt} E[e^{tX}] = E\left[\frac{d}{dt} e^{tX}\right] = E[X e^{tX}]$$
and at $t = 0$, we have $M_X'(0) = E[X]$. Similarly, if we take the second order derivative of $M_X(t)$
$$M_X''(t) = E[X^2 e^{tX}], \quad M_X''(0) = E[X^2]$$
In general, we can summarize the $k$-th derivative of $M_X(t)$ as
$$M_X^{(k)}(t) = E[X^k e^{tX}]$$
Then we evaluate this derivative at $t = 0$, which yields
$$M_X^{(k)}(0) = E[X^k]$$
So for a given random variable, if we know its moment generating function, we can take advantage of this property to calculate all the moments of this random variable. When the MGF exists in a neighborhood of $t = 0$, the correspondence between a distribution and its MGF is one-to-one, so the MGF acts as an ID for the distribution.
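To illustrate how derivatives of the MGF at $t = 0$ recover moments, here is a minimal sketch that approximates $M_X'(0)$ and $M_X''(0)$ with central finite differences. The fair six-sided die is an assumed toy example, and the step size `h` is an arbitrary choice; this is a numerical approximation, not an exact derivative.

```r
# Assumed toy distribution: a fair six-sided die
x   <- 1:6
pmf <- rep(1 / 6, 6)

# M_X(t) = E[exp(t * X)] for a discrete random variable
mgf <- function(t) sum(exp(t * x) * pmf)

h <- 1e-4   # finite-difference step (assumed; small but not too small)

# Central differences approximating the first two derivatives at t = 0
m1 <- (mgf(h) - mgf(-h)) / (2 * h)            # approximates E[X]
m2 <- (mgf(h) - 2 * mgf(0) + mgf(-h)) / h^2   # approximates E[X^2]

# Compare with the moments computed directly from the PMF
print(c(m1 = m1, direct = sum(x * pmf)))
print(c(m2 = m2, direct = sum(x^2 * pmf)))
```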

Binomial distribution

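Find the MGF for $X \sim \text{Bin}(n, p)$.
$$M_X(t) = E[e^{tX}] = \sum_{k=0}^n e^{tk} \binom{n}{k} p^k (1-p)^{n-k} = \sum_{k=0}^n \binom{n}{k} (pe^t)^k (1-p)^{n-k} = (pe^t + 1 - p)^n$$
Taking the first derivative, we have
$$M_X'(t) = n (pe^t + 1 - p)^{n-1} pe^t, \quad E[X] = M_X'(0) = np$$
Similarly we can get the variance by taking the second derivative:
$$M_X''(t) = n(n-1)(pe^t + 1 - p)^{n-2} (pe^t)^2 + n (pe^t + 1 - p)^{n-1} pe^t, \quad E[X^2] = M_X''(0) = n(n-1)p^2 + np$$
$$\text{Var}(X) = E[X^2] - (E[X])^2 = n(n-1)p^2 + np - n^2p^2 = np(1-p)$$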
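The binomial MGF $(pe^t + 1 - p)^n$ and its derivatives can be checked in base R with the symbolic differentiation function `D()`. The parameter values `n = 10`, `p = 0.3` and the evaluation point `t0 = 0.5` are assumptions for illustration.

```r
# Closed-form MGF of Bin(n, p) as an unevaluated expression in t
mgf_expr <- quote((p * exp(t) + 1 - p)^n)

d1 <- D(mgf_expr, "t")   # first derivative with respect to t
d2 <- D(d1, "t")         # second derivative with respect to t

pars <- list(t = 0, n = 10, p = 0.3)   # assumed parameters, evaluated at t = 0
m1 <- eval(d1, pars)                   # E[X]
m2 <- eval(d2, pars)                   # E[X^2]
var_x <- m2 - m1^2                     # Var(X)

# Closed form against the defining sum E[exp(t X)] at an arbitrary t0
t0 <- 0.5
k <- 0:pars$n
closed <- eval(mgf_expr, list(t = t0, n = pars$n, p = pars$p))
direct <- sum(exp(t0 * k) * dbinom(k, size = pars$n, prob = pars$p))

print(c(mean = m1, variance = var_x, closed = closed, direct = direct))
```

The mean and variance should match $np$ and $np(1-p)$, and `closed` should equal `direct` up to floating-point error.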

Poisson distribution

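Find $E[X]$ and $\text{Var}(X)$ for $X \sim \text{Pois}(\lambda)$.
$$M_X(t) = E[e^{tX}] = \sum_{k=0}^\infty e^{tk} \frac{\lambda^k}{k!} e^{-\lambda} = e^{-\lambda} \sum_{k=0}^\infty \frac{(\lambda e^t)^k}{k!}$$
using $e^x = \sum_{k=0}^\infty \frac{x^k}{k!}$, we have
$$M_X(t) = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}$$
Taking the first derivative of the MGF yields
$$M_X'(t) = \lambda e^t e^{\lambda(e^t - 1)}, \quad E[X] = M_X'(0) = \lambda$$
And the second derivative:
$$M_X''(t) = (\lambda e^t + \lambda^2 e^{2t}) e^{\lambda(e^t - 1)}, \quad E[X^2] = M_X''(0) = \lambda + \lambda^2$$
The variance can be found by
$$\text{Var}(X) = E[X^2] - (E[X])^2 = \lambda + \lambda^2 - \lambda^2 = \lambda$$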
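The same kind of check works for the Poisson MGF $e^{\lambda(e^t - 1)}$. The sketch below assumes $\lambda = 4$, differentiates the closed form with base R's `D()`, and also compares the closed form with a truncated version of the defining series at an arbitrary point `t0 = 0.3`.

```r
# Closed-form MGF of Pois(lambda) as an unevaluated expression in t
mgf_expr <- quote(exp(lambda * (exp(t) - 1)))

d1 <- D(mgf_expr, "t")   # first derivative with respect to t
d2 <- D(d1, "t")         # second derivative with respect to t

lambda <- 4              # assumed rate, for illustration only
m1 <- eval(d1, list(t = 0, lambda = lambda))   # E[X]
m2 <- eval(d2, list(t = 0, lambda = lambda))   # E[X^2]
var_x <- m2 - m1^2                             # Var(X)

# Closed form against the truncated series sum of exp(t k) * P(X = k)
t0 <- 0.3
k <- 0:200
closed <- eval(mgf_expr, list(t = t0, lambda = lambda))
series <- sum(exp(t0 * k) * dpois(k, lambda))

print(c(mean = m1, second_moment = m2, variance = var_x))
print(c(closed = closed, series = series))
```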

And that’s pretty much it for discrete random variables. Obviously we haven’t covered everything, but this should be good enough for now. Next, we’re going to talk about .