Probability and the Normal Distribution

Dr. Gilbert

September 26, 2024

The Highlights

  • The Normal Distribution
    • Examples of Normal Distributions
    • Center (mean: \(\mu\))
    • Spread (standard deviation: \(\sigma\))
  • Probabilities and the Normal Distribution (areas)
    • Visual Representations of Probabilities
    • Normal Distribution is Symmetric
    • Why the Probability of an Exact, Singular Outcome is 0
  • Examples: Normal, Binomial, or Neither
  • Review of R functionality for probabilities and percentiles/quantiles with the Normal Distribution
    • pnorm()
    • qnorm()
  • Examples: Working with the Normal Distribution

The Normal Distribution

A normal distribution is a bell-shaped distribution which is parameterized (determined) by a mean \(\mu\) and a standard deviation \(\sigma\).

The normal distribution with mean \(\mu\) and standard deviation \(\sigma\) is denoted by \(N\left(\mu,~\sigma\right)\) and has probability density function

\[p\left(x\right) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\left(x - \mu\right)^2}{2\sigma^2}}\]

Luckily, like the binomial distribution, we don’t evaluate or work with this distribution by hand either!
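For the curious, here is a minimal sketch of what R does for us behind the scenes. The helper `normal_pdf()` is a hypothetical name introduced only for this illustration; it types out the density formula term by term and compares it with R's built-in `dnorm()`. The course will not rely on `dnorm()`; it appears here only as a check.

```r
# Hypothetical helper: the density formula from this slide, written out directly.
normal_pdf <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Evaluate the curve height at a few points on the standard normal curve.
x <- c(-1, 0, 0.5, 2)
by_hand  <- normal_pdf(x, mu = 0, sigma = 1)
built_in <- dnorm(x, mean = 0, sd = 1)

# The hand-typed formula and R's built-in density agree.
print(all.equal(by_hand, built_in))

# The total area under the curve is (numerically) 1, as it must be for a
# probability distribution.
total_area <- integrate(normal_pdf, lower = -Inf, upper = Inf,
                        mu = 0, sigma = 1)$value
print(total_area)
```

The curve heights themselves are not probabilities; areas under the curve are.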

Examples of Normal Distributions

The black distribution has \(\mu = 0\) and \(\sigma = 1\)

The purple distribution has \(\mu = 1\) and \(\sigma = 0.3\)

The green distribution has \(\mu = -1\) and \(\sigma = 2\)

The black distribution, \(N\left(\mu = 0,~\sigma = 1\right)\), is a special distribution called the standard normal distribution and we often use \(z\) to denote its “support” values.

We’ll encounter this distribution quite often throughout our course.

The Center of the Normal Distribution

All of the distributions below have standard deviation \(\sigma = 2\) but they have different means (\(\mu\))

  • The mean of the purple distribution is \(\mu = 0\)
  • The mean of the green distribution is \(\mu = 2\)
  • The mean of the orange distribution is \(\mu = -7\)

Note: All three distributions have the same shape, but are shifted so that their peak is at their mean.

The Spread of the Normal Distribution

All of the distributions below have mean \(\mu = 0\) but they have different standard deviations (\(\sigma\))

  • The standard deviation of the purple distribution is \(\sigma = 1.5\)
  • The standard deviation of the green distribution is \(\sigma = 3\)
  • The standard deviation of the orange distribution is \(\sigma = 5\)

Note: All of these distributions have the same center, but their widths (and heights too) change depending on their standard deviations.
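A short sketch of why the height changes along with the width: the total area under every normal curve is 1, so a wider curve has to be shorter. The code below uses the three standard deviations from this slide and computes the height of each curve at its peak. The variable names are chosen only for this illustration.

```r
# Standard deviations of the purple, green, and orange curves on this slide.
sigmas <- c(purple = 1.5, green = 3, orange = 5)

# Height of each curve at its peak (x = mean = 0).
peak_height <- dnorm(0, mean = 0, sd = sigmas)

# The same heights from the density formula evaluated at x = mu:
# the exponential term equals 1, leaving 1 / (sigma * sqrt(2 * pi)).
peak_formula <- 1 / (sigmas * sqrt(2 * pi))

print(round(peak_height, 4))
print(all.equal(peak_height, peak_formula))
```

Doubling the standard deviation (purple to green) halves the peak height.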

Probabilities and the Normal Distribution

Probabilities of outcomes from random variables described by a normal distribution are areas under the corresponding normal curve.

The picture to the right shows \(\mathbb{P}\left[X \geq k\right]\)

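Note: It is also worth knowing that the Normal Distribution is symmetric about its mean \(\mu\). For any distance \(k\), the left-tail area \(\mathbb{P}\left[X \leq \mu - k\right]\) is the same as the right-tail area \(\mathbb{P}\left[X \geq \mu + k\right]\).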

Note: If \(\mu = 0\), then this relationship simplifies to \(\mathbb{P}\left[X < -k\right] = \mathbb{P}\left[X > k\right]\) – but be careful, this is only true when \(\mu = 0\).
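The symmetry can be checked numerically. This is a small sketch using the marathon distribution that appears later in these slides (mean 4.5, standard deviation 0.75) and an arbitrarily chosen distance `k = 1`. Symmetry about the mean says the area below \(\mu - k\) matches the area above \(\mu + k\).

```r
mu    <- 4.5    # mean of the marathon example used later in the slides
sigma <- 0.75   # standard deviation of the same example
k     <- 1      # arbitrary distance from the mean, chosen for illustration

# Tail areas placed symmetrically around the mean.
left_tail  <- pnorm(mu - k, mean = mu, sd = sigma)
right_tail <- 1 - pnorm(mu + k, mean = mu, sd = sigma)
print(all.equal(left_tail, right_tail))

# The shortcut P[X < -k] = P[X > k] fails here because mu is not 0.
naive_left  <- pnorm(-k, mean = mu, sd = sigma)
naive_right <- 1 - pnorm(k, mean = mu, sd = sigma)
print(c(naive_left = naive_left, naive_right = naive_right))
```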

Probability of an Exact, Singular Outcome is 0

Question: What is the probability that a randomly selected adult male from the United States is exactly 72 inches (6 feet) tall?

Since probabilities are areas under the curve and we can approximate this region by a rectangle, we can use the formula for the area of a rectangle to approximate this probability! What’s the area of a rectangle with 0 width?

Note: For this reason, \(\mathbb{P}\left[X \leq k\right] = \mathbb{P}\left[X < k\right]\) and \(\mathbb{P}\left[X \geq k\right] = \mathbb{P}\left[X > k\right]\).

If we want to estimate probabilities, we’ll need to ask questions about ranges of values.

  1. What is the probability that an adult male in the US is less than 72 inches tall?
  2. …at least 72 inches tall?
  3. …between 71.5 and 72.5 inches tall?
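To see the "zero width" idea in action, here is a sketch that shrinks an interval centered on 72 inches. The slides do not give a mean or standard deviation for adult male heights, so the values 70 and 3 below are assumptions made purely for illustration.

```r
mu_height <- 70   # ASSUMED mean height in inches (not given in the slides)
sd_height <- 3    # ASSUMED standard deviation in inches (not given in the slides)

# Intervals centered at 72 inches with smaller and smaller widths.
widths <- c(1, 0.1, 0.01, 0.001)
prob_near_72 <- pnorm(72 + widths / 2, mean = mu_height, sd = sd_height) -
  pnorm(72 - widths / 2, mean = mu_height, sd = sd_height)

# The first row is the "between 71.5 and 72.5 inches" question.
print(data.frame(width = widths, probability = prob_near_72))
```

As the width heads toward 0, so does the probability, which is why we ask about ranges rather than exact values.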

Normal, Binomial, or Neither

Determine whether each of the following scenarios involves a random variable that can be modeled by a normal distribution, a binomial distribution, or neither.

Scenario: The time it takes runners to complete a marathon is approximately normally distributed with a mean of 4.5 hours and a standard deviation of 0.75 hours.

Scenario: You roll a fair six-sided die repeatedly until a six appears, and you want to know how many rolls it takes.

Scenario: A factory has a 2% defect rate. Each day, 200 items are produced, and the number of defective items is counted.

Scenario: The weight of apples grown in an orchard is approximately normally distributed with a mean of 150 grams and a standard deviation of 20 grams.

Scenario: A region typically experiences 5 minor earthquakes per year. You are interested in the number of earthquakes in the next year.

Scenario: The number of cars passing through a toll booth in a 10-minute period is recorded. On average, 50 cars pass through every 10 minutes.

Scenario: The lifespan of a certain smartphone battery is approximately normally distributed with a mean of 18 months and a standard deviation of 3 months.

Scenario: A teacher gives a test to 30 students and knows that 80% of the students usually pass. The teacher is interested in the number of students who will pass this time.

Scenario: The time until a particular brand of light bulb fails is recorded. The failure times do not follow a normal pattern but tend to follow a long-tailed distribution.

Scenario: In a survey, you ask 200 randomly selected people whether they plan to vote in an upcoming election. Historically, 55% of people vote.

Scenario: The daily temperature in a city during the summer months is approximately normally distributed with a mean of 85°F and a standard deviation of 5°F.

Scenario: A social media influencer posts a new photo, and the number of likes received over the next 24 hours is recorded. The number of likes does not seem to follow any regular or predictable pattern.

Scenario: The distance run by professional athletes in a 30-minute endurance test is normally distributed with a mean of 5 kilometers and a standard deviation of 0.3 kilometers.

Review of R Functionality for the Normal Distribution (Finding Probabilities; pnorm())

Assume that \(X\sim N\left(\mu = \text{mu}, \sigma = \text{sigma}\right)\).

That is, the random variable \(X\) is normally distributed with mean \(\text{mu}\) and standard deviation \(\text{sigma}\).

Since \(\mathbb{P}\left[X = k\right] = 0\) (that is, the probability that the random variable \(X\) takes on the value \(k\) exactly, is 0), we will not make use of any dnorm() function

The probability that \(X\) takes on a value less than or equal to (or just less than) \(k\) is \(\mathbb{P}\left[X \leq k\right] \approx\) pnorm(k, mean = mu, sd = sigma)

In cases where we want to find \(\mathbb{P}\left[X \geq k\right]\), \(\mathbb{P}\left[k_1 \leq X\leq k_2\right]\), and others, we’ll need to…

Strategy: Draw a picture and let your picture tell you how to use pnorm()
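As a sketch of where the pictures usually lead, here are three common patterns built from `pnorm()`. The mean, standard deviation, and boundary values below are made-up numbers for illustration only; they do not come from any scenario in the slides.

```r
mu    <- 10   # illustrative mean (assumption)
sigma <- 2    # illustrative standard deviation (assumption)
k     <- 12   # single boundary value
k1    <- 9    # lower boundary for the "between" case
k2    <- 13   # upper boundary for the "between" case

# Area to the LEFT of k: pnorm() gives this directly.
p_left <- pnorm(k, mean = mu, sd = sigma)

# Area to the RIGHT of k: total area (1) minus the area to the left.
p_right <- 1 - pnorm(k, mean = mu, sd = sigma)

# Area BETWEEN k1 and k2: area left of k2 minus area left of k1.
p_between <- pnorm(k2, mean = mu, sd = sigma) - pnorm(k1, mean = mu, sd = sigma)

print(c(left = p_left, right = p_right, between = p_between))
```

In each case the picture decides which areas to add or subtract; `pnorm()` only ever reports the area to the left of a boundary.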

Review of R Functionality for the Normal Distribution (Finding Percentiles; qnorm())

Assume that \(X\sim N\left(\mu = \text{mu}, \sigma = \text{sigma}\right)\).

That is, the random variable \(X\) is normally distributed with mean \(\text{mu}\) and standard deviation \(\text{sigma}\).

In addition to computing probabilities, we can compute percentiles/quantiles (cut-off or boundary values)

The boundary value \(k^*\) such that \(\mathbb{P}\left[X \leq k^*\right] = p\) is given by
qnorm(p, mean = mu, sd = sigma)

Note: \(p\) must be the area to the LEFT of the boundary value you are looking for
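A brief sketch of the "area to the left" rule, again with made-up values for the mean and standard deviation. It shows that `qnorm()` undoes `pnorm()`, and how to convert a right-hand area into the left-hand area that `qnorm()` expects.

```r
mu    <- 10   # illustrative mean (assumption)
sigma <- 2    # illustrative standard deviation (assumption)

# Boundary value with 90% of the area to its left (the 90th percentile).
p      <- 0.90
k_star <- qnorm(p, mean = mu, sd = sigma)

# Feeding the boundary back into pnorm() recovers the original area.
print(pnorm(k_star, mean = mu, sd = sigma))

# Cutoff for the TOP 10%: the area to the right is 0.10,
# so the area to the left is 1 - 0.10 = 0.90.
top_10_cutoff <- qnorm(1 - 0.10, mean = mu, sd = sigma)
print(c(k_star = k_star, top_10_cutoff = top_10_cutoff))
```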

Before We Try, Some Advice

  1. Don’t “memorize” anything beyond…
    • the normal distribution is bell-shaped and centered at its mean
    • use pnorm() if you have a boundary value(s) and need to find a probability
    • use qnorm() if you have a probability/area and need to find a boundary value
  2. Draw a picture!
  3. Estimate your answer before computing it
    • If it is a probability, is it more or less than 0.5?
    • If it is a boundary value, should it be more than the average or less?
  4. Use your picture to guide your strategy for finding the answer
  5. Check your answer against your estimate/expectation

Examples: Working with the Normal Distribution

Scenario: The time it takes runners to complete a marathon is approximately normally distributed with a mean of 4.5 hours and a standard deviation of 0.75 hours.

  • What is the probability that a randomly selected runner finishes the marathon in less than 4 hours?

How big should our answer be?

We’re looking for a probability and it should be less than 0.5

Since we have our boundary value and we are looking for a probability, let’s use pnorm()

pnorm(4, 4.5, 0.75)
[1] 0.2524925

Answer: There is about a 25.25% chance that a randomly selected marathon runner finished the marathon in under 4 hours.

Examples: Working with the Normal Distribution

Scenario: The time it takes runners to complete a marathon is approximately normally distributed with a mean of 4.5 hours and a standard deviation of 0.75 hours.

  • What is the probability that a randomly selected runner finishes the marathon in 3 and a half hours or more?

How big should our answer be?

We’re looking for a probability and it should be more than 0.5.

Since we have our boundary value and we are looking for a probability, let’s use pnorm()

1 - pnorm(3.5, 4.5, 0.75)
[1] 0.9087888

Answer: There is about a 90.88% chance that a randomly selected marathon runner finished the marathon in 3 and a half hours or more.
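An alternative sketch for right-tail areas: `pnorm()` has a `lower.tail` argument, and setting it to `FALSE` returns the area to the right of the boundary directly. This is optional; subtracting from 1 gives the same result.

```r
# Right-tail area by subtracting the left-tail area from 1.
complement <- 1 - pnorm(3.5, mean = 4.5, sd = 0.75)

# Right-tail area requested directly with lower.tail = FALSE.
upper_tail <- pnorm(3.5, mean = 4.5, sd = 0.75, lower.tail = FALSE)

print(c(complement = complement, upper_tail = upper_tail))
print(all.equal(complement, upper_tail))
```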

Examples: Working with the Normal Distribution

Scenario: The time it takes runners to complete a marathon is approximately normally distributed with a mean of 4.5 hours and a standard deviation of 0.75 hours.

  • What is the probability that a randomly selected runner finishes the marathon between 3 hours and 5 hours?

How big should our answer be?

It’s tough to tell, but something near 0.5 seems reasonable – nothing too close to 0 or too close to 1 should be expected.

Since we have our boundary values, let’s use pnorm() to find the probability.

But How?

pnorm(5, 4.5, 0.75) - pnorm(3, 4.5, 0.75)
[1] 0.7247573

Answer: There is about a 72.48% chance that a randomly selected marathon runner finished the marathon between 3 and 5 hours.
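One way to sanity-check an answer like this is a quick simulation. The sketch below is not part of the original solution: it draws many simulated finishing times with `rnorm()` and counts the share that land between 3 and 5 hours. The seed and the number of simulated runners are arbitrary choices.

```r
set.seed(2024)  # arbitrary seed so the simulation is reproducible

# Simulate finishing times for 100,000 hypothetical runners.
sim_times <- rnorm(100000, mean = 4.5, sd = 0.75)

# Share of simulated runners finishing between 3 and 5 hours.
sim_prop <- mean(sim_times > 3 & sim_times < 5)

# Exact area from pnorm(), as computed on this slide.
exact <- pnorm(5, mean = 4.5, sd = 0.75) - pnorm(3, mean = 4.5, sd = 0.75)

print(c(simulated = sim_prop, exact = exact))
```

The simulated share should land close to, but not exactly on, the `pnorm()` answer.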

Examples: Working with the Normal Distribution

Scenario: The time it takes runners to complete a marathon is approximately normally distributed with a mean of 4.5 hours and a standard deviation of 0.75 hours.

  • At what finishing time do the slowest 20% of runners finish the marathon?

Where are the slowest 20% of runners on this distribution?

There they are! They have the longest finishing times. How long should we expect?

This time, we know the size of that purple area and we are trying to find the boundary value between it and the unshaded portion of the distribution

This will be the minimum finishing time for falling into the slowest 20% of runners

We should use qnorm() in this scenario

qnorm() requires the area to the left of the boundary value. How big is that area?

qnorm(0.80, 4.5, 0.75)
[1] 5.131216

Answer: The slowest 20% of runners finish in about 5.13 hours (almost 5 hours and 8 minutes) or longer.
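Following the "check your answer" advice, here is a sketch that feeds the cutoff back into `pnorm()` to confirm that 20% of the area lies to its right, and converts the decimal hours into hours and minutes. The `lower.tail = FALSE` form of `qnorm()` is shown as an optional alternative that takes the right-hand area directly.

```r
# Cutoff with 80% of the area to its left (slowest 20% to its right).
cutoff <- qnorm(0.80, mean = 4.5, sd = 0.75)

# Check: the area to the right of the cutoff should be 0.20.
slow_share <- 1 - pnorm(cutoff, mean = 4.5, sd = 0.75)

# Optional alternative: give qnorm() the right-hand area directly.
cutoff_upper <- qnorm(0.20, mean = 4.5, sd = 0.75, lower.tail = FALSE)

# Convert decimal hours to whole hours plus minutes.
hours   <- floor(cutoff)
minutes <- (cutoff - hours) * 60

print(c(cutoff = cutoff, slow_share = slow_share, cutoff_upper = cutoff_upper))
print(c(hours = hours, minutes = round(minutes, 1)))
```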

Examples: Working with the Normal (or Binomial?) Distributions

Complete the following examples, but be careful – you’ll need to decide whether to use the binomial distribution (and dbinom() or pbinom()) or the normal distribution (and pnorm() or qnorm())

Scenario: The weight of apples grown in an orchard is approximately normally distributed with a mean of 150 grams and a standard deviation of 20 grams.

  • What is the probability that a randomly selected apple weighs less than 100 grams?
  • What is the probability that a randomly selected apple weighs more than 175 grams?
  • What is the probability that a randomly selected apple weighs between 160 and 195 grams?
  • What is the cutoff for the lightest 5% of apples?
  • What is the cutoff for the heaviest 1% of apples?

Examples: Working with the Normal (or Binomial?) Distributions

Scenario: In a survey, you ask 200 randomly selected people whether they plan to vote in an upcoming election. Historically, 55% of people vote.

  • What is the probability that fewer than 85 people are planning on voting?
  • What is the probability that at least 125 people are planning on voting?

Examples: Working with the Normal (or Binomial?) Distributions

Scenario: The lifespan of a certain smartphone battery is approximately normally distributed with a mean of 18 months and a standard deviation of 3 months.

  • What is the probability that a battery has a lifespan exceeding 2 years (24 months)?
  • What is the probability that a battery has a lifespan between 16 months and 2 years?
  • The manufacturer wants to put a warranty on their batteries, but they want to replace no more than 3% of batteries via warranty. What is the cutoff for the lifespan of these shortest lasting batteries?

Examples: Working with the Normal (or Binomial?) Distributions

Scenario: The daily temperature in a city during the summer months is approximately normally distributed with a mean of 85°F and a standard deviation of 5°F.

  • What is the probability of the temperature being below 65°F?
  • What is the probability of the temperature being above 93°F?
  • What is the probability of the temperature being between 75°F and 83°F?
  • What is the cutoff for the warmest 5% of days?

Next Time…


Discrete Probability and Simulation Lab