September 26, 2024
pnorm()
qnorm()
A normal distribution is a bell-shaped distribution which is parameterized (determined) by a mean \(\mu\) and standard deviation \(\sigma\)
The normal distribution with mean \(\mu\) and standard deviation \(\sigma\) is denoted by \(N\left(\mu,~\sigma\right)\) and has probability density function
\[p\left(x\right) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\left(x - \mu\right)^2}{2\sigma^2}}\]
Luckily, like the binomial distribution, we don’t evaluate or work with this distribution by hand either!
The black distribution has \(\mu = 0\) and \(\sigma = 1\)
The purple distribution has \(\mu = 1\) and \(\sigma = 0.3\)
The green distribution has \(\mu = -1\) and \(\sigma = 2\)
The black distribution, \(N\left(\mu = 0,~\sigma = 1\right)\), is a special distribution called the standard normal distribution and we often use \(z\) to denote its “support” values.
We’ll encounter this distribution quite often throughout our course.
All of the distributions below have standard deviation \(\sigma = 2\) but they have different means (\(\mu\))
Note: All three distributions have the same shape, but are shifted so that their peak is at their mean.
All of the distributions below have mean \(\mu = 0\) but they have different standard deviations (\(\sigma\))
Note: All of these distributions have the same center, but their widths (and heights too) change depending on their standard deviation.
Probabilities of outcomes from random variables described by a normal distribution are areas under the corresponding normal curve.
The picture to the right shows \(\mathbb{P}\left[X \geq k\right]\)
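To make the "probability is an area" idea concrete, here is a minimal sketch that integrates the density formula from earlier and compares the result with `pnorm()`. The helper name `normal_density` and the values of `mu`, `sigma`, and `k` are illustrative assumptions, not part of the original slides.

```r
# Hand-coded density from the formula above (illustration only;
# the name normal_density is made up for this sketch)
normal_density <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Illustrative values (assumed, not from the slides)
mu <- 0
sigma <- 1
k <- 1

# Area under the curve to the right of k, by numerical integration
area_right <- integrate(normal_density, lower = k, upper = Inf,
                        mu = mu, sigma = sigma)$value

# The same probability from pnorm(): the total area is 1,
# so subtract the area to the left of k
prob_right <- 1 - pnorm(k, mean = mu, sd = sigma)

print(c(area = area_right, pnorm = prob_right))
```

The two numbers should agree up to numerical integration error, which is why we can let `pnorm()` do the work instead of integrating by hand.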
Note: It is also worth knowing that the normal distribution is symmetric about its mean, so \(\mathbb{P}\left[X < \mu - k\right]\) is the same as \(\mathbb{P}\left[X > \mu + k\right]\).
Note: If \(\mu = 0\), then this relationship simplifies to \(\mathbb{P}\left[X < -k\right] = \mathbb{P}\left[X > k\right]\) – but be careful, this is only true when \(\mu = 0\).
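The symmetry can be checked numerically. The sketch below compares the two tails at equal distance from the mean, and then shows that mirroring around 0 fails when the mean is not 0. The values of `mu`, `sigma`, and `k` are assumptions chosen for illustration.

```r
# Illustrative values (assumed): a normal distribution with a nonzero mean
mu <- 1
sigma <- 0.3
k <- 0.5

# Tails at the same distance k on either side of the mean
left_tail  <- pnorm(mu - k, mean = mu, sd = sigma)      # P[X < mu - k]
right_tail <- 1 - pnorm(mu + k, mean = mu, sd = sigma)  # P[X > mu + k]
print(c(left_tail = left_tail, right_tail = right_tail))

# Mirroring around 0 instead of around the mean does NOT work here
naive_left  <- pnorm(-k, mean = mu, sd = sigma)         # P[X < -k]
naive_right <- 1 - pnorm(k, mean = mu, sd = sigma)      # P[X > k]
print(c(naive_left = naive_left, naive_right = naive_right))
```

The first pair matches; the second pair does not, because the distribution is centered at 1 rather than 0.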
Question: What is the probability that a randomly selected adult male from the United States is exactly 72 inches (6 feet) tall?
Since probabilities are areas under the curve and we can approximate this region by a rectangle, we can use the formula for the area of a rectangle to approximate this probability! What’s the area of a rectangle with 0 width?
Note: For this reason, \(\mathbb{P}\left[X \leq k\right] = \mathbb{P}\left[X < k\right]\) and \(\mathbb{P}\left[X \geq k\right] = \mathbb{P}\left[X > k\right]\).
If we want to estimate probabilities, we’ll need to ask questions about ranges of values.
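A quick way to see why an exact value has probability 0 is to shrink an interval around 72 inches and watch the area vanish. The slides do not give the mean and standard deviation of adult male heights, so the values of `mu` and `sigma` below are assumptions made only for this sketch.

```r
# ASSUMED height distribution (not given in the slides)
mu <- 70
sigma <- 3

# Interval widths, centered at 72 inches, shrinking to 0
widths <- c(2, 1, 0.1, 0.01, 0)

# P[72 - w/2 <= X <= 72 + w/2] for each width w
probs <- sapply(widths, function(w) {
  pnorm(72 + w / 2, mean = mu, sd = sigma) -
    pnorm(72 - w / 2, mean = mu, sd = sigma)
})

print(data.frame(width = widths, probability = probs))
```

As the width goes to 0, so does the probability, which matches the rectangle-with-zero-width argument.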
Determine whether each of the following scenarios involves a random variable that can be modeled by a normal distribution or not.
Scenario: The time it takes runners to complete a marathon is approximately normally distributed with a mean of 4.5 hours and a standard deviation of 0.75 hours.
Scenario: You roll a fair six-sided die repeatedly until a six appears, and you want to know how many rolls it takes.
Scenario: A factory has a 2% defect rate. Each day, 200 items are produced, and the number of defective items is counted.
Scenario: The weight of apples grown in an orchard is approximately normally distributed with a mean of 150 grams and a standard deviation of 20 grams.
Scenario: A region typically experiences 5 minor earthquakes per year. You are interested in the number of earthquakes in the next year.
Scenario: The number of cars passing through a toll booth in a 10-minute period is recorded. On average, 50 cars pass through every 10 minutes.
Scenario: The lifespan of a certain smartphone battery is approximately normally distributed with a mean of 18 months and a standard deviation of 3 months.
Scenario: A teacher gives a test to 30 students and knows that 80% of the students usually pass. The teacher is interested in the number of students who will pass this time.
Scenario: The time until a particular brand of light bulb fails is recorded. The failure times do not follow a normal pattern but tend to follow a long-tailed distribution.
Scenario: In a survey, you ask 200 randomly selected people whether they plan to vote in an upcoming election. Historically, 55% of people vote.
Scenario: The daily temperature in a city during the summer months is approximately normally distributed with a mean of 85°F and a standard deviation of 5°F.
Scenario: A social media influencer posts a new photo, and the number of likes received over the next 24 hours is recorded. The number of likes does not seem to follow any regular or predictable pattern.
Scenario: The distance run by professional athletes in a 30-minute endurance test is normally distributed with a mean of 5 kilometers and a standard deviation of 0.3 kilometers.
pnorm()
Assume that \(X\sim N\left(\mu = \text{mu}, \sigma = \text{sigma}\right)\).
That is, the random variable \(X\) is normally distributed with mean \(\text{mu}\) and standard deviation \(\text{sigma}\).
Since \(\mathbb{P}\left[X = k\right] = 0\) (that is, the probability that the random variable \(X\) takes on the value \(k\) exactly is 0), we will not make use of the dnorm() function.
The probability that \(X\) takes on a value less than or equal to (or just less than) \(k\) is \(\mathbb{P}\left[X \leq k\right] \approx\) pnorm(k, mean = mu, sd = sigma)
In cases where we want to find \(\mathbb{P}\left[X \geq k\right]\), \(\mathbb{P}\left[k_1 \leq X\leq k_2\right]\), and others, we’ll need to…
Strategy: Draw a picture and let your picture tell you how to use pnorm()
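As a rough sketch of where the picture usually leads, the three most common shaded regions translate to `pnorm()` as follows. The values of `mu`, `sigma`, `k`, `k1`, and `k2` are placeholders assumed for illustration.

```r
# Placeholder values (assumed for illustration)
mu <- 0
sigma <- 1
k <- 1
k1 <- -1
k2 <- 2

# Shaded region to the LEFT of k: pnorm() gives this directly
p_left <- pnorm(k, mean = mu, sd = sigma)               # P[X <= k]

# Shaded region to the RIGHT of k: total area (1) minus the left area
p_right <- 1 - pnorm(k, mean = mu, sd = sigma)          # P[X >= k]

# Shaded region BETWEEN k1 and k2: left area up to k2 minus left area up to k1
p_between <- pnorm(k2, mean = mu, sd = sigma) -
  pnorm(k1, mean = mu, sd = sigma)                      # P[k1 <= X <= k2]

print(c(left = p_left, right = p_right, between = p_between))
```

`pnorm()` also accepts `lower.tail = FALSE`, which returns the right-hand area directly; subtracting from 1 is used here because it mirrors the picture.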
qnorm()
Assume that \(X\sim N\left(\mu = \text{mu}, \sigma = \text{sigma}\right)\).
That is, the random variable \(X\) is normally distributed with mean \(\text{mu}\) and standard deviation \(\text{sigma}\).
In addition to computing probabilities, we can compute percentiles/quantiles (cut-off or boundary values)
The boundary value \(k^*\) such that \(\mathbb{P}\left[X \leq k^*\right] = p\) is given by
qnorm(p, mean = mu, sd = sigma)
Note: \(p\) must be the area to the LEFT of the boundary value you are looking for
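One way to convince yourself that `qnorm()` undoes `pnorm()` is a round trip: start with a left area, find its boundary value, then ask for the area to the left of that boundary. The values of `mu`, `sigma`, and `p` below are assumptions for illustration.

```r
# Illustrative values (assumed)
mu <- 10
sigma <- 2
p <- 0.90   # area to the LEFT of the boundary we want

# Boundary value k* with P[X <= k*] = p
k_star <- qnorm(p, mean = mu, sd = sigma)

# Feeding k* back into pnorm() should recover the left area p
p_back <- pnorm(k_star, mean = mu, sd = sigma)

print(c(boundary = k_star, area_left = p_back))
```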
Use pnorm() if you have boundary value(s) and need to find a probability
Use qnorm() if you have a probability/area and need to find a boundary value
Scenario: The time it takes runners to complete a marathon is approximately normally distributed with a mean of 4.5 hours and a standard deviation of 0.75 hours.
How big should our answer be?
We’re looking for a probability and it should be less than 0.5
Since we have our boundary value and we are looking for a probability, let’s use pnorm()
Answer: There is about a 25.25% chance that a randomly selected marathon runner finished the marathon in under 4 hours.
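The probability in this answer can be reproduced with a single `pnorm()` call; the variable name `p_under_4` is just a label chosen for this sketch.

```r
# P[X < 4] for finishing times X ~ N(mean = 4.5, sd = 0.75)
p_under_4 <- pnorm(4, mean = 4.5, sd = 0.75)

print(round(p_under_4, 4))  # about 0.2525, as in the answer above
```

The result is below 0.5, which agrees with the size check made before computing.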
Scenario: The time it takes runners to complete a marathon is approximately normally distributed with a mean of 4.5 hours and a standard deviation of 0.75 hours.
How big should our answer be?
We’re looking for a probability and it should be more than 0.5.
Since we have our boundary value and we are looking for a probability, let’s use pnorm()
Answer: There is about a 90.88% chance that a randomly selected marathon runner finished the marathon in 3 and a half hours or more.
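Because `pnorm()` returns the area to the left of the boundary, the area to the right is found by subtracting from 1. A minimal sketch (the name `p_at_least_3.5` is just a label for this example):

```r
# P[X >= 3.5] for finishing times X ~ N(mean = 4.5, sd = 0.75)
# pnorm() gives the LEFT area, so subtract it from the total area of 1
p_at_least_3.5 <- 1 - pnorm(3.5, mean = 4.5, sd = 0.75)

print(round(p_at_least_3.5, 4))  # about 0.9088, as in the answer above
```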
Scenario: The time it takes runners to complete a marathon is approximately normally distributed with a mean of 4.5 hours and a standard deviation of 0.75 hours.
How big should our answer be?
It’s tough to tell, but something near 0.5 seems reasonable – nothing too close to 0 or too close to 1 should be expected.
Since we have our boundary values, let’s use pnorm() to find the probability.
But how?
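One way to handle a "between" question is to take the area to the left of the upper boundary and subtract the area to the left of the lower boundary. The specific boundaries for this example are not stated in the text, so the 4-hour and 5-hour cut-offs below are hypothetical values chosen only to illustrate the subtraction.

```r
# HYPOTHETICAL boundaries (not given in the slides): between 4 and 5 hours
lower <- 4
upper <- 5

# Area left of the upper boundary minus area left of the lower boundary
p_between <- pnorm(upper, mean = 4.5, sd = 0.75) -
  pnorm(lower, mean = 4.5, sd = 0.75)

print(round(p_between, 4))  # roughly 0.495 for these assumed boundaries
```

With these assumed boundaries the result lands near 0.5, in line with the size estimate made above.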
Scenario: The time it takes runners to complete a marathon is approximately normally distributed with a mean of 4.5 hours and a standard deviation of 0.75 hours.
Where are the slowest 20% of runners on this distribution?
There they are! They have the longest finishing times. How long should we expect?
This time, we know the size of that purple area and we are trying to find the boundary value between it and the unshaded portion of the distribution
This will be the minimum finishing time for falling into the slowest 20% of runners
We should use qnorm() in this scenario
qnorm() requires the area to the left of the boundary value. How big is that area?
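Since the slowest 20% sit in the right tail, the area to the left of the boundary is 1 - 0.20 = 0.80. A minimal sketch of the computation (the variable names are labels chosen for this example):

```r
# The slowest 20% are in the RIGHT tail, so the area to the LEFT is 0.80
area_left <- 1 - 0.20

# Minimum finishing time (in hours) for falling into the slowest 20%
cutoff <- qnorm(area_left, mean = 4.5, sd = 0.75)

print(round(cutoff, 2))  # about 5.13 hours
```

As a sanity check, the cut-off is above the mean of 4.5 hours, which is what the picture suggests.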
Complete the following examples, but be careful – you’ll need to decide whether to use the binomial distribution (and dbinom() or pbinom()) or the normal distribution (and pnorm() or qnorm())
Scenario: The weight of apples grown in an orchard is approximately normally distributed with a mean of 150 grams and a standard deviation of 20 grams.
Scenario: In a survey, you ask 200 randomly selected people whether they plan to vote in an upcoming election. Historically, 55% of people vote.
Scenario: The lifespan of a certain smartphone battery is approximately normally distributed with a mean of 18 months and a standard deviation of 3 months.
Scenario: The daily temperature in a city during the summer months is approximately normally distributed with a mean of 85°F and a standard deviation of 5°F.
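The questions attached to these scenarios are not included in the text, so the sketch below uses hypothetical questions (an apple heavier than 180 grams, exactly 110 or at most 100 "yes" answers, the shortest-lived 10% of batteries) purely to illustrate how the choice of function follows from the type of variable. It also assumes the survey count can be treated as binomial with n = 200 and p = 0.55.

```r
# HYPOTHETICAL question: P[an apple weighs more than 180 g]
# Continuous, normally distributed weight -> pnorm(), right tail
p_heavy_apple <- 1 - pnorm(180, mean = 150, sd = 20)

# HYPOTHETICAL questions: exactly 110, or at most 100, of 200 people say yes
# Count of successes in a fixed number of trials -> binomial (assumed model)
p_exactly_110 <- dbinom(110, size = 200, prob = 0.55)
p_at_most_100 <- pbinom(100, size = 200, prob = 0.55)

# HYPOTHETICAL question: lifespan cut-off for the shortest-lived 10% of batteries
# Known left area, unknown boundary -> qnorm()
battery_cutoff <- qnorm(0.10, mean = 18, sd = 3)

print(c(p_heavy_apple, p_exactly_110, p_at_most_100, battery_cutoff))
```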