Topic 6: The Normal Distribution

function ok_checkbox(response, n) {
  if (!response || response.length === 0)
    return html`<span style="color:purple">You haven't answered yet.</span>`;
  if (response.toString() === n)
    return html`<span style="color:green">Correct ✓</span>`;
  return html`<span style="color:red">Not Yet! ✗</span>`;
}

About

This activity covers working with normally distributed data, including the calculation of probabilities and percentiles, and introduces the notion of the z-score.

The Normal Distribution

Throughout this activity we’ll investigate the probability distribution that is most central to our study of statistics: the normal distribution. If we are confident that our data are nearly normal, that opens the door to many powerful statistical methods. This activity gives you practice working with normally distributed data.

Objectives

Workbook Objectives: After completing this workbook you should be able to:

Compute probabilities of events well-modeled by a normal distribution.
Given a variable \(X\) which follows an assumed normal distribution, compute and interpret various percentile thresholds for \(X\).
Identify scenarios to which the normal or binomial distributions can be applied, and use them to answer various probability-related questions.

The Normal Distribution

Definition: If a random variable \(X\) is normally distributed with mean \(\mu\) and standard deviation \(\sigma\), we often write \(X\sim N\left(\mu, \sigma\right)\). Three different normal distributions appear below.

In blue is a normal distribution with \(\mu = 0\) and \(\sigma = 5\)
In red is a normal distribution with \(\mu = 0\) and \(\sigma = 0.5\)
In black is a normal distribution with \(\mu = 0\) and \(\sigma = 1\) (the so-called Standard Normal Distribution)

Notice that all three distributions are bell-shaped and are centered at their mean (\(\mu = 0\)). The larger the standard deviation, the shorter and wider the curve; the smaller the standard deviation, the taller and narrower the curve.

Given that \(X\sim N\left(\mu, \sigma\right)\), we can compute probabilities associated with observed values of \(X\) by finding the corresponding area beneath the normal curve with mean \(\mu\) and standard deviation \(\sigma\).

Properties of the Normal Distribution: Consider \(X\sim N\left(\mu, \sigma\right)\).

The area beneath the entire distribution is 1 (since this is equivalent to the probability that \(X\) takes on any of its possible values).

\(\displaystyle{\mathbb{P}\left[X\leq \mu\right] = \mathbb{P}\left[X\geq \mu\right] = 0.5}\) (the area underneath a full half of the distribution is 0.5)

The distribution is symmetric. In symbols, \(\mathbb{P}\left[X\leq \mu - k\right] = \mathbb{P}\left[X \geq \mu + k\right]\) for any \(k\).

\(\displaystyle{\mathbb{P}\left[X = k\right] = 0}\) (the probability that \(X\) takes on any prescribed value exactly is \(0\))

Important

Unlike the binomial distribution, where the distinction between at least and more than required careful adjustment, the probability that a continuous variable \(X\) takes on any prescribed value exactly is \(0\) (that is, \(\mathbb{P}\left[X = k\right] = 0\)). This means there is no difference between \(\mathbb{P}\left[X \leq k\right]\) and \(\mathbb{P}\left[X < k\right]\). Similarly, there is no difference between \(\mathbb{P}\left[X \geq k\right]\) and \(\mathbb{P}\left[X > k\right]\).
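One way to see why a single value carries no probability for a continuous variable is to shrink an interval around a point and watch its area vanish. The sketch below is illustrative only: it assumes the distribution \(N\left(\mu = 85, \sigma = 5\right)\) and an arbitrary point \(k = 90\), and it uses R's pnorm() function, which returns the area to the left of a boundary value.

```r
# Illustrative sketch: area in a shrinking interval around k under N(85, 5).
mu <- 85
sigma <- 5
k <- 90  # arbitrary point chosen for illustration

# Half-widths of the interval [k - eps, k + eps]
eps <- c(1, 0.1, 0.01, 0.001)

# Area between k - eps and k + eps for each half-width
areas <- pnorm(k + eps, mean = mu, sd = sigma) - pnorm(k - eps, mean = mu, sd = sigma)
print(areas)
```

Each printed area is smaller than the one before it, which is consistent with the area at the single point \(k\) being \(0\).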

Sometimes it is useful to be able to estimate probabilities or to estimate the proportion of a population that falls into a range, as long as the population is nearly normal. A convenient rule of thumb is the Empirical Rule.

The Empirical Rule

If \(X\sim N\left(\mu, \sigma\right)\), then

\(\mathbb{P}\left[\mu - \sigma \leq X\leq \mu + \sigma\right] \approx 0.68\); that is, about 68% of observations lie within one standard deviation of the mean.
\(\mathbb{P}\left[\mu - 2\sigma \leq X\leq \mu + 2\sigma\right] \approx 0.95\); that is, about 95% of observations lie within two standard deviations of the mean.
\(\mathbb{P}\left[\mu - 3\sigma \leq X\leq \mu + 3\sigma\right] \approx 0.997\); that is, about 99.7% of observations lie within three standard deviations of the mean.
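These rule-of-thumb figures can be checked numerically. The sketch below is a minimal illustration, not part of the original activity; the helper name within_k_sd is hypothetical. It works on the standard normal distribution, which is enough because the area within \(k\) standard deviations of the mean is the same for every normal distribution.

```r
# Hypothetical helper: area within k standard deviations of the mean,
# computed on the standard normal distribution (mean 0, sd 1).
within_k_sd <- function(k) {
  pnorm(k) - pnorm(-k)
}

within_one   <- within_k_sd(1)
within_two   <- within_k_sd(2)
within_three <- within_k_sd(3)

print(round(c(within_one, within_two, within_three), 4))
```

The printed values should land close to the rounded figures quoted in the rule.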

For each of the following, assume that \(X\sim N\left(\mu = 85, \sigma = 5\right)\).

Check Your Understanding: The Empirical Rule I

Use the Empirical Rule to approximate \(\mathbb{P}\left[80\leq X\leq 90\right]\).

mutable ok_response = (response, n) => { return html`Loading...` };
viewof q1 = Inputs.radio(
  new Map([
    ["50%", 1],
    ["68%", 2],
    ["95%", 3],
    ["Nearly 100%", 4],
    ["It is impossible to tell", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q1_selected") ?? "null")}
);

{
  localStorage.setItem("q1_selected", JSON.stringify(q1));
  localStorage.setItem("q1_correct", "2");
  localStorage.setItem("q1_result", q1 === null ? "unattempted" : (q1 == 2 ? "correct" : "incorrect"));
}

ok_response(q1, "2");

Check Your Understanding: The Empirical Rule II

According to the Empirical Rule, between which two of the following boundary values do we expect about 95% of observed values of \(X\) to fall?

viewof q2 = Inputs.checkbox(
  new Map([
    ["0", 1],
    ["50", 2],
    ["75", 3],
    ["80", 4],
    ["90", 5],
    ["95", 6],
    ["100", 7]
  ]),
  {value: JSON.parse(localStorage.getItem("q2_selected") ?? "[]") ?? []}
);

{
  localStorage.setItem("q2_selected", JSON.stringify(q2));
  localStorage.setItem("q2_correct", "3,6");
  localStorage.setItem("q2_result", (!q2 || q2.length === 0) ? "unattempted" : (q2.toString() === "3,6" ? "correct" : "incorrect"));
}

ok_checkbox(q2, "3,6");

Standardization and \(z\)-Scores

Scenario: Two students, Bob and Sally, want to compare how well they did on their college entrance exams. The difficulty is that Bob took the SAT, whose scores are approximately normally distributed with a mean of 1068 and a standard deviation of 210, while Sally took the ACT, whose scores are also approximately normally distributed but with a mean of 20.8 and a standard deviation of 5.8. If Bob scored 1400 on the SAT and Sally scored 31 on the ACT, who scored relatively higher?

How do we answer this question? We’ll see two methods.

Method 1: We can standardize the test scores so that they have comparable units.

Definition: If an observation \(x\) comes from a nearly normal population with mean \(\mu\) and standard deviation \(\sigma\), then we compute the \(z\)-score associated with \(x\) as follows:

\[\displaystyle{z = \frac{x - \mu}{\sigma}}\]

An observation’s \(z\)-score is simply the number of standard deviations it falls above or below the mean.
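As a small illustration of the formula, the sketch below wraps it in a helper and applies it to two made-up observations from \(N\left(\mu = 85, \sigma = 5\right)\). The function name z_score and the observations 95 and 80 are assumptions chosen for the example, not values from the scenario.

```r
# Hypothetical helper: number of standard deviations x lies from the mean.
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

# Illustrative observations from N(mu = 85, sigma = 5)
z_above <- z_score(95, mu = 85, sigma = 5)  # two standard deviations above the mean
z_below <- z_score(80, mu = 85, sigma = 5)  # one standard deviation below the mean

print(c(z_above, z_below))
```

The sign of each result shows which side of the mean the observation falls on, and the magnitude shows how far away it is.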

Use the code block below to compute Bob and Sally’s \(z\)-scores and answer the questions that follow.

Hint 6

Now do the same for Sally, but be sure to use the mean (\(\mu\)) and standard deviation (\(\sigma\)) from the ACT exam, since that’s the exam she completed.

# Bob
(1400 - 1068) / 210

# Sally
(___ - ___) / ___

Check Your Understanding: \(z\)-Scores I

Which of the following is the \(z\)-score corresponding to Bob? (round to two decimal places)

viewof q3 = Inputs.radio(
  new Map([
    ["1.58", 1],
    ["1394.91", 2],
    ["-1.58", 3],
    ["1.76", 4],
    ["-1.76", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q3_selected") ?? "null")}
);

{
  localStorage.setItem("q3_selected", JSON.stringify(q3));
  localStorage.setItem("q3_correct", "1");
  localStorage.setItem("q3_result", q3 === null ? "unattempted" : (q3 == 1 ? "correct" : "incorrect"));
}

ok_response(q3, "1");

Check Your Understanding: \(z\)-Scores II

Which of the following is the \(z\)-score corresponding to Sally? (round to two decimal places)

viewof q4 = Inputs.radio(
  new Map([
    ["1.58", 1],
    ["27.41", 2],
    ["-1.58", 3],
    ["1.76", 4],
    ["-1.76", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q4_selected") ?? "null")}
);

{
  localStorage.setItem("q4_selected", JSON.stringify(q4));
  localStorage.setItem("q4_correct", "4");
  localStorage.setItem("q4_result", q4 === null ? "unattempted" : (q4 == 4 ? "correct" : "incorrect"));
}

ok_response(q4, "4");

Check Your Understanding: \(z\)-Scores III

Who did relatively better on their standardized exam, Bob or Sally? Why?

viewof q5 = Inputs.radio(
  new Map([
    ["Sally, since her score is 1.76 standard deviations above the mean, while Bob's was only 1.58 standard deviations above the mean.", 1],
    ["Bob, since his z-score was 1.58 which is closer to the mean than Sally's z-score of 1.76.", 2],
    ["Bob, since his score was 1400 and Sally's score was only 31.", 3],
    ["Bob, since the SAT had a larger standard deviation it is a harder exam.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q5_selected") ?? "null")}
);

{
  localStorage.setItem("q5_selected", JSON.stringify(q5));
  localStorage.setItem("q5_correct", "1");
  localStorage.setItem("q5_result", q5 === null ? "unattempted" : (q5 == 1 ? "correct" : "incorrect"));
}

ok_response(q5, "1");

A Recap on \(z\)-Scores

We can use \(z\)-scores as a common unit for comparing observations from completely different populations (such as SAT scores and ACT scores). Here’s a recap of the most important information so far:

If an observation \(x\) comes from a nearly normal population with mean \(\mu\) and standard deviation \(\sigma\), we can compute its \(z\)-score using the formula: \(\displaystyle{z = \frac{x - \mu}{\sigma}}\).
A \(z\)-score measures the number of standard deviations by which an observation falls above or below the mean.
- A positive \(z\)-score means that an observation was above the mean.
- A negative \(z\)-score means that an observation was below the mean.
- The larger a \(z\)-score is in absolute value, the further the corresponding observation falls from the mean.

Method 2: We can compute the percentile corresponding to Bob’s SAT score and the percentile corresponding to Sally’s ACT score.

Definition: Given an observation \(x\) from a population, the percentile corresponding to \(x\) is the proportion of the population which falls below \(x\).

Bob’s percentile corresponds to the shaded area in the distribution below.

Sally’s percentile corresponds to the shaded area in the distribution below.

There are many ways to compute percentiles. Before statistical software was widely available, people converted observed values to \(z\)-scores and then looked up the percentile in a table. Luckily, R provides a function that computes percentiles directly.

Computing Percentiles in R

If \(X\sim N\left(\mu, \sigma\right)\), then \[\mathbb{P}\left[X\leq q\right] = \tt{pnorm(q, mean = \mu, sd = \sigma)}\]

The block below is preset to compute Bob’s percentile. Execute the code cell and then adapt the code to find Sally’s percentile. Use your results to answer the questions below.
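The interactive code cell is not reproduced in this copy of the activity. A minimal sketch of what it plausibly computes, assuming the SAT parameters from the scenario, is shown here; the variable names are illustrative. The second half shows that standardizing first and using the standard normal distribution gives the same area.

```r
# Bob's percentile: area to the left of 1400 under N(mean = 1068, sd = 210)
bob_percentile <- pnorm(1400, mean = 1068, sd = 210)
print(bob_percentile)

# The same area via Bob's z-score on the standard normal distribution
bob_z <- (1400 - 1068) / 210
bob_percentile_z <- pnorm(bob_z)
print(bob_percentile_z)
```

Adapting the first call for Sally only requires swapping in her score and the ACT mean and standard deviation.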

Check Your Understanding: Percentiles I

Which of the following is the percentile corresponding to Bob? (round to four decimal places)

viewof q6 = Inputs.radio(
  new Map([
    ["0.9431", 1],
    ["0.0569", 2],
    ["0.4431", 3],
    ["0.9608", 4],
    ["0.0392", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q6_selected") ?? "null")}
);

{
  localStorage.setItem("q6_selected", JSON.stringify(q6));
  localStorage.setItem("q6_correct", "1");
  localStorage.setItem("q6_result", q6 === null ? "unattempted" : (q6 == 1 ? "correct" : "incorrect"));
}

ok_response(q6, "1");

Check Your Understanding: Percentiles II

Which of the following is the percentile corresponding to Sally? (round to four decimal places)

viewof q7 = Inputs.radio(
  new Map([
    ["0.9431", 1],
    ["0.0569", 2],
    ["0.4431", 3],
    ["0.9607", 4],
    ["0.0392", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q7_selected") ?? "null")}
);

{
  localStorage.setItem("q7_selected", JSON.stringify(q7));
  localStorage.setItem("q7_correct", "4");
  localStorage.setItem("q7_result", q7 === null ? "unattempted" : (q7 == 4 ? "correct" : "incorrect"));
}

ok_response(q7, "4");

Check Your Understanding: Percentiles III

Who did relatively better on their standardized exam, Bob or Sally? Why?

viewof q8 = Inputs.radio(
  new Map([
    ["Sally, since she scored in a higher percentile than Bob.", 1],
    ["Bob, since he scored in a lower percentile than Sally.", 2],
    ["Bob, since he scored closer to the mean.", 3],
    ["Bob, since his percentile score was so high even though the standard deviation on the SAT was so large.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q8_selected") ?? "null")}
);

{
  localStorage.setItem("q8_selected", JSON.stringify(q8));
  localStorage.setItem("q8_correct", "1");
  localStorage.setItem("q8_result", q8 === null ? "unattempted" : (q8 == 1 ? "correct" : "incorrect"));
}

ok_response(q8, "1");

We’ll make good use of this second method for a while, but don’t forget about standardization and \(z\)-scores. We’ll need that strategy quite often later in our course! For now, let’s move on to practicing with finding probabilities from a normal distribution using R’s pnorm() function.

Computing Probability from a Normal Distribution

Throughout this section you’ll practice finding probabilities by using R’s pnorm() function to compute areas. Remember that pnorm() takes three arguments: the first is a boundary value, the second is the mean of the distribution, and the third is the standard deviation. The value returned by pnorm() is the area to the left of the boundary value under the normal distribution with that mean and standard deviation.
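A short sketch of the argument order may help. The boundary 1.96 is an arbitrary illustration, not one of the question values. The three calls are equivalent because mean = 0 and sd = 1 are pnorm()'s default values in R.

```r
# pnorm(q, mean, sd) returns the area to the left of q.
q <- 1.96  # illustrative boundary, not one of the question values

area_positional <- pnorm(q, 0, 1)               # arguments matched by position
area_named      <- pnorm(q, mean = 0, sd = 1)   # arguments matched by name
area_default    <- pnorm(q)                     # relies on the defaults mean = 0, sd = 1

print(c(area_positional, area_named, area_default))
```

Naming the arguments is a little longer to type but makes it harder to swap the mean and standard deviation by accident.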

For these first few questions I’ll draw pictures for you, but you should be prepared to draw your own shortly.

Question 1: Use the code block below to find \(\mathbb{P}\left[Z <\right.\) \(\left.\right]\) — remember that \(Z\sim N\left(\mu = 0, \sigma = 1\right)\).

Hint 2

Use the pnorm() function with the boundary of the shaded region as the first argument, the mean (\(\mu\)) of the normal distribution as the second, and the standard deviation (\(\sigma\)) as the third.

pnorm(___, ___, ___)


pnorm(boundary1, 0, 1)

Question 2: Find \(\mathbb{P}\left[Z <\right.\) \(\left.\right]\).

Hint 4

Answer this question exactly the way you answered the previous one. The boundary value here is , which is negative, but pnorm() will always give you the area to the left of your provided boundary value — which is what you want here.

pnorm(___, ___, ___)


pnorm(boundary2, 0, 1)

Question 3: Find \(\mathbb{P}\left[Z >\right.\) \(\left.\right]\).

Hint 8

This is still the standard normal distribution \(N\left(\mu = 0, \sigma = 1\right)\). We just want to remove the unshaded area in the lower (left) tail of the distribution.

1 - pnorm(___, 0, 1)


1 - pnorm(boundary3, 0, 1)

Question 4: Find \(\mathbb{P}\left[Z >\right.\) \(\left.\right]\).


1 - pnorm(boundary4, 0, 1)

Question 5: Find \(\mathbb{P}\left[\right.\) \(< Z <\) \(\left.\right]\).

Hint 4

Filling the first argument of pnorm(___, 0, 1) with gives an area larger than what we want. It shares a right-side boundary with our target area. Can we subtract something from it?

pnorm(___, 0, 1) - ___


pnorm(boundary5b, 0, 1) - pnorm(boundary5a, 0, 1)

Question 6: Find \(\mathbb{P}\left[Z <\right.\) \(\text{ or } Z >\) \(\left.\right]\).


pnorm(boundary6a, 0, 1) + (1 - pnorm(boundary6b, 0, 1))

Question 7: Find \(\mathbb{P}\left[\left|Z\right| >\right.\) \(\left.\right]\).

Hint 5 (Solved)

Since the two tail areas are equal, just calculate one of them and double it. The two boundary values are (left) and (right).

Use the lower tail (left) boundary () as the first argument to pnorm() below, since pnorm() calculates the area to the left of a given boundary value.

2 * pnorm(___, 0, 1)


2 * pnorm(-boundary7, 0, 1)
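The area patterns used in Questions 1 through 7 can be collected in one place. Because the questions' own boundary values are not shown in this copy, the sketch below uses hypothetical boundaries a and b purely for illustration.

```r
# Hypothetical boundaries for illustration only
a <- -1.2
b <- 0.8

left_tail  <- pnorm(a, 0, 1)                    # P[Z < a]
right_tail <- 1 - pnorm(b, 0, 1)                # P[Z > b]
between    <- pnorm(b, 0, 1) - pnorm(a, 0, 1)   # P[a < Z < b]
outside    <- left_tail + right_tail            # P[Z < a or Z > b]
two_tailed <- 2 * pnorm(-abs(b), 0, 1)          # P[|Z| > b], using symmetry

print(c(left_tail, right_tail, between, outside, two_tailed))
```

A useful sanity check is that the "between" and "outside" areas add up to 1, since together they cover the whole distribution.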

Probabilities from Other Normal Distributions

In the last seven problems you worked only with the standard normal distribution, that is, the \(Z\)-distribution \(N\left(\mu = 0, \sigma = 1\right)\). We can find probabilities from any normal distribution with R’s pnorm() function: just supply the appropriate mean and sd arguments in place of the 0 and 1 we passed earlier.
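As an illustration, the sketch below reuses \(X\sim N\left(\mu = 85, \sigma = 5\right)\) from the Empirical Rule questions. The boundaries 78 and 92 are assumptions picked for the example. The last lines show that standardizing a boundary and using the standard normal distribution gives the same area.

```r
# X ~ N(mu = 85, sigma = 5); the boundaries 78 and 92 are illustrative
mu <- 85
sigma <- 5

p_below_78 <- pnorm(78, mean = mu, sd = sigma)       # P[X < 78]
p_above_92 <- 1 - pnorm(92, mean = mu, sd = sigma)   # P[X > 92]
p_between  <- pnorm(92, mean = mu, sd = sigma) -
  pnorm(78, mean = mu, sd = sigma)                   # P[78 < X < 92]

# Same left-tail area after converting 78 to a z-score
z_78 <- (78 - mu) / sigma
p_below_78_z <- pnorm(z_78)

print(c(p_below_78, p_above_92, p_between, p_below_78_z))
```

Since 78 and 92 sit the same distance from the mean, the two tail areas match, as the symmetry property predicts.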

Finding Percentile Cutoffs on a Normal Distribution

Recall from earlier that the \(p^{th}\) percentile of a random variable \(X\) is the value \(x^*\) such that \(\mathbb{P}\left[X < x^*\right] = p\).

If \(X\sim N\left(\mu, \sigma\right)\), then to find the cutoff \(x^*\) for which \(\mathbb{P}\left[X < x^*\right] = p\), we can use R’s qnorm() function. Similar to pnorm(), this function takes three arguments. The first is the area to the left of the desired cutoff, the second is the mean of the distribution, and the third is the standard deviation.
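qnorm() and pnorm() undo one another: qnorm() turns an area into a cutoff, and pnorm() turns that cutoff back into the area. The sketch below illustrates this with the SAT distribution from the scenario; the variable names are illustrative.

```r
# SAT scores: N(mean = 1068, sd = 210), as stated in the scenario
p <- 0.95

# Cutoff score with 95% of the area to its left
cutoff <- qnorm(p, mean = 1068, sd = 210)

# Feeding the cutoff back into pnorm() recovers the original area
area_back <- pnorm(cutoff, mean = 1068, sd = 210)

# Same cutoff built from the standard normal quantile: x = mu + z * sigma
cutoff_from_z <- 1068 + qnorm(p) * 210

print(c(cutoff, area_back, cutoff_from_z))
```

The last line is the \(z\)-score formula solved for \(x\), which ties percentile cutoffs back to standardization.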

Recall from earlier that SAT scores followed \(N\left(\mu = 1068, \sigma = 210\right)\) and ACT scores followed \(N\left(\mu = 20.8, \sigma = 5.8\right)\). The code block below is set up to find the minimum required SAT score to fall in the 95th percentile. Execute the code and note the required score. Then adapt the code to find the minimum ACT score required to fall into the top 10% of all ACT test takers. Does your answer seem right?

Hint 3

The first argument is the area to the left of the desired cutoff score. What should this area be if we want the top 10% of all scores?

# SAT 95th percentile
qnorm(0.95, 1068, 210)

# ACT top 10%
qnorm(___, 20.8, 5.8)

Hint 4 (Solved)

To be in the top 10%, a score must exceed 90% of all other scores, so the area to the left is 0.90. Use 0.90 as the first argument to the qnorm() call for ACT scores below.

# SAT 95th percentile
qnorm(0.95, 1068, 210)

# ACT top 10%
qnorm(0.90, 20.8, 5.8)


qnorm(0.95, 1068, 210)
qnorm(0.9, 20.8, 5.8)

Practice with the Normal and Binomial Distributions

Through this last section you’ll work through a set of problems, some of which use the normal distribution while others use the binomial distribution. It is up to you to determine which distribution should be applied in each problem. Below are a few helpful reminders:

The binomial distribution can be applied to scenarios of repeated, independent trials, where each trial has two possible outcomes and the probability of “success” on each trial remains constant. If \(X\) is the number of successes in a binomial experiment with \(n\) trials and probability of success \(p\):
- \(\mathbb{P}\left[X = k\right] = \tt{dbinom(k, n, p)}\)
- \(\mathbb{P}\left[X \leq k\right] = \tt{pbinom(k, n, p)}\)
The normal distribution can be applied to scenarios where the data follow an approximately normal distribution. If \(X\sim N\left(\mu, \sigma\right)\):
- \(\mathbb{P}\left[X \leq k\right] = \tt{pnorm(k, mean = \mu, sd = \sigma)}\)
- The \(p^{th}\) percentile of \(X\) is given by \(\tt{qnorm(p, mean = \mu, sd = \sigma)}\)
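For the binomial reminders, a short sketch may help show how "exactly", "at most", "at least", and "more than" map onto dbinom() and pbinom(). The values n = 10, p = 0.5, and k = 3 are hypothetical and chosen only for illustration.

```r
# Hypothetical binomial setting: n = 10 trials, success probability p = 0.5
n <- 10
p <- 0.5
k <- 3

exactly_k   <- dbinom(k, n, p)           # P[X = k]
at_most_k   <- pbinom(k, n, p)           # P[X <= k]
at_least_k  <- 1 - pbinom(k - 1, n, p)   # P[X >= k]
more_than_k <- 1 - pbinom(k, n, p)       # P[X > k]

print(c(exactly_k, at_most_k, at_least_k, more_than_k))
```

Unlike the normal case, "at least" and "more than" differ here, and the gap between them is exactly the probability of the single value \(k\).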

Practice Problem 1: The National Vaccine Information Center estimates that 90% of Americans have had chickenpox by the time they reach adulthood. Suppose we take a random sample of American adults. Answer each of the following:

Compute the expected number of adults in our sample who will have had chickenpox.


chickPoxSampSize * 0.90

Compute the standard deviation in the number of adults who will have had chickenpox across samples of size .


sqrt(chickPoxSampSize * 0.9 * 0.1)
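The two solutions above use the binomial mean \(np\) and standard deviation \(\sqrt{np(1-p)}\). Because the activity's sample size is not shown in this copy, the sketch below assumes a hypothetical sample of 100 adults. It also computes the band within two standard deviations of the expected value, which is the yardstick the next question uses for a surprising result.

```r
# Hypothetical sample size; the activity's own value is not shown here
n <- 100
p <- 0.90

expected_count <- n * p               # binomial mean, n * p
sd_count <- sqrt(n * p * (1 - p))     # binomial standard deviation

# Counts more than two standard deviations from the mean are treated as surprising
lower_bound <- expected_count - 2 * sd_count
upper_bound <- expected_count + 2 * sd_count

print(c(expected_count, sd_count, lower_bound, upper_bound))
```

With these assumed numbers, a count below the lower bound would fall more than two standard deviations under the expected value.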

Check Your Understanding: Surprising Results?

Would you be surprised if your sample of adults contained at most adults who have had chickenpox? Why? Select all that are appropriate.

viewof q9 = Inputs.checkbox(
  new Map([
    ["Yes. This result would fall more than two standard deviations away from the expected value.", 1],
    ["Yes. If our assumptions are correct, the probability of this occurring would be less than 5%.", 2],
    ["Yes. We should be surprised by any result at all — we can't know what will happen ahead of time.", 3],
    ["No. A result like this one is certainly possible.", 4],
    ["No. This result does not fall more than two standard deviations from the expected value.", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q9_selected") ?? "[]") ?? []}
);

{
  localStorage.setItem("q9_selected", JSON.stringify(q9));
  localStorage.setItem("q9_correct", "1,2");
  localStorage.setItem("q9_result", (!q9 || q9.length === 0) ? "unattempted" : (q9.toString() === "1,2" ? "correct" : "incorrect"));
}

ok_checkbox(q9, "1,2");

While you answer the following questions, it might be useful to refer back to the Game of Dreidel example and solution from the Topic 5 activity.

Find the probability that exactly adults in your sample of will have had chickenpox.

Hint 5

There are adults in our sample, so the number of trials is . Fill in the second blank below with .

The probability of an individual adult having had chickenpox is 0.90. Fill in the third blank with 0.90.

dbinom(___, ___, ___)

Hint 6 (Solved)

We are interested in exactly adults having had chickenpox, so fill in the first blank with .

dbinom(___, ___, ___)


dbinom(chickPox_exact1, chickPoxSampSize, 0.9)

Find the probability that exactly adults in your sample of will not have had chickenpox.


dbinom(chickPox_exact2, chickPoxSampSize, 0.1)

Find the probability that at most adults in your sample of will have had chickenpox.


pbinom(chickPox_atmost, chickPoxSampSize, 0.9)

Find the probability that at least adults in your sample of will have had chickenpox.


1 - pbinom(chickPox_atleast - 1, chickPoxSampSize, 0.9)

Find the probability that more than but less than adults in your sample of will have had chickenpox.

Hint 3

The phrase “more than but less than ” means both boundaries are excluded. The sample size is and the probability of success is 0.90.

pbinom(___, ___, ___) - pbinom(___, ___, ___)

Hint 4

The number of trials in both scenarios is and the probability of success is 0.90 for each as well. Use these as the second and third arguments for both instances of pbinom() below.

pbinom(___, ___, ___) - pbinom(___, ___, ___)

Hint 5

For the first call to pbinom() the maximum number of successful outcomes you are interested in is . Use this as the first argument to the first call to pbinom().

pbinom(___, ___, ___) - pbinom(___, ___, ___)

Hint 6 (Solved)

Pass , , and 0.90 as the arguments to the first call to pbinom(). Similarly, pass , , and 0.90 as the arguments to the second call to pbinom().

pbinom(___, ___, ___) - pbinom(___, ___, ___)


pbinom(chickPox_between_hi - 1, chickPoxSampSize, 0.9) - pbinom(chickPox_between_lo, chickPoxSampSize, 0.9)
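The subtraction pattern for a strict "more than ... but less than ..." range can be checked against a direct sum of individual probabilities. The sample size and boundaries below are hypothetical stand-ins, since the activity's own values are not shown in this copy.

```r
# Hypothetical values for illustration only
n  <- 100   # sample size
p  <- 0.90  # probability an adult has had chickenpox
lo <- 85    # excluded lower boundary
hi <- 93    # excluded upper boundary

# P[lo < X < hi] = P[X <= hi - 1] - P[X <= lo]
strictly_between <- pbinom(hi - 1, n, p) - pbinom(lo, n, p)

# Same probability by adding P[X = k] for k = lo + 1, ..., hi - 1
by_summing <- sum(dbinom((lo + 1):(hi - 1), n, p))

print(c(strictly_between, by_summing))
```

Both approaches count exactly the values from lo + 1 through hi - 1, so the two results agree.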

Practice Problem 2: Sophia took the Graduate Record Examination (GRE) and scored 160 on the Verbal Reasoning section and 157 on the Quantitative Reasoning section. The mean score on the Verbal Reasoning section for all test takers was 151 with a standard deviation of 7, and the mean score for the Quantitative Reasoning section was 153 with a standard deviation of 7.67. Suppose we can assume that both score distributions are nearly normal.

Use the code block below to compute Sophia’s \(z\)-score on the Quantitative Reasoning exam.

Hint 2 (Solved)

The \(z\)-score formula is \((x - \mu) / \sigma\).

Sophia scored 157 on the Quantitative Reasoning exam, the mean is 153, and the standard deviation is 7.67.

(___ - ___) / ___


(157 - 153) / 7.67

Use the code block below to compute Sophia’s \(z\)-score on the Verbal Reasoning exam.


(160 - 151) / 7

Check Your Understanding: Comparing \(z\)-Scores

What do these \(z\)-scores tell you?

viewof q10 = Inputs.radio(
  new Map([
    ["Sophia scored relatively higher on the Verbal Reasoning exam since her Verbal Reasoning z-score was larger.", 1],
    ["Sophia scored relatively higher on the Quantitative Reasoning exam since her Quantitative Reasoning z-score was larger.", 2],
    ["Sophia scored relatively higher on the Quantitative Reasoning exam since her Quantitative Reasoning z-score was closer to 0.", 3],
    ["It is impossible to tell which exam Sophia performed better on since the means and standard deviations are different.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q10_selected") ?? "null")}
);

{
  localStorage.setItem("q10_selected", JSON.stringify(q10));
  localStorage.setItem("q10_correct", "1");
  localStorage.setItem("q10_result", q10 === null ? "unattempted" : (q10 == 1 ? "correct" : "incorrect"));
}

ok_response(q10, "1");

Find the proportion of test takers Sophia scored higher than on the Quantitative Reasoning exam (that is, find her percentile).

Hint 3

We want the proportion of test takers below Sophia’s score of 157 — that’s the area to the left, which is exactly what pnorm() gives us. The mean is 153 and the standard deviation is 7.67.

pnorm(___, ___, ___)

Hint 4 (Solved)

Fill the first argument of pnorm() with 157, the second with 153, and the third with 7.67.

pnorm(___, ___, ___)


pnorm(157, 153, 7.67)

Find the proportion of test takers doing better than Sophia on the Verbal Reasoning exam.

Hint 3 (Solved)

Sophia’s Verbal Reasoning score is 160, the mean is 151, and the standard deviation is 7.

Use 160 as the first argument, 151 as the second, and 7 as the third in the call to pnorm() below.

1 - pnorm(___, ___, ___)


1 - pnorm(160, 151, 7)

Submit

If you are part of a course with an instructor who is grading your work on these activities, please copy and submit both of the hashes below using the method your instructor has requested.

Question Hash

The hash below encodes your responses to the multiple choice and checkbox questions in this activity.

function buildQuestionResults() {
  return {
    notebook: "Topic 6: The Normal Distribution",
    type: "questions",
    timestamp: new Date().toISOString(),
    questions: {
      q1_empirical_rule_1: {
        selected: q1,
        correct_answer: "2",
        result: q1 === null ? "unattempted" : (q1 == 2 ? "correct" : "incorrect")
      },
      q2_empirical_rule_2: {
        selected: q2,
        correct_answer: "3,6",
        result: (!q2 || q2.length === 0) ? "unattempted" : (q2.toString() === "3,6" ? "correct" : "incorrect")
      },
      q3_z_score_bob: {
        selected: q3,
        correct_answer: "1",
        result: q3 === null ? "unattempted" : (q3 == 1 ? "correct" : "incorrect")
      },
      q4_z_score_sally: {
        selected: q4,
        correct_answer: "4",
        result: q4 === null ? "unattempted" : (q4 == 4 ? "correct" : "incorrect")
      },
      q5_z_score_comparison: {
        selected: q5,
        correct_answer: "1",
        result: q5 === null ? "unattempted" : (q5 == 1 ? "correct" : "incorrect")
      },
      q6_percentile_bob: {
        selected: q6,
        correct_answer: "1",
        result: q6 === null ? "unattempted" : (q6 == 1 ? "correct" : "incorrect")
      },
      q7_percentile_sally: {
        selected: q7,
        correct_answer: "4",
        result: q7 === null ? "unattempted" : (q7 == 4 ? "correct" : "incorrect")
      },
      q8_percentile_comparison: {
        selected: q8,
        correct_answer: "1",
        result: q8 === null ? "unattempted" : (q8 == 1 ? "correct" : "incorrect")
      },
      q9_surprising_results: {
        selected: q9,
        correct_answer: "1,2",
        result: (!q9 || q9.length === 0) ? "unattempted" : (q9.toString() === "1,2" ? "correct" : "incorrect")
      },
      q10_z_score_gre: {
        selected: q10,
        correct_answer: "1",
        result: q10 === null ? "unattempted" : (q10 == 1 ? "correct" : "incorrect")
      }
    }
  };
}

function toBase64(str) {
  return btoa(unescape(encodeURIComponent(str)));
}

question_hash = {
  q1; q2; q3; q4; q5; q6; q7; q8; q9; q10;
  return toBase64(JSON.stringify(buildQuestionResults()));
}

html`<div style="font-family: monospace; font-size: 0.85em; background: #f5f5f5; padding: 12px; border-radius: 6px; word-break: break-all; border: 1px solid #ddd; user-select: all; cursor: pointer;" onclick="navigator.clipboard.writeText(this.innerText)">
  ${question_hash}
</div>
<p style="margin-top: 8px; font-size: 0.9em; color: #555;">
  Click the box to copy to clipboard.
</p>`

Exercise Hash

Click the button below to generate your exercise submission code. This hash encodes your work on the graded code exercises in this activity.

You must have attempted the graded exercises before clicking — clicking generates a snapshot of your current results. If you have completed the activity over multiple sessions, please go back through and hit the Run Code button on each graded exercise before generating the hash below, to ensure your most recent results are recorded.

Summary

Main Takeaways

A normal distribution is bell-shaped and fully described by its mean \(\mu\) and standard deviation \(\sigma\), written \(N(\mu, \sigma)\). Larger \(\sigma\) produces a shorter, wider curve; smaller \(\sigma\) produces a taller, narrower curve.
The Empirical Rule: for data following \(N(\mu, \sigma)\), approximately 68% of observations fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
For a continuous distribution, \(\mathbb{P}[X = k] = 0\), so there is no distinction between strict and non-strict inequalities. For example, \(\mathbb{P}[X \leq k] = \mathbb{P}[X < k]\).
A \(z\)-score measures how many standard deviations an observation falls from the mean: \(z = (x - \mu)/\sigma\). \(z\)-scores allow comparison across different normal distributions on a common scale.
pnorm(q, mean, sd) returns the area to the left of \(q\) — that is, \(\mathbb{P}[X \leq q]\). For right-tail and two-tail probabilities, use 1 - pnorm(...) and combinations thereof.
qnorm(p, mean, sd) returns the value \(x^*\) such that \(\mathbb{P}[X \leq x^*] = p\) — the \(p^{th}\) percentile. Remember that \(p\) is the area to the left of the desired cutoff.
When a scenario involves repeated independent yes/no trials with constant probability, use the binomial distribution (dbinom, pbinom). When data are approximately normally distributed, use the normal distribution (pnorm, qnorm).

Looking Ahead

The next activity is a discrete probability and simulation lab exploring the hot hand phenomenon in basketball — a fun application of the ideas we’ve encountered in our short study of probability. You’ll simulate data, compare distributions, and think carefully about what “random” actually looks like in practice.