Topic 9: Foundations for Inference

About

This activity provides an introduction to the Central Limit Theorem. You’ll encounter the Central Limit Theorem in action through several simulations which construct approximations of the sampling distribution. You’ll have opportunities to change parameters of the population distribution as well as the size of the samples being drawn. The goal is to discover connections between a population distribution, sample size, and the resulting sampling distribution.

{
  const btn = html`<button style="
    background: #436b95;
    color: white;
    border: none;
    padding: 8px 16px;
    border-radius: 4px;
    cursor: pointer;
    font-size: 0.9em;
  ">🔄 Reset My Responses</button>`;

  btn.onclick = () => {
    if (confirm('Reset all responses for this activity? This cannot be undone.')) {
      Object.keys(localStorage)
        .filter(k => k.endsWith('_selected_t9') || k.endsWith('_correct_t9') || k.endsWith('_result_t9'))
        .forEach(k => localStorage.removeItem(k));
      location.reload();
    }
  };

  return btn;
}

Note. The button above resets multiple choice and checkbox questions. Code cells must currently be reset manually, by clicking the Start Over button on each individual interactive cell.

Foundations for Inference

In this activity we’ll begin investigating the true power of statistics — using sample data to make accurate claims about a population, even when we don’t have access to the entire population. We start here by exploring the connection between a population distribution and the distribution of sample means, often called the sampling distribution. We’ll do this through a series of interactive code blocks which you will run and use to answer questions.

Exploring the Connection Between Population and Sampling Distributions

Start by viewing the following video from the New York Times.

The video claimed that the sampling distribution can help us answer questions about the population. This is really important because, as we mentioned in our first activity, collecting a true census is almost always impossible. In the following sections, you’ll use the provided code blocks to explore the connection between the population and the sampling distribution for several different populations.

The Sampling Distribution of the Sample Mean

The sampling distribution of the sample mean is a theoretical distribution consisting of the means of all possible samples of a fixed size (\(n\) elements) drawn from a population.

The sampling distribution of the sample mean will be our central object of study throughout this activity. Our goal is to understand how this distribution is connected to:

the population distribution, and
the sample size.
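To make this definition concrete, here is a small sketch in JavaScript, the language used by this page’s interactive cells. It lists every possible sample of size 2 from a made-up three-value population and tallies the sample means. The sketch assumes sampling with replacement, and both the population and the helper name allSampleMeans are hypothetical. Real populations are far too large to enumerate like this, which is why the activity approximates the sampling distribution by simulation instead.

```javascript
// Made-up toy population of three values.
const population = [2, 4, 6];

const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;

// Every ordered sample of size n drawn WITH replacement, reduced to its mean.
function allSampleMeans(pop, n) {
  let samples = [[]];
  for (let i = 0; i < n; i++) {
    // Extend each partial sample by every possible next value.
    samples = samples.flatMap((s) => pop.map((x) => [...s, x]));
  }
  return samples.map(mean);
}

const sampleMeans = allSampleMeans(population, 2); // 3 × 3 = 9 samples

// Tally how often each mean occurs: this tally is the sampling distribution.
const counts = new Map();
for (const m of sampleMeans) counts.set(m, (counts.get(m) ?? 0) + 1);
for (const [value, count] of [...counts].sort((a, b) => a[0] - b[0])) {
  console.log(`mean ${value}: ${count} of ${sampleMeans.length} samples`);
}
// mean 2: 1 of 9 samples
// mean 3: 2 of 9 samples
// mean 4: 3 of 9 samples
// mean 5: 2 of 9 samples
// mean 6: 1 of 9 samples
```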

How to Use The Code Blocks

You’ll encounter several large code blocks in this activity. As mentioned earlier, you are not expected to examine or understand the code in each block. Instead, focus on the plots that result each time you run the code. You are invited to change the first few lines of each block (the parameters), and you should do so! You are not expected to modify the remaining code though.

Each time you run a code block, two plots will be displayed.

Both plots show the resulting approximation of the sampling distribution via a histogram.
The plot on the left overlays a normal distribution whose mean and standard deviation match the assumed population parameters.
The plot on the right overlays a normal distribution whose mean matches the population mean, but whose standard deviation is adjusted according to the Central Limit Theorem.

A Normally Distributed Population

Work with the following code block to explore the connection between the population distribution and sampling distribution when the population follows a normal distribution. Use your explorations to answer the questions that follow.

Check Your Understanding: Normal Population I

Which of the following statements about the population and sampling distributions is true?

mutable ok_response = (response, n) => { return html`Loading...` };
viewof q1 = Inputs.radio(
  new Map([
    ["The population distribution and sampling distribution are identical in the case where sample size is 1.", 1],
    ["The population distribution and sampling distribution are always identical.", 2],
    ["The population distribution and sampling distribution are never identical.", 3]
  ]),
  {value: JSON.parse(localStorage.getItem("q1_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q1_selected_t9", JSON.stringify(q1));
  localStorage.setItem("q1_correct_t9", "1");
  localStorage.setItem("q1_result_t9", q1 === null ? "unattempted" : (q1 == 1 ? "correct" : "incorrect"));
}

ok_response(q1, "1");

Check Your Understanding: Normal Population II

Try several different values for the sample size. What is true about the mean of the sampling distribution?

viewof q2 = Inputs.radio(
  new Map([
    ["The larger the sample size, the smaller the mean of the sampling distribution.", 1],
    ["The mean of the sampling distribution always falls near the mean of the population distribution.", 2],
    ["The larger the sample size, the larger the mean of the sampling distribution.", 3],
    ["The mean of the population distribution and mean of the sampling distribution are independent of one another.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q2_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q2_selected_t9", JSON.stringify(q2));
  localStorage.setItem("q2_correct_t9", "2");
  localStorage.setItem("q2_result_t9", q2 === null ? "unattempted" : (q2 == 2 ? "correct" : "incorrect"));
}

ok_response(q2, "2");

Check Your Understanding: Normal Population III

Try various values of the mean, standard deviation, and sample size. What can be said about the sampling distribution?

viewof q3 = Inputs.radio(
  new Map([
    ["The sampling distribution is always the same as the population distribution.", 1],
    ["The sampling distribution is always uniform.", 2],
    ["The sampling distribution is completely unpredictable.", 3],
    ["The sampling distribution is always nearly normal.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q3_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q3_selected_t9", JSON.stringify(q3));
  localStorage.setItem("q3_correct_t9", "4");
  localStorage.setItem("q3_result_t9", q3 === null ? "unattempted" : (q3 == 4 ? "correct" : "incorrect"));
}

ok_response(q3, "4");

Check Your Understanding: Normal Population IV

Now that you’ve tried various values for the parameters, what can be said about the connection between sample size and the spread of the sampling distribution?

viewof q4 = Inputs.radio(
  new Map([
    ["The larger the sample size, the wider the sampling distribution becomes.", 1],
    ["The larger the sample size, the more narrow the sampling distribution becomes.", 2],
    ["The larger the sample size, the less normal the sampling distribution becomes.", 3],
    ["The sample size and spread of the sampling distribution are independent of one another.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q4_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q4_selected_t9", JSON.stringify(q4));
  localStorage.setItem("q4_correct_t9", "2");
  localStorage.setItem("q4_result_t9", q4 === null ? "unattempted" : (q4 == 2 ? "correct" : "incorrect"));
}

ok_response(q4, "2");
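After you’ve answered the questions above, you may be curious what a simulation like the one in this section could look like. Here is a rough sketch, not the activity’s actual code. It assumes a normal population with made-up parameters \(\mu = 50\) and \(\sigma = 10\), generates normal values with the Box–Muller transform, and summarizes 10,000 sample means for several sample sizes. Because it uses Math.random, the printed numbers change slightly from run to run.

```javascript
// Draw one value from a normal population N(mu, sigma) via the Box–Muller transform.
function drawNormal(mu, sigma) {
  const u1 = 1 - Math.random(); // in (0, 1], so Math.log(u1) is finite
  const u2 = Math.random();
  return mu + sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Approximate the sampling distribution: `reps` sample means, each from n draws.
function simulateSampleMeans(draw, n, reps) {
  const means = new Array(reps);
  for (let r = 0; r < reps; r++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += draw();
    means[r] = sum / n;
  }
  return means;
}

// Mean and (n - 1)-denominator standard deviation of an array.
function meanAndSd(xs) {
  const m = xs.reduce((s, x) => s + x, 0) / xs.length;
  const v = xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
  return { mean: m, sd: Math.sqrt(v) };
}

const mu = 50, sigma = 10; // made-up population parameters
for (const n of [1, 4, 25]) {
  const { mean, sd } = meanAndSd(simulateSampleMeans(() => drawNormal(mu, sigma), n, 10000));
  console.log(
    `n = ${n}: mean ≈ ${mean.toFixed(2)}, sd ≈ ${sd.toFixed(2)}, ` +
      `sigma/sqrt(n) = ${(sigma / Math.sqrt(n)).toFixed(2)}`
  );
}
```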

Recap: Normally Distributed Populations

Okay, so if a population distribution is approximately normal, then the sampling distribution is also nearly normal – what’s the big deal?

The real takeaway here is that, if it’s reasonable to assume that our data comes from an approximately normal population, then our sample mean is a reliable estimate of the true population mean — and the larger our sample, the more reliable that estimate becomes.

This might seem obvious, but it’s actually the foundation of everything that follows in our course. We almost never have access to an entire population, so we rely on samples. Knowing that our sample mean is a trustworthy stand-in for the population mean — and being able to quantify how trustworthy it is based on sample size — is what makes statistical inference possible.

Okay, so what if we can’t assume that the population we are interested in follows a “nearly”-normal distribution? Do you believe in “magic”? (Disclaimer: There is actually no magic, or trickery, involved in what you are about to see – just rock-solid mathematics. Get ready to have your mind blown!)

A Uniformly Distributed Population

A population is said to be uniformly distributed between some minimum value A and a maximum value B if all values between A and B are equally likely to be observed.

Work with the following code block to explore the connection between the population distribution and sampling distribution when the population follows a uniform distribution. Note that in the initial plots, assuming a normal distribution for either the population or the sampling distribution is an extremely poor choice. Use your explorations to answer the questions that follow.

Check Your Understanding: Uniform Population I

Which of the following statements about the sampling distribution is true?

viewof q5 = Inputs.radio(
  new Map([
    ["The sampling distribution is nearly normal as long as the sample size is at least 2.", 1],
    ["The sampling distribution is always normal.", 2],
    ["The sampling distribution is never nearly normal.", 3],
    ["The sampling distribution always shares a common shape with the population distribution.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q5_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q5_selected_t9", JSON.stringify(q5));
  localStorage.setItem("q5_correct_t9", "1");
  localStorage.setItem("q5_result_t9", q5 === null ? "unattempted" : (q5 == 1 ? "correct" : "incorrect"));
}

ok_response(q5, "1");

Check Your Understanding: Uniform Population II

Try several different values for the sample size. What is true about the mean of the sampling distribution?

viewof q6 = Inputs.radio(
  new Map([
    ["The larger the sample size, the smaller the mean of the sampling distribution.", 1],
    ["The larger the sample size, the larger the mean of the sampling distribution.", 2],
    ["The mean of the sampling distribution always falls near the mean of the population distribution.", 3],
    ["The mean of the population distribution and mean of the sampling distribution are independent of one another.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q6_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q6_selected_t9", JSON.stringify(q6));
  localStorage.setItem("q6_correct_t9", "3");
  localStorage.setItem("q6_result_t9", q6 === null ? "unattempted" : (q6 == 3 ? "correct" : "incorrect"));
}

ok_response(q6, "3");

Check Your Understanding: Uniform Population III

Try various values for the parameters. What can be said about the connection between sample size and the spread of the sampling distribution?

viewof q7 = Inputs.radio(
  new Map([
    ["The larger the sample size, the wider the sampling distribution becomes.", 1],
    ["The larger the sample size, the less normal the sampling distribution becomes.", 2],
    ["The larger the sample size, the more narrow the sampling distribution becomes.", 3],
    ["The sample size and spread of the sampling distribution are independent of one another.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q7_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q7_selected_t9", JSON.stringify(q7));
  localStorage.setItem("q7_correct_t9", "3");
  localStorage.setItem("q7_result_t9", q7 === null ? "unattempted" : (q7 == 3 ? "correct" : "incorrect"));
}

ok_response(q7, "3");
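For a population that is uniform between \(A\) and \(B\), the population mean and standard deviation have simple closed forms: \(\mu = (A+B)/2\) and \(\sigma = (B-A)/\sqrt{12}\). The sketch below is not the activity’s own code. It uses made-up values \(A = 0\), \(B = 12\), and \(n = 4\) to compare simulated sample means against those formulas, so you can check it against your answers above. Results vary slightly between runs because it uses Math.random.

```javascript
// Population mean and SD of a continuous uniform distribution on [A, B].
function uniformParams(A, B) {
  return { mean: (A + B) / 2, sd: (B - A) / Math.sqrt(12) };
}

// `reps` sample means, each from n values drawn uniformly on [A, B].
function uniformSampleMeans(A, B, n, reps) {
  const means = [];
  for (let r = 0; r < reps; r++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += A + (B - A) * Math.random();
    means.push(sum / n);
  }
  return means;
}

const A = 0, B = 12, n = 4; // made-up endpoints and sample size
const pop = uniformParams(A, B); // mean 6, sd = 12 / sqrt(12) ≈ 3.46
const means = uniformSampleMeans(A, B, n, 20000);
const avg = means.reduce((s, x) => s + x, 0) / means.length;
const sd = Math.sqrt(means.reduce((s, x) => s + (x - avg) ** 2, 0) / (means.length - 1));
console.log(`mean of sample means ≈ ${avg.toFixed(2)} (population mean ${pop.mean})`);
console.log(`sd of sample means ≈ ${sd.toFixed(2)} (sigma/sqrt(n) = ${(pop.sd / Math.sqrt(n)).toFixed(2)})`);
```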

Recap: Uniformly Distributed Populations

Given a uniformly distributed population, the sampling distribution is approximately normal for samples of size 2 or greater.

The mean of the sampling distribution is the population mean.
The spread of the sampling distribution is related to the sample size – larger samples result in more narrow distributions.

The Impact of Skew on the Sampling Distribution

We’ve seen that the sampling distribution is nearly normal for any sample size when the population is nearly normally distributed, and nearly normal for samples of size at least two when the population is uniformly distributed. What if we move in a very different direction and consider a population that is extremely skewed?

We’ve encountered skew in our course already, and we know that it describes a distribution with a long tail on one side, in which a few extreme values pull the mean away from the center of the distribution toward that tail. This witchcraft (read: mathematics) certainly cannot apply in the face of strongly skewed distributions, can it?

Work with the following code block to explore the connection between the population distribution and sampling distribution when the population follows a strongly skewed distribution. Use your explorations to answer the questions that follow.

Check Your Understanding: Skewed Population I

Try running the code with samples of size 5, 10, and 20. Are the resulting sampling distributions nearly normal?

viewof q8 = Inputs.radio(
  new Map([
    ["Yes, the sampling distribution is always normal.", 1],
    ["No, the distributions are getting closer to the assumed normal sampling distribution, but still exhibit some skew in the opposite direction as the population distribution.", 2],
    ["No, the distributions are getting closer to the assumed normal sampling distribution, but still exhibit some skew in the same direction as the population distribution.", 3]
  ]),
  {value: JSON.parse(localStorage.getItem("q8_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q8_selected_t9", JSON.stringify(q8));
  localStorage.setItem("q8_correct_t9", "3");
  localStorage.setItem("q8_result_t9", q8 === null ? "unattempted" : (q8 == 3 ? "correct" : "incorrect"));
}

ok_response(q8, "3");

Check Your Understanding: Skewed Population II

Try samples of size 30, 50, and 100. What can be said about the resulting sampling distributions?

viewof q9 = Inputs.radio(
  new Map([
    ["The sampling distributions become closer to the assumed normal sampling distribution as sample size increases.", 1],
    ["There is no connection between the shape of the sampling distribution and the sample size.", 2],
    ["The sampling distribution always maintains the same shape as the population distribution.", 3]
  ]),
  {value: JSON.parse(localStorage.getItem("q9_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q9_selected_t9", JSON.stringify(q9));
  localStorage.setItem("q9_correct_t9", "1");
  localStorage.setItem("q9_result_t9", q9 === null ? "unattempted" : (q9 == 1 ? "correct" : "incorrect"));
}

ok_response(q9, "1");

Check Your Understanding: Skewed Population III

Try several different values for the other parameters. What is true about the mean of the sampling distribution?

viewof q10 = Inputs.radio(
  new Map([
    ["The larger the sample size, the smaller the mean of the sampling distribution.", 1],
    ["The mean of the sampling distribution always falls near the mean of the population distribution.", 2],
    ["The larger the sample size, the larger the mean of the sampling distribution.", 3],
    ["The mean of the population distribution and mean of the sampling distribution are independent of one another.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q10_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q10_selected_t9", JSON.stringify(q10));
  localStorage.setItem("q10_correct_t9", "2");
  localStorage.setItem("q10_result_t9", q10 === null ? "unattempted" : (q10 == 2 ? "correct" : "incorrect"));
}

ok_response(q10, "2");

Check Your Understanding: Skewed Population IV

Since you’ve tried various values for the parameters, what can be said about the connection between sample size and the spread of the sampling distribution?

viewof q11 = Inputs.radio(
  new Map([
    ["The larger the sample size, the wider the sampling distribution becomes.", 1],
    ["The larger the sample size, the less normal the sampling distribution becomes.", 2],
    ["The sample size and spread of the sampling distribution are independent of one another.", 3],
    ["The larger the sample size, the more narrow the sampling distribution becomes.", 4],
  ]),
  {value: JSON.parse(localStorage.getItem("q11_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q11_selected_t9", JSON.stringify(q11));
  localStorage.setItem("q11_correct_t9", "4");
  localStorage.setItem("q11_result_t9", q11 === null ? "unattempted" : (q11 == 4 ? "correct" : "incorrect"));
}

ok_response(q11, "4");
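After working through the skewed-population questions, you can compare your observations with the following rough sketch. It stands in for the activity’s skewed population with an exponential distribution of rate 1. That choice is ours; the activity’s own skewed population may differ. The sketch measures the skewness of the simulated sample means. For this particular population, the skewness of the sample mean is known to be \(2/\sqrt{n}\), so the simulated values should shrink toward 0, the skewness of a normal distribution, as \(n\) grows.

```javascript
// Exponential(rate) draw via inverse transform: a strongly right-skewed population.
const drawExponential = (rate) => -Math.log(1 - Math.random()) / rate;

// `reps` sample means, each from n exponential draws.
function sampleMeans(n, reps, rate = 1) {
  const means = new Array(reps);
  for (let r = 0; r < reps; r++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += drawExponential(rate);
    means[r] = sum / n;
  }
  return means;
}

// Skewness = average cubed deviation divided by SD cubed (0 for a symmetric distribution).
function skewness(xs) {
  const m = xs.reduce((s, x) => s + x, 0) / xs.length;
  const v = xs.reduce((s, x) => s + (x - m) ** 2, 0) / xs.length;
  const m3 = xs.reduce((s, x) => s + (x - m) ** 3, 0) / xs.length;
  return m3 / v ** 1.5;
}

for (const n of [1, 5, 30, 100]) {
  const g = skewness(sampleMeans(n, 20000));
  console.log(`n = ${n}: skewness ≈ ${g.toFixed(2)} (theory: ${(2 / Math.sqrt(n)).toFixed(2)})`);
}
```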

Good work on the previous sets of questions. Reflect on your answers and use them to answer the following questions about the connection between population distributions and sampling distributions in general.

Check Your Understanding: General I

Consider a generic population distribution. What can be said about the shape of the sampling distribution?

viewof q12 = Inputs.radio(
  new Map([
    ["The sampling distribution is always nearly normal.", 1],
    ["For large enough sample sizes, the sampling distribution is nearly normal.", 2],
    ["Nothing can be said, since we don't know what the population distribution looks like.", 3],
    ["The sampling distribution always retains the same general shape as the population distribution.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q12_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q12_selected_t9", JSON.stringify(q12));
  localStorage.setItem("q12_correct_t9", "2");
  localStorage.setItem("q12_result_t9", q12 === null ? "unattempted" : (q12 == 2 ? "correct" : "incorrect"));
}

ok_response(q12, "2");

Check Your Understanding: General II

For a generic population distribution, what can be said about the mean of the sampling distribution?

viewof q13 = Inputs.radio(
  new Map([
    ["The mean of the sampling distribution is generally a close* approximation to the mean of the population distribution.", 1],
    ["The mean of the sampling distribution is always less than the mean of the population distribution.", 2],
    ["The mean of the sampling distribution is always greater than the mean of the population distribution.", 3],
    ["The mean of the sampling distribution and mean of the population distribution are independent of one another.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q13_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q13_selected_t9", JSON.stringify(q13));
  localStorage.setItem("q13_correct_t9", "1");
  localStorage.setItem("q13_result_t9", q13 === null ? "unattempted" : (q13 == 1 ? "correct" : "incorrect"));
}

ok_response(q13, "1");

Check Your Understanding: General III

For a generic population distribution, what can be said about the connection between sample size and the spread of the sampling distribution?

viewof q14 = Inputs.radio(
  new Map([
    ["The larger the sample size, the wider the sampling distribution.", 1],
    ["The sample size and spread of the sampling distribution are independent of one another.", 2],
    ["The larger the sample size, the more narrow the sampling distribution.", 3],
  ]),
  {value: JSON.parse(localStorage.getItem("q14_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q14_selected_t9", JSON.stringify(q14));
  localStorage.setItem("q14_correct_t9", "3");
  localStorage.setItem("q14_result_t9", q14 === null ? "unattempted" : (q14 == 3 ? "correct" : "incorrect"));
}

ok_response(q14, "3");

Recap: Putting It All Together

In all of these cases, the mean of the sampling distribution falls close to the mean of the population distribution — but the more important observation has to do with the spread of the sampling distribution.

We’ve been working backwards here: we assumed we knew the population distribution and took many thousands of samples to construct each sampling distribution. This is the exact opposite of the real statistical scenario, in which we don’t know the population distribution, we can’t collect thousands of samples, and we generally have just one sample of a fixed size.

Luckily, from the experiments we’ve run (and from mathematics that has been rigorously proven), we know that our sample mean falls “near” the population mean, and we can quantify what “near” means because it depends on the size of our sample. This is how I was able to draw the normal curves in all the plots on the right, and it is what makes statistics work!

The Central Limit Theorem

What you just discovered through the simulations and questions above is what statisticians call the Central Limit Theorem — the result discussed in the CreatureCast video at the beginning of this activity. This is one of the most important theorems in all of statistics, and it is what will allow us to make and test claims about populations for the remainder of our course.

The Central Limit Theorem (Simplified)

Regardless of the shape of a population distribution, for sample sizes large enough to overcome skew, the distribution of sample means (the sampling distribution) is approximately normal. Furthermore:

The mean of the sampling distribution equals the population mean (\(\mu\)).
The spread of the sampling distribution is described by the standard error (\(S_E\)), which depends on the population standard deviation and the sample size.

For a population with mean \(\mu\) and standard deviation \(\sigma\), the sampling distribution of sample means from samples of size \(n\) is:

\[\bar{X}_n \sim N\left(\mu,~S_E = \frac{\sigma}{\sqrt{n}}\right)\]
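In code, the standard error is a one-line formula. The sketch below also includes a hypothetical helper, overlayParams, that describes the two normal curves overlaid on each histogram in this activity. The left curve uses the population standard deviation \(\sigma\), and the right curve uses \(S_E = \sigma/\sqrt{n}\). This is an assumption about how those curves were drawn, based on the description of the plots, and is not the activity’s actual plotting code.

```javascript
// Standard error of the sample mean: S_E = sigma / sqrt(n).
const standardError = (sigma, n) => sigma / Math.sqrt(n);

// Hypothetical helper: parameters of the two curves overlaid on each histogram.
// Left: the population's own N(mu, sigma). Right: the CLT's N(mu, sigma / sqrt(n)).
function overlayParams(mu, sigma, n) {
  return {
    left: { mean: mu, sd: sigma },
    right: { mean: mu, sd: standardError(sigma, n) },
  };
}

// Each fourfold increase in n halves the standard error.
for (const n of [1, 4, 16, 64]) {
  console.log(`n = ${n}: S_E = ${standardError(12, n)}`);
}
// n = 1: S_E = 12
// n = 4: S_E = 6
// n = 16: S_E = 3
// n = 64: S_E = 1.5

console.log(overlayParams(100, 12, 16)); // left sd 12, right sd 3
```

Because \(S_E\) depends on \(\sqrt{n}\), halving the spread of the sampling distribution takes four times as many observations.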

Sample Problem: Suppose you have a 46-square-foot wall that you want to cover with spray paint. The brand of spray paint you plan to use is known to have approximately normal coverage, with a mean of 10 square feet per can and a standard deviation of 1.5 square feet. Use the code block below to answer the questions that follow.

Check Your Understanding: Spray Paint I

How many square feet would each can need to cover if you wanted to use only four cans to cover the entire 46-square-foot wall?

viewof q15 = Inputs.radio(
  new Map([
    ["10", 1],
    ["10.5", 2],
    ["11", 3],
    ["11.5", 4],
    ["12", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q15_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q15_selected_t9", JSON.stringify(q15));
  localStorage.setItem("q15_correct_t9", "4");
  localStorage.setItem("q15_result_t9", q15 === null ? "unattempted" : (q15 == 4 ? "correct" : "incorrect"));
}

ok_response(q15, "4");

Check Your Understanding: Spray Paint II

What is the probability that a single can of spray paint covers at least 11.5 square feet?

viewof q16 = Inputs.radio(
  new Map([
    ["0.5000", 1],
    ["0.9772", 2],
    ["0.1587", 3],
    ["0.8413", 4],
    ["0.0228", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q16_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q16_selected_t9", JSON.stringify(q16));
  localStorage.setItem("q16_correct_t9", "3");
  localStorage.setItem("q16_result_t9", q16 === null ? "unattempted" : (q16 == 3 ? "correct" : "incorrect"));
}

ok_response(q16, "3");

Check Your Understanding: Spray Paint III

What is the probability that a random sample of four cans of spray paint covers an average of at least 11.5 square feet?

viewof q17 = Inputs.radio(
  new Map([
    ["0.5000", 1],
    ["0.9772", 2],
    ["0.1587", 3],
    ["0.8413", 4],
    ["0.0228", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q17_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q17_selected_t9", JSON.stringify(q17));
  localStorage.setItem("q17_correct_t9", "5");
  localStorage.setItem("q17_result_t9", q17 === null ? "unattempted" : (q17 == 5 ? "correct" : "incorrect"));
}

ok_response(q17, "5");

Check Your Understanding: Spray Paint IV

Should we expect to cover the entire wall with only four cans of spray paint, or should we plan to buy five?

viewof q18 = Inputs.radio(
  new Map([
    ["We will probably be fine with four cans — 11.5 square feet is not much more than the expected average of 10 square feet.", 1],
    ["Since 11.5 square feet is within one standard deviation of the expected value, it is likely that four will be enough.", 2],
    ["We should plan to buy five since it is unlikely (less than 5% chance) that four cans will suffice to cover the whole wall.", 3],
    ["We should plan to buy five cans — four cans will only cover 40 square feet.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q18_selected_t9") ?? "null")}
);

{
  localStorage.setItem("q18_selected_t9", JSON.stringify(q18));
  localStorage.setItem("q18_correct_t9", "3");
  localStorage.setItem("q18_result_t9", q18 === null ? "unattempted" : (q18 == 3 ? "correct" : "incorrect"));
}

ok_response(q18, "3");
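After you’ve tried the spray-paint questions yourself, here is a sketch for checking the arithmetic. JavaScript has no built-in normal CDF, so this sketch substitutes the Abramowitz and Stegun polynomial approximation to the error function (formula 7.1.26, absolute error below \(1.5 \times 10^{-7}\)). The code block in the activity may compute these probabilities differently, and the function name probMeanAtLeast is hypothetical.

```javascript
// Error function via the Abramowitz & Stegun 7.1.26 approximation (|error| < 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF, Phi(z).
const normalCdf = (z) => 0.5 * (1 + erf(z / Math.SQRT2));

// P(mean coverage of n cans >= threshold) when each can covers N(mu, sigma) square feet.
function probMeanAtLeast(threshold, mu, sigma, n) {
  const se = sigma / Math.sqrt(n); // standard error from the CLT
  const z = (threshold - mu) / se; // z-score of the threshold
  return 1 - normalCdf(z);         // upper-tail probability
}

const wall = 46, cans = 4, mu = 10, sigma = 1.5;
const needed = wall / cans; // square feet each can must cover
console.log(needed);
console.log(probMeanAtLeast(needed, mu, sigma, 1).toFixed(4));    // single can, z = 1
console.log(probMeanAtLeast(needed, mu, sigma, cans).toFixed(4)); // average of four cans, z = 2
```

For four cans, the z-score is \(z = \frac{11.5 - 10}{1.5/\sqrt{4}} = 2\), compared with \(z = 1\) for a single can. Averaging over four cans doubles the distance to 11.5 in standard-error units, and that is why the probability drops so sharply.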

Submit

If you are part of a course with an instructor who is grading your work on these activities, please copy and submit the hash below using the method your instructor has requested.

Question Hash

The hash below encodes your responses to the multiple choice and checkbox questions in this activity.

function buildQuestionResults() {
  return {
    notebook: "Topic 9: Foundations for Inference",
    type: "questions",
    timestamp: new Date().toISOString(),
    questions: {
      q1_pop_samp_dist_identical: {
        selected: q1,
        correct_answer: "1",
        result: q1 === null ? "unattempted" : (q1 == 1 ? "correct" : "incorrect")
      },
      q2_sampling_mean_normal_pop: {
        selected: q2,
        correct_answer: "2",
        result: q2 === null ? "unattempted" : (q2 == 2 ? "correct" : "incorrect")
      },
      q3_sampling_shape_normal_pop: {
        selected: q3,
        correct_answer: "4",
        result: q3 === null ? "unattempted" : (q3 == 4 ? "correct" : "incorrect")
      },
      q4_sampling_spread_normal_pop: {
        selected: q4,
        correct_answer: "2",
        result: q4 === null ? "unattempted" : (q4 == 2 ? "correct" : "incorrect")
      },
      q5_sampling_shape_uniform_pop: {
        selected: q5,
        correct_answer: "1",
        result: q5 === null ? "unattempted" : (q5 == 1 ? "correct" : "incorrect")
      },
      q6_sampling_mean_uniform_pop: {
        selected: q6,
        correct_answer: "3",
        result: q6 === null ? "unattempted" : (q6 == 3 ? "correct" : "incorrect")
      },
      q7_sampling_spread_uniform_pop: {
        selected: q7,
        correct_answer: "3",
        result: q7 === null ? "unattempted" : (q7 == 3 ? "correct" : "incorrect")
      },
      q8_sampling_shape_skewed_small_n: {
        selected: q8,
        correct_answer: "3",
        result: q8 === null ? "unattempted" : (q8 == 3 ? "correct" : "incorrect")
      },
      q9_sampling_shape_skewed_large_n: {
        selected: q9,
        correct_answer: "1",
        result: q9 === null ? "unattempted" : (q9 == 1 ? "correct" : "incorrect")
      },
      q10_sampling_mean_skewed_pop: {
        selected: q10,
        correct_answer: "2",
        result: q10 === null ? "unattempted" : (q10 == 2 ? "correct" : "incorrect")
      },
      q11_sampling_spread_skewed_pop: {
        selected: q11,
        correct_answer: "4",
        result: q11 === null ? "unattempted" : (q11 == 4 ? "correct" : "incorrect")
      },
      q12_sampling_shape_general: {
        selected: q12,
        correct_answer: "2",
        result: q12 === null ? "unattempted" : (q12 == 2 ? "correct" : "incorrect")
      },
      q13_sampling_mean_general: {
        selected: q13,
        correct_answer: "1",
        result: q13 === null ? "unattempted" : (q13 == 1 ? "correct" : "incorrect")
      },
      q14_sampling_spread_general: {
        selected: q14,
        correct_answer: "3",
        result: q14 === null ? "unattempted" : (q14 == 3 ? "correct" : "incorrect")
      },
      q15_spray_paint_coverage_per_can: {
        selected: q15,
        correct_answer: "4",
        result: q15 === null ? "unattempted" : (q15 == 4 ? "correct" : "incorrect")
      },
      q16_spray_paint_single_can_prob: {
        selected: q16,
        correct_answer: "3",
        result: q16 === null ? "unattempted" : (q16 == 3 ? "correct" : "incorrect")
      },
      q17_spray_paint_four_can_prob: {
        selected: q17,
        correct_answer: "5",
        result: q17 === null ? "unattempted" : (q17 == 5 ? "correct" : "incorrect")
      },
      q18_spray_paint_buy_five: {
        selected: q18,
        correct_answer: "3",
        result: q18 === null ? "unattempted" : (q18 == 3 ? "correct" : "incorrect")
      }
    }
  };
}

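function toBase64(str) {
  // btoa only accepts single-byte characters, so encode the string as UTF-8 bytes first
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}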

question_hash = {
  q1; q2; q3; q4; q5; q6; q7; q8; q9; q10; q11; q12; q13; q14; q15; q16; q17; q18;
  return toBase64(JSON.stringify(buildQuestionResults()));
}

html`<div style="font-family: monospace; font-size: 0.85em; background: #f5f5f5; padding: 12px; border-radius: 6px; word-break: break-all; border: 1px solid #ddd; user-select: all; cursor: pointer;" onclick="navigator.clipboard.writeText(this.innerText)">
  ${question_hash}
</div>
<p style="margin-top: 8px; font-size: 0.9em; color: #555;">
  Click the box to copy to clipboard.
</p>`

Exercise Hash

Since there were no code cell exercises in this activity, there is no exercise hash to generate.

Summary

In this activity you discovered one of the most important results in all of statistics through hands-on simulation. Here are the key takeaways and a look at what’s ahead.

Main Takeaways

The sampling distribution describes the behavior of sample means. When we take many samples of size \(n\) from a population and compute the mean of each, the resulting distribution of those means is called the sampling distribution.
The mean of the sampling distribution equals the population mean. Regardless of the population shape or sample size, the sampling distribution is centered at the true population mean \(\mu\); in our simulations, the center of every histogram landed close to it.
Larger samples produce narrower sampling distributions. The spread of the sampling distribution — measured by the standard error \(S_E = \sigma / \sqrt{n}\) — shrinks as sample size grows. Larger samples give us more precise estimates of the population mean.
This is the Central Limit Theorem. For large enough sample sizes, the sampling distribution is approximately normal, regardless of the shape of the population distribution. For normally distributed populations, any sample size works. For skewed populations, larger sample sizes (30 or more for moderate skew) are needed.
Skew increases the required sample size. The more extreme the skew in the population distribution, the larger the sample size must be before the sampling distribution becomes approximately normal.
The CLT is what makes statistical inference possible. Because we can predict the shape and spread of the sampling distribution, we can make reliable probability statements about how close a single sample mean is likely to be to the true population mean — even when we only have one sample.

Looking Ahead

Now that you understand why sample means behave predictably, the next activities will put this to work. We’ll use the Central Limit Theorem as the foundation for confidence intervals — a way of expressing how precisely a sample mean estimates the population mean — and hypothesis tests — a formal framework for deciding whether observed data is consistent with a specific claim about a population. Everything you’ve built intuition for in this activity will be essential going forward.