Topic 11: Hypothesis Tests and Confidence Intervals for Categorical Data

About

This activity formally introduces confidence intervals and hypothesis tests as tools for statistical inference. We work through two motivating problems — one confidence interval and one hypothesis test — in the context of real survey data about immigration policy.

Hypothesis Testing and Confidence Intervals for Categorical Data

In this activity, we continue our exploration of statistical inference. We’ll cover the basic hypothesis testing framework in addition to discussing confidence intervals more formally than we did in the previous activity.

We’ll motivate this activity by watching three short videos from volunteers at OpenIntro.org. The first two are from Dr. David Diez, a data scientist, and the last is from Dr. Shannon McLintock, a member of the statistics faculty at Cal Poly. After each video, you’ll walk through a hands-on application of the video content to a real scenario.

Variability in Point Estimates

Our first video discusses variability in point estimates. Some of the content will sound familiar from the previous activity. Watch the video, then we’ll engage with the ideas by walking through an example together.

An Example: A June 2020 Pew Research survey revealed that 74% of Americans support offering a path to citizenship for undocumented immigrants who were brought to the US illegally as children — often referred to as DREAMers.

We’ve discussed the impossibility of a true census, so the Pew study did not poll every single American. Instead, they surveyed 9,654 US adults between June 4 and June 10, 2020. You can find out more about the study logistics here if you are interested. This means that the 74% referenced in the article is the proportion of individuals from the study who were in favor of a path to citizenship for the DREAMers.

Check Your Understanding: Terminology I

The 74% from the Pew Research article is a/an (select all that apply):

Note on Terminology

We often use sample statistic and point estimate interchangeably. A sample statistic serves as a point estimate for the corresponding population parameter.

Check Your Understanding: Terminology II

What is the parameter of interest here?

Check Your Understanding: Terminology III

According to the methodology document, the 9,654 participants were a random sample representative of the population of American adults. If the study were completed again with a new set of 9,654 participants, we would expect:

The code below simulates a random sample of 9,654 individuals for which there is a 74% chance the individual is in support of a path to citizenship for DREAMers. This should look somewhat familiar if you completed the lab activity where we simulated shots taken by a basketball player in order to investigate the hot hand phenomenon. Run the code a few times to see the results.
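
The interactive code cell is not reproduced in this text, so what follows is only a minimal sketch of what such a simulation might look like in R. The variable names (`n`, `p_assumed`, `responses`, `p_hat`) are illustrative assumptions rather than the activity's own code.

```r
# Illustrative sketch: one simulated survey of 9,654 respondents,
# each with a 74% chance of supporting a path to citizenship.
n <- 9654          # sample size from the Pew study
p_assumed <- 0.74  # assumed population proportion

# Draw n independent responses.
responses <- sample(c("support", "oppose"), size = n,
                    replace = TRUE, prob = c(p_assumed, 1 - p_assumed))

# Sample proportion in favor; it changes each time the code is run.
p_hat <- mean(responses == "support")
p_hat
```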

By running the code block above multiple times, you’ve probably seen that most of the samples resulted in a sample proportion within about one percentage point (0.01) of the assumed proportion \(p = 0.74\).

In the video, Dr. Diez discusses how we can use the Central Limit Theorem to quantify how much variability we should see in the point estimate from one sample to the next. In the case of a single proportion, the Central Limit Theorem states:

Central Limit Theorem for a Sample Proportion

When observations are independent and the sample size is sufficiently large, the sample proportion \(\hat{p}\) will tend to follow a normal distribution with mean \(\mu = p\) (the true population proportion) and standard error \(\displaystyle{S_E = \sqrt{\frac{p\left(1-p\right)}{n}}}\). That is:

\[\hat{p} \sim N\!\left(\mu = p,\ S_E = \sqrt{\frac{p\left(1-p\right)}{n}}\right)\]

In practice, “sufficiently large” is usually taken to mean that the success-failure condition is satisfied: the sample must be large enough that we expect at least 10 “successes” and at least 10 “failures”.

Use the code block below to answer the questions that follow.

Hint 1

Think about the structure of this scenario: 9,654 people are each asked a yes/no question, and they are from a random sample. Does this setup sound familiar from our earlier work on probability?

Hint 2

This is a binomial experiment. The number of “yes” responses is a binomial random variable with \(n = 9{,}654\) and \(p = 0.74\).

Hint 3

For the success-failure condition, recall how we computed expected counts for a binomial random variable. What formula gives the expected number of successes?

Hint 4

The expected number of successes is \(n \cdot p\) and the expected number of failures is \(n \cdot (1 - p)\). The success-failure condition requires both to be at least 10.
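
As a rough sketch of the check described in this hint, the helper below computes both expected counts and compares them to 10. The function name `success_failure` is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: expected counts of "successes" and "failures"
# for a sample of size n when the success probability is p.
success_failure <- function(n, p) {
  expected <- c(successes = n * p, failures = n * (1 - p))
  list(expected = expected, satisfied = all(expected >= 10))
}

success_failure(9654, 0.74)  # the Pew study sample size
success_failure(35, 0.74)    # a much smaller study
```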

Hint 5

For the standard error question, use the formula from the Central Limit Theorem callout above. What are the values of \(p\) and \(n\) in this problem?

Hint 6

The standard error is \(\sqrt{p(1-p)/n}\). What are the values of \(p\) and \(n\)?

sqrt((___ * (1 - ___)) / ___)

Hint 7 (Solved)

The standard error is \(\sqrt{p(1-p)/n}\). With \(p = 0.74\) and \(n = 9{,}654\):

sqrt((0.74 * (1 - 0.74)) / 9654)

Check Your Understanding: Success-Failure Condition I

The sample size is sufficiently large if the success-failure condition is satisfied. What is the success-failure condition? Select all that apply.

Check Your Understanding: Success-Failure Condition II

Is the success-failure condition satisfied for the Pew study with 9,654 participants?

Check Your Understanding: Success-Failure Condition III

Would the success-failure condition be satisfied for a small study with only 35 participants?

Check Your Understanding: Shape of the Sampling Distribution

What will be the shape of the sampling distribution for samples of size 9,654?

Check Your Understanding: Standard Error

Which of the following is the expected spread of the sampling distribution, measured by the standard error?

Notice that the standard error is about half of a percentage point (approximately 0.0045). Doubling it gives roughly 0.009, just under one percentage point, which closely matches the sample-to-sample variation we observed in our simulations. This brings us to our next topic: confidence intervals.
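
One way to see this agreement is to simulate many samples at once and compare the spread of the resulting sample proportions to the formula. The sketch below assumes 5,000 repetitions and an arbitrary seed; neither choice comes from the activity.

```r
set.seed(11)  # arbitrary seed so the run is reproducible
n <- 9654
p <- 0.74

# rbinom() returns the number of supporters in each simulated sample;
# dividing by n converts those counts to sample proportions.
p_hats <- rbinom(5000, size = n, prob = p) / n

sd(p_hats)                     # spread observed across simulated samples
sqrt(p * (1 - p) / n)          # standard error from the formula
mean(abs(p_hats - p) <= 0.01)  # share of samples within one percentage point
```

If the Central Limit Theorem applies, the first two values should be close to each other, and the large majority of simulated proportions should land within 0.01 of \(p\).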

Intro to Confidence Intervals

Watch the next video from Dr. Diez. Once you’ve watched it, we’ll continue with our example about the 2020 Pew Research study.

As Dr. Diez mentions, a confidence interval can be used to capture a population parameter with some desired degree of certainty. In general, we construct a confidence interval using the following formula:

\[\left(\text{point estimate}\right) \pm \left(\text{critical value}\right) \cdot S_E\]

where the point estimate comes from the sample data, the critical value is related to the level of confidence, and the standard error (\(S_E\)) measures the spread of the sampling distribution.

Recall that we’ve been working with a 2020 Pew Research study of 9,654 participants. In that study, 74% of participants favored a path to citizenship for the DREAMers, and we computed the standard error to be approximately 0.0045.

If the sampling distribution is well-modeled by a normal distribution, the following critical values are associated with several common levels of confidence:

Confidence Level    Critical Value
90%                 1.65
95%                 1.96
98%                 2.33
99%                 2.58
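
These critical values can be recovered from the standard normal distribution with `qnorm()`. The sketch below is one way to do so; the variable names are illustrative. Note that `qnorm()` gives about 1.645 for the 90% level, which the table reports as 1.65.

```r
conf_levels <- c(0.90, 0.95, 0.98, 0.99)

# For a two-sided interval, the leftover probability is split evenly
# between the two tails, so the critical value is the
# (1 - (1 - level) / 2) quantile of the standard normal distribution.
critical_values <- qnorm(1 - (1 - conf_levels) / 2)

round(critical_values, 3)
```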

Use what you learned in the video and your knowledge of the Pew Research study to answer the following questions. You can use the code block below for any necessary computations.

Hint 1

For the first three questions, re-read the paragraphs above. What are the three pieces of information needed to construct a confidence interval?

Hint 2

You now have the point estimate, standard error, and critical value. How are they used in constructing a 98% confidence interval?

Hint 3

To find the upper bound, add the margin of error to the point estimate. The margin of error is the critical value multiplied by the standard error.

# Upper Bound:
___ + (___ * ___)

Hint 4

The point estimate is 0.74, the critical value is 2.33, and the standard error is about 0.0045.

# Upper Bound:
___ + (___ * ___)

Hint 5

The point estimate is 0.74, the critical value is 2.33, and the standard error is about 0.0045.

Now find the lower bound.

# Upper Bound:
0.74 + (2.33 * 0.0045)

# Lower Bound:

Hint 6

The point estimate is 0.74, the critical value is 2.33, and the standard error is about 0.0045.

To find the lower bound, we subtract the margin of error from the point estimate.

# Upper Bound:
0.74 + (2.33 * 0.0045)

# Lower Bound:
___ - (___ * ___)

Hint 7

The point estimate is 0.74, the critical value is 2.33, and the standard error is about 0.0045.

To find the lower bound, we subtract the margin of error from the point estimate. The margin of error is still the critical value times the standard error, and the point estimate is still 0.74.

# Upper Bound:
0.74 + (2.33 * 0.0045)

# Lower Bound:
0.74 - (2.33 * 0.0045)

Hint 8

For the final question: a 90% confidence interval uses a smaller critical value (1.65 vs. 2.33), which means a smaller margin of error and therefore a narrower interval.
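
To compare the two confidence levels numerically, the interval formula can be wrapped in a small function. The helper `prop_ci` below is hypothetical and written only for illustration. It uses the unrounded standard error and `qnorm()` critical values, so its output may differ in the fourth decimal place from a hand calculation with 2.33 and 0.0045.

```r
# Hypothetical helper: confidence interval for a single proportion,
# computed as (point estimate) +/- (critical value) * (standard error).
prop_ci <- function(p_hat, n, level) {
  se <- sqrt(p_hat * (1 - p_hat) / n)
  z_star <- qnorm(1 - (1 - level) / 2)
  c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
}

ci_98 <- prop_ci(0.74, 9654, 0.98)
ci_90 <- prop_ci(0.74, 9654, 0.90)

ci_98
ci_90
unname(diff(ci_98) > diff(ci_90))  # TRUE when the 98% interval is wider
```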

Check Your Understanding: Confidence Interval I

The point estimate for our confidence interval is:

Check Your Understanding: Confidence Interval II

The standard error (\(S_E\)) is:

Check Your Understanding: Confidence Interval III

The appropriate critical value for a 98% confidence interval is:

Check Your Understanding: Confidence Interval IV

Which of the following are the bounds for a 98% confidence interval? Select all that apply.

Check Your Understanding: Confidence Interval V

Which of the following is the correct interpretation of the 98% confidence interval?

Check Your Understanding: Confidence Interval VI

Without computing the bounds, a 90% confidence interval would be:

So far, so good! There’s one more topic to go. Sometimes we’ll want to test a claim about a population parameter rather than build a confidence interval for it. Inferential statistics provides a formal framework called the hypothesis test for evaluating statistical claims such as:

  • Is a population mean or proportion larger, smaller, or different from some proposed value?
  • Do the population means or proportions differ across multiple groups?

Intro to Hypothesis Testing

Here’s one more video from Dr. Shannon McLintock introducing the notion of the hypothesis test.

A 2018 poll from NPR reported that 65% of Americans supported a path to citizenship for DREAMers. Does the 2020 Pew Research poll provide evidence that support for a pathway to citizenship has grown over the past two years? Use a significance level of \(\alpha = 0.05\).

Use what you learned from Dr. McLintock to answer the following questions and complete the hypothesis test. You can use the code block below for any calculations.

Hint 1

In a hypothesis test, we begin with the skeptical position — the null hypothesis (\(H_0\)) — which assumes nothing has changed. What would that assumption be here?

Hint 2

The null hypothesis assumes the population proportion is still what it was in 2018. The alternative hypothesis reflects the claim being tested — that support has grown. Does this suggest a one-sided or two-sided alternative?

Hint 3

The point estimate is the sample statistic that corresponds to the population parameter being tested. What proportion did the 2020 Pew study find?

Hint 4

For the standard error, recall that in a hypothesis test we use the null value (the value assumed in \(H_0\)) in place of \(p\) in the formula \(S_E = \sqrt{p(1-p)/n}\). What is the null value here?

Hint 5

Now that you have the point estimate, null value, and standard error, you can compute the test statistic using the formula from Dr. McLintock’s video:

\[\text{test statistic} = \frac{(\text{point estimate}) - (\text{null value})}{S_E}\]

Hint 6

To compute the \(p\)-value, use pnorm(). The test statistic is your boundary value. Since the alternative hypothesis says \(p > 0.65\), which tail of the distribution corresponds to your \(p\)-value?

Hint 7 (Solved)

# Standard error (using null value p = 0.65)
se <- sqrt((0.65 * (1 - 0.65)) / 9654)

# Test statistic
z <- (0.74 - 0.65) / se

# p-value (upper tail, since Ha: p > 0.65)
1 - pnorm(z)

The \(p\)-value will be extremely small — much less than \(\alpha = 0.05\) — leading us to reject the null hypothesis.

Check Your Understanding: Hypothesis Test I

Which of the following are the hypotheses used to test this claim?

Check Your Understanding: Hypothesis Test II

What is the level of significance for the test?

Check Your Understanding: Hypothesis Test III

What is the point estimate?

Check Your Understanding: Hypothesis Test IV

What is the null value?

Check Your Understanding: Hypothesis Test V

The standard error is computed as \(S_E = \sqrt{\frac{p(1-p)}{n}}\), where \(p\) is the null value. Which of the following is the standard error? (round to four decimal places)

Why Use the Null Value in the Standard Error?

Notice that we use \(p = 0.65\) (the null value) rather than \(\hat{p} = 0.74\) (the sample proportion) when computing the standard error for the hypothesis test. This is because, during a hypothesis test, we assume the null hypothesis is true — we are asking: if the true proportion really is 0.65, how surprising is our observed sample?
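
To see how much this choice matters, the short sketch below computes the standard error both ways. The variable names are illustrative.

```r
n <- 9654

se_null   <- sqrt(0.65 * (1 - 0.65) / n)  # assumes H0: p = 0.65 is true
se_sample <- sqrt(0.74 * (1 - 0.74) / n)  # uses the observed proportion instead

c(null = se_null, sample = se_sample)
```

The two values differ because \(p(1-p)\) is larger when \(p\) is closer to 0.5. Only the first is appropriate for the test statistic.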

Check Your Understanding: Hypothesis Test VI

The test statistic is computed as \(\displaystyle{z = \frac{(\text{point estimate}) - (\text{null value})}{S_E}}\). Which of the following is the test statistic? (round to two decimal places)

Check Your Understanding: Hypothesis Test VII

Use pnorm() to compute the \(p\)-value associated with this test.

Note on Reported p-values of Zero

If software reports a \(p\)-value of exactly 0, this simply means the \(p\)-value is smaller than the precision the software can represent or display. It is more accurate to say that the \(p\)-value is very small (approximately 0, or rounded to 0) or to report it as \(p < 0.0001\).
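
In R, a reported 0 can come from the subtraction itself rather than from the display. As a sketch, asking `pnorm()` for the upper tail directly via its `lower.tail` argument avoids that loss of precision, and `format.pval()` can report the result as falling below a chosen threshold. The threshold `1e-4` is an assumption chosen to match the \(p < 0.0001\) convention mentioned above.

```r
se <- sqrt(0.65 * (1 - 0.65) / 9654)
z  <- (0.74 - 0.65) / se

p_subtract <- 1 - pnorm(z)                  # pnorm(z) rounds to 1, so this is 0
p_tail     <- pnorm(z, lower.tail = FALSE)  # upper tail computed directly

p_subtract
p_tail
format.pval(p_tail, eps = 1e-4)  # reports the value as below the threshold
```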

Check Your Understanding: Hypothesis Test VIII

Which of the following is the conclusion of the hypothesis test?

Check Your Understanding: Hypothesis Test IX

Which of the following is the result of the hypothesis test stated in context?

Submit

If you are part of a course with an instructor who is grading your work on these activities, please copy and submit the question hash below using the method your instructor has requested (this activity has only a question hash, no exercise hash).

Question Hash

The hash below encodes your responses to the multiple choice and checkbox questions in this activity.

Exercise Hash

Since there were no code cell exercises in this activity, there is no exercise hash to generate. You’ll see exercise hashes in future activities.

Summary

Main Takeaways

On point estimates and variability:

  • Sample statistics provide point estimates for their corresponding population parameters — a sample proportion estimates a population proportion, a sample mean estimates a population mean.
  • Sample statistics provide reliable point estimates only when the sample is representative of the population.
  • Every sample produces a slightly different statistic. Much of statistics is focused on quantifying this variability.

On confidence intervals:

  • A confidence interval captures a population parameter with a desired degree of confidence, computed as: \[(\text{point estimate}) \pm (\text{critical value}) \cdot S_E\]
  • The point estimate is a sample statistic. The critical value depends on the desired confidence level. The standard error (\(S_E\)) quantifies the expected variability in the point estimate.
  • A correct interpretation: “We are XX% confident that the true [population parameter] lies between [lower bound] and [upper bound].”
  • Higher confidence levels require larger critical values, which produce wider intervals.

On hypothesis tests:

  • A hypothesis test provides a formal framework for evaluating claims about a population parameter.
  • We begin with a null hypothesis (\(H_0\)) representing the status quo and an alternative hypothesis (\(H_a\)) representing the claim to be tested.
  • We set a significance level \(\alpha\) — the threshold below which a \(p\)-value is considered surprising enough to reject \(H_0\).
  • We compute a test statistic: \(\displaystyle{z = \frac{(\text{point estimate}) - (\text{null value})}{S_E}}\)
  • The \(p\)-value measures the probability of observing a sample at least as favorable to \(H_a\) as ours, assuming \(H_0\) is true. A \(p\)-value smaller than \(\alpha\) is taken as evidence against \(H_0\).
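
The hypothesis-test steps above can be cross-checked against R's built-in `prop.test()`. The sketch below is not part of the activity. It assumes the count of supporters can be reconstructed by rounding 74% of 9,654, and it turns off the continuity correction so that the reported X-squared statistic equals \(z^2\).

```r
n  <- 9654
x  <- round(0.74 * n)  # assumed count of supporters, reconstructed from 74%
p0 <- 0.65             # null value

# By-hand z statistic, using the null value in the standard error.
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)

# One-sided test of H0: p = 0.65 against Ha: p > 0.65.
res <- prop.test(x, n, p = p0, alternative = "greater", correct = FALSE)

c(z_by_hand = z, z_from_prop_test = unname(sqrt(res$statistic)))
res$p.value
```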

Looking Ahead

This activity introduced the general frameworks for confidence intervals and hypothesis tests using proportions. In the coming activities, we’ll continue to utilize these tools to help us estimate population parameters and to test claims about them.

As a preview of what’s coming, here’s a link to a Standard Error Decision Tree that we’ll use throughout the remainder of the course. It looks intimidating now, but look at the bottom-right corner — there’s the confidence interval formula you just used! And the lower-left corner shows the general test statistic formula. Everything else on the document will be explained in the coming activities.