This activity formally introduces confidence intervals and hypothesis tests as tools for statistical inference. We work through two motivating problems — one confidence interval and one hypothesis test — in the context of real survey data about immigration policy.
Hypothesis Testing and Confidence Intervals for Categorical Data
In this activity, we continue our exploration of statistical inference. We’ll cover the basic hypothesis testing framework in addition to discussing confidence intervals more formally than we did in the previous activity.
We’ll motivate this activity by watching three short videos from volunteers at OpenIntro.org. The first two are from Dr. David Diez, a data scientist, and the last is from Dr. Shannon McLintock, a member of the statistics faculty at Cal Poly. After each video, you’ll walk through a hands-on application of the video content to a real scenario.
Variability in Point Estimates
Our first video discusses variability in point estimates. Some of the content will sound familiar from the previous activity. Watch the video, then we’ll engage with the ideas by walking through an example together.
We’ve discussed why a true census is rarely practical, so the Pew study did not poll every American adult. Instead, Pew surveyed 9,654 US adults between June 4 and June 10, 2020. You can find out more about the study logistics here if you are interested. Because only a sample was surveyed, the 74% referenced in the article is the proportion of respondents in the sample, not of all American adults, who were in favor of a path to citizenship for the DREAMers.
Check Your Understanding: Terminology I
The 74% from the Pew Research article is a/an (select all that apply):
We often use sample statistic and point estimate interchangeably. A sample statistic serves as a point estimate for the corresponding population parameter.
Check Your Understanding: Terminology II
What is the parameter of interest here?
The sample proportion of American adults who are in favor of a citizenship option for DREAMers.
The true proportion of American adults who are in favor of a citizenship option for DREAMers.
74% of American adults.
All DREAMers.
All American adults.
Check Your Understanding: Terminology III
According to the methodology document, the 9,654 participants were a random sample representative of the population of American adults. If the study were completed again with a new set of 9,654 participants, we would expect:
A similar but slightly different result.
Exactly the same result.
A completely different result.
It is impossible to determine.
The code below simulates a random sample of 9,654 individuals for which there is a 74% chance the individual is in support of a path to citizenship for DREAMers. This should look somewhat familiar if you completed the lab activity where we simulated shots taken by a basketball player in order to investigate the hot hand phenomenon. Run the code a few times to see the results.
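A minimal base-R sketch of such a simulation, assuming each respondent independently supports a path to citizenship with probability 0.74. The variable names (n, p, responses, p_hat) are illustrative and may not match the activity's own code cell.

```r
# Assumed setup: n independent respondents, each in favor with probability p
n <- 9654
p <- 0.74

# Simulate each respondent's answer (no seed, so every run gives a new sample)
responses <- sample(c("in favor", "not in favor"), size = n,
                    replace = TRUE, prob = c(p, 1 - p))

# Sample proportion in favor for this one simulated survey
p_hat <- mean(responses == "in favor")
p_hat
```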
By running the code block above multiple times, you’ve probably seen that most of the samples resulted in a sample proportion within about one percentage point (0.01) of the assumed proportion \(p = 0.74\).
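One way to check this impression more systematically is to repeat the simulated survey many times. The sketch below is an illustration, not the activity's code: it uses base R's rbinom() to draw the number of supporters in each survey directly (equivalent to simulating every respondent), and the choice of 10,000 repetitions is arbitrary.

```r
n <- 9654
p <- 0.74
n_sims <- 10000

# Each draw is the number of supporters in one simulated survey of n people
p_hats <- rbinom(n_sims, size = n, prob = p) / n

# Share of simulated surveys whose proportion is within one percentage point of p
within_one_point <- mean(abs(p_hats - p) <= 0.01)
within_one_point
```

The reported share is typically well above 0.9, consistent with what rerunning the single-sample code suggests.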
In the video, Dr. Diez discusses how we can use the Central Limit Theorem to quantify how much variability we should see in the point estimate from one sample to the next. In the case of a single proportion, the Central Limit Theorem states:
Central Limit Theorem for a Sample Proportion
When observations are independent and the sample size is sufficiently large, the sample proportion \(\hat{p}\) will tend to follow a normal distribution with mean \(\mu = p\) (the true population proportion) and standard error \(\displaystyle{S_E = \sqrt{\frac{p\left(1-p\right)}{n}}}\). That is:
\[\hat{p} \sim N\left(\mu = p,\ S_E = \sqrt{\frac{p\left(1-p\right)}{n}}\right)\]
It is typical to take sufficiently large to mean that the success-failure condition is satisfied. The condition requires that the sample be large enough that we expect at least 10 “successes” and at least 10 “failures”; that is, \(np \geq 10\) and \(n(1-p) \geq 10\).
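The condition is straightforward to check in code. The function below, success_failure_ok(), is a hypothetical helper written for illustration; it computes the two expected counts and compares each to 10.

```r
# Hypothetical helper (not part of the original activity): checks whether
# n * p >= threshold and n * (1 - p) >= threshold
success_failure_ok <- function(n, p, threshold = 10) {
  expected_successes <- n * p
  expected_failures  <- n * (1 - p)
  expected_successes >= threshold && expected_failures >= threshold
}

success_failure_ok(n = 9654, p = 0.74)  # about 7144 and 2510 expected: TRUE
success_failure_ok(n = 35, p = 0.74)    # about 25.9 and 9.1 expected: FALSE
```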
Use the code block below to answer the questions that follow.
Hint 1
Think about the structure of this scenario: 9,654 people are each asked a yes/no question, and they are from a random sample. Does this setup sound familiar from our earlier work on probability?
Hint 2
This is a binomial experiment. The number of “yes” responses is a binomial random variable with \(n = 9{,}654\) and \(p = 0.74\).
Hint 3
For the success-failure condition, recall how we computed expected counts for a binomial random variable. What formula gives the expected number of successes?
Hint 4
The expected number of successes is \(n \cdot p\) and the expected number of failures is \(n \cdot (1 - p)\). The success-failure condition requires both to be at least 10.
Hint 5
For the standard error question, use the formula from the Central Limit Theorem callout above. What are the values of \(p\) and \(n\) in this problem?
Hint 6
The standard error is \(\sqrt{p(1-p)/n}\). What are the values of \(p\) and \(n\)?
sqrt((___ * (1 - ___)) / ___)
Hint 7 (Solved)
The standard error is \(\sqrt{p(1-p)/n}\). With \(p = 0.74\) and \(n = 9{,}654\):
sqrt((0.74 * (1 - 0.74)) / 9654)
Check Your Understanding: Success-Failure Condition I
The sample size is sufficiently large if the success-failure condition is satisfied. What is the success-failure condition? Select all that apply.
There should be at least an expected 10 observations in each group (here: in favor / not in favor).
If the population proportion is \(p\) and the sample size is \(n\), then \(n \cdot p \geq 10\) and \(n \cdot (1-p) \geq 10\).
There should be a possibility that we succeed but also that we fail.
There must be at least one success and one failure.
Failure is an option.
Check Your Understanding: Success-Failure Condition II
Is the success-failure condition satisfied for the Pew study with 9,654 participants?
Yes. We should expect at least 10 participants in favor and at least 10 participants opposed.
No. We cannot expect at least 10 participants to be in favor.
No. We cannot expect at least 10 participants to oppose.
No. We cannot expect at least 10 participants in either group.
Check Your Understanding: Success-Failure Condition III
Would the success-failure condition be satisfied for a small study with only 35 participants?
Yes. We should expect at least 10 participants in favor and at least 10 participants opposed.
No. We cannot expect at least 10 participants to be in favor.
No. We cannot expect at least 10 participants to oppose.
No. We cannot expect at least 10 participants in either group.
Check Your Understanding: Shape of the Sampling Distribution
What will be the shape of the sampling distribution for samples of size 9,654?
Notice that the standard error is about half of a percentage point (close to 0.005). Doubling it gives roughly 0.009, which closely matches the spread of about one percentage point that we observed in our simulations. This brings us to our next topic: confidence intervals.
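To see the match between the formula and the simulations directly, the following sketch (an illustration using base R's rbinom(), not part of the original activity) computes the Central Limit Theorem standard error alongside the standard deviation of 10,000 simulated sample proportions.

```r
n <- 9654
p <- 0.74

# Standard error from the Central Limit Theorem formula
se_formula <- sqrt(p * (1 - p) / n)

# Standard deviation of many simulated sample proportions
p_hats <- rbinom(10000, size = n, prob = p) / n
se_simulated <- sd(p_hats)

c(formula = se_formula, simulated = se_simulated, two_se = 2 * se_formula)
```

The formula and simulated values should be very close to each other, and twice the standard error lands near the one-percentage-point spread noted earlier.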
Intro to Confidence Intervals
Watch the next video from Dr. Diez. Once you’ve watched it, we’ll continue with our example about the 2020 Pew Research study.
As Dr. Diez mentions, a confidence interval can be used to capture a population parameter with some desired degree of certainty. In general, we construct a confidence interval using the following formula:
\[(\text{point estimate}) \pm (\text{critical value}) \cdot S_E\]
where the point estimate comes from the sample data, the critical value is related to the level of confidence, and the standard error (\(S_E\)) measures the spread of the sampling distribution.
Recall that we’ve been working with a 2020 Pew Research study which included 9,654 participants. The study resulted in 74% of participants being in favor of a path to citizenship for the DREAMers, and we computed the standard error to be approximately 0.0045.
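The formula translates directly into a small function. The sketch below defines a hypothetical helper, conf_interval(), and applies it to the Pew numbers at the 95% level (critical value 1.96, as listed in the table that follows), leaving the 98% interval for the questions.

```r
# Hypothetical helper: point estimate +/- critical value * standard error
conf_interval <- function(point_estimate, critical_value, se) {
  margin_of_error <- critical_value * se
  c(lower = point_estimate - margin_of_error,
    upper = point_estimate + margin_of_error)
}

# Pew example at 95% confidence (critical value 1.96, SE about 0.0045)
conf_interval(point_estimate = 0.74, critical_value = 1.96, se = 0.0045)
```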
If the sampling distribution is well-modeled by a normal distribution, the following critical values are associated with several common levels of confidence:
| Confidence Level | Critical Value |
|---|---|
| 90% | 1.65 |
| 95% | 1.96 |
| 98% | 2.33 |
| 99% | 2.58 |
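These critical values come from the standard normal distribution: for a confidence level \(C\), the critical value leaves \((1 - C)/2\) of the area in each tail. As a sketch, base R's qnorm() reproduces them; the table lists conventional two-decimal versions (for example, 1.65 for the 90% value of about 1.645).

```r
conf_levels <- c(0.90, 0.95, 0.98, 0.99)

# Two-sided critical value: leave (1 - confidence) / 2 in each tail
critical_values <- qnorm(1 - (1 - conf_levels) / 2)

round(critical_values, 3)  # 1.645 1.960 2.326 2.576
```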
Use what you learned in the video and your knowledge of the Pew Research study to answer the following questions. You can use the code block below for any necessary computations.
Hint 1
For the first three questions, re-read the paragraphs above. What are the three pieces of information needed to construct a confidence interval?
Hint 2
You now have the point estimate, standard error, and critical value. How are they used in constructing a 98% confidence interval?
Hint 3
To find the upper bound, add the margin of error to the point estimate. The margin of error is the critical value multiplied by the standard error.
# Upper Bound:
___ + (___ * ___)
Hint 4
The point estimate is 0.74, the critical value is 2.33, and the standard error is about 0.0045.
# Upper Bound:
___ + (___ * ___)
Hint 5
The point estimate is 0.74, the critical value is 2.33, and the standard error is about 0.0045.
Now find the lower bound.
# Upper Bound:
0.74 + (2.33 * 0.0045)
# Lower Bound:
Hint 6
The point estimate is 0.74, the critical value is 2.33, and the standard error is about 0.0045.
To find the lower bound, we subtract the margin of error from the point estimate.
Hint 7
The point estimate is 0.74, the critical value is 2.33, and the standard error is about 0.0045.
To find the lower bound, we subtract the margin of error from the point estimate. The margin of error is still the critical value times the standard error, and the point estimate is still 0.74.
For the final question: a 90% confidence interval uses a smaller critical value (1.65 vs. 2.33), which means a smaller margin of error and therefore a narrower interval.
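A short sketch of that comparison, using the table's critical values and the standard error of about 0.0045. It computes only the margins of error and interval widths, so the bounds themselves are still left for the questions.

```r
se <- 0.0045

# Margin of error = critical value * SE, using the table's critical values
margin_98 <- 2.33 * se
margin_90 <- 1.65 * se

# Interval width is twice the margin of error
c(width_98 = 2 * margin_98, width_90 = 2 * margin_90)  # about 0.0210 and 0.0149
```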
Check Your Understanding: Confidence Interval I
The point estimate for our confidence interval is:
Which of the following is the correct interpretation of the 98% confidence interval?
We are 98% confident that the true population proportion of American adults supporting a path to citizenship for the DREAMers is between the lower bound and the upper bound.
The true population proportion of American adults supporting a path to citizenship for the DREAMers is between the lower bound and the upper bound.
The probability that the true population proportion of American adults supporting a path to citizenship for the DREAMers is between the lower bound and the upper bound is 98%.
We are 98% confident that the sample proportion of American adults supporting a path to citizenship for the DREAMers is between the lower bound and the upper bound.
Check Your Understanding: Confidence Interval VI
Without computing the bounds, a 90% confidence interval would be:
Wider than the 98% confidence interval.
Narrower than the 98% confidence interval.
Exactly the same as the 98% confidence interval.
It is impossible to tell without computing the bounds.
So far, so good! There’s one more topic to go. Sometimes we’ll want to test a claim about a population parameter rather than build a confidence interval for it. Inferential statistics provides a formal framework called the hypothesis test for evaluating statistical claims such as:
Is a population mean or proportion larger, smaller, or different from some proposed value?
Do the population means or proportions differ across multiple groups?
Intro to Hypothesis Testing
Here’s one more video from Dr. Shannon McLintock introducing the notion of the hypothesis test.
A 2018 poll from NPR reported that 65% of Americans supported a path to citizenship for DREAMers. Does the 2020 Pew Research poll provide evidence that support for a pathway to citizenship has grown over the past two years? Use a significance level of \(\alpha = 0.05\).
Use what you learned from Dr. McLintock to answer the following questions and complete the hypothesis test. You can use the code block below for any calculations.
Hint 1
In a hypothesis test, we begin with the skeptical position — the null hypothesis (\(H_0\)) — which assumes nothing has changed. What would that assumption be here?
Hint 2
The null hypothesis assumes the population proportion is still what it was in 2018. The alternative hypothesis reflects the claim being tested — that support has grown. Does this suggest a one-sided or two-sided alternative?
Hint 3
The point estimate is the sample statistic that corresponds to the population parameter being tested. What proportion did the 2020 Pew study find?
Hint 4
For the standard error, recall that in a hypothesis test we use the null value (the value assumed in \(H_0\)) in place of \(p\) in the formula \(S_E = \sqrt{p(1-p)/n}\). What is the null value here?
Hint 5
Now that you have the point estimate, null value, and standard error, you can compute the test statistic using the formula from Dr. McLintock’s video:
\[z = \frac{(\text{point estimate}) - (\text{null value})}{S_E}\]
Hint 6
To compute the \(p\)-value, use pnorm(). The test statistic is your boundary value. Since the alternative hypothesis says \(p > 0.65\), which tail of the distribution corresponds to your \(p\)-value?
Hint 7 (Solved)
# Standard error (using null value p = 0.65)
se <- sqrt((0.65 * (1 - 0.65)) / 9654)
# Test statistic
z <- (0.74 - 0.65) / se
# p-value (upper tail, since Ha: p > 0.65)
1 - pnorm(z)
The \(p\)-value will be extremely small — much less than \(\alpha = 0.05\) — leading us to reject the null hypothesis.
Check Your Understanding: Hypothesis Test I
Which of the following are the hypotheses used to test this claim?
\(H_0: p = 0.65\), \(H_a: p > 0.65\)
\(H_0: p = 0.65\), \(H_a: p < 0.65\)
\(H_0: p = 0.65\), \(H_a: p \neq 0.65\)
\(H_0: p = 0.74\), \(H_a: p > 0.74\)
\(H_0: p = 0.74\), \(H_a: p < 0.74\)
\(H_0: p = 0.74\), \(H_a: p \neq 0.74\)
The standard error is computed as \(S_E = \sqrt{\frac{p(1-p)}{n}}\), where \(p\) is the null value. Which of the following is the standard error? (round to four decimal places)
Notice that we use \(p = 0.65\) (the null value) rather than \(\hat{p} = 0.74\) (the sample proportion) when computing the standard error for the hypothesis test. This is because, during a hypothesis test, we assume the null hypothesis is true — we are asking: if the true proportion really is 0.65, how surprising is our observed sample?
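The difference between the two choices is small here but not zero. This sketch computes both standard errors side by side; the variable names are illustrative.

```r
n <- 9654
p_null <- 0.65  # value assumed true under H0
p_hat  <- 0.74  # observed sample proportion

se_null <- sqrt(p_null * (1 - p_null) / n)  # used for the hypothesis test
se_hat  <- sqrt(p_hat * (1 - p_hat) / n)    # used for the confidence interval

round(c(se_null = se_null, se_hat = se_hat), 4)  # 0.0049 and 0.0045
```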
Check Your Understanding: Hypothesis Test VI
The test statistic is computed as \(\displaystyle{z = \frac{(\text{point estimate}) - (\text{null value})}{S_E}}\). Which of the following is the test statistic? (round to two decimal places)
If software reports a \(p\)-value of exactly 0, the true \(p\)-value is not zero; it is simply smaller than the precision the software can represent or display. It is more accurate to describe the \(p\)-value as approximately 0 or to report it as \(p < 0.0001\).
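The solved code in Hint 7 computes the \(p\)-value as 1 - pnorm(z), which is exactly this situation. A sketch with the same numbers: pnorm(z) is so close to 1 that it rounds to exactly 1 in double precision, so the subtraction returns 0, whereas the lower.tail = FALSE argument of pnorm() computes the upper-tail area directly and returns a tiny positive value.

```r
se <- sqrt(0.65 * (1 - 0.65) / 9654)
z <- (0.74 - 0.65) / se  # about 18.54

# Subtracting from 1 loses the tiny tail probability: pnorm(z) rounds to 1
p_subtract <- 1 - pnorm(z)

# Asking pnorm() for the upper tail directly keeps it as a tiny positive number
p_upper <- pnorm(z, lower.tail = FALSE)

p_subtract  # 0
p_upper     # positive, but far below 0.0001
```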
Check Your Understanding: Hypothesis Test VIII
Which of the following is the conclusion of the hypothesis test?
The \(p\)-value is less than \(\alpha\), so we accept the null hypothesis.
The \(p\)-value is at least as large as \(\alpha\), so we reject the null hypothesis and accept the alternative.
The \(p\)-value is at least as large as \(\alpha\), so we do not have enough evidence to reject the null hypothesis.
The \(p\)-value is less than \(\alpha\), so we reject the null hypothesis and accept the alternative.
Check Your Understanding: Hypothesis Test IX
Which of the following is the result of the hypothesis test stated in context?
We do not have evidence to suggest that the proportion of American adults in favor of a path to citizenship has increased since 2018.
We have evidence to suggest that the proportion of American adults in favor of a path to citizenship has stayed the same since 2018.
We have evidence to suggest that the proportion of American adults in favor of a path to citizenship has increased since 2018.
We do not have evidence to suggest that the proportion of American adults in favor of a path to citizenship has stayed the same since 2018.
Submit
If you are part of a course with an instructor who is grading your work on these activities, please copy and submit the hash below using the method your instructor has requested. This activity has only a question hash; there is no exercise hash.
Question Hash
The hash below encodes your responses to the multiple choice and checkbox questions in this activity.
Since there were no code cell exercises in this activity, there is no exercise hash to generate. You’ll see exercise hashes in future activities.
Summary
Main Takeaways
On point estimates and variability:
Sample statistics provide point estimates for their corresponding population parameters — a sample proportion estimates a population proportion, a sample mean estimates a population mean.
Sample statistics provide reliable point estimates only when the sample is representative of the population.
Every sample produces a slightly different statistic. Much of statistics is focused on quantifying this variability.
On confidence intervals:
A confidence interval captures a population parameter with a desired degree of confidence, computed as: \[(\text{point estimate}) \pm (\text{critical value}) \cdot S_E\]
The point estimate is a sample statistic. The critical value depends on the desired confidence level. The standard error (\(S_E\)) quantifies the expected variability in the point estimate.
A correct interpretation: “We are XX% confident that the true [population parameter] lies between [lower bound] and [upper bound].”
Higher confidence levels require larger critical values, which produce wider intervals.
On hypothesis tests:
A hypothesis test provides a formal framework for evaluating claims about a population parameter.
We begin with a null hypothesis (\(H_0\)) representing the status quo and an alternative hypothesis (\(H_a\)) representing the claim to be tested.
We set a significance level \(\alpha\), the threshold below which a \(p\)-value is considered surprising enough to reject \(H_0\).
We compute a test statistic: \(\displaystyle{z = \frac{(\text{point estimate}) - (\text{null value})}{S_E}}\)
The \(p\)-value measures the probability of observing a sample at least as favorable to \(H_a\) as ours, assuming \(H_0\) is true. A \(p\)-value smaller than \(\alpha\) is taken as evidence against \(H_0\).
Looking Ahead
This activity introduced the general frameworks for confidence intervals and hypothesis tests using proportions. In the coming activities, we’ll continue to utilize these tools to help us estimate population parameters and to test claims about them.
As a preview of what’s coming, here’s a link to a Standard Error Decision Tree that we’ll use throughout the remainder of the course. It looks intimidating now, but look at the bottom-right corner — there’s the confidence interval formula you just used! And the lower-left corner shows the general test statistic formula. Everything else on the document will be explained in the coming activities.