Topic 12: Inference Practice
In this activity, we are introduced to a decision tree document for determining how to compute the standard error component of a confidence interval or test statistic. We are also provided with a general framework for approaching hypothesis testing and confidence interval applications, and we practice with three scaffolded examples.
Inference Practice
In the Topic 11 activity you learned about statistical inference — hypothesis tests and confidence intervals — for categorical variables. In this activity we’ll continue where we left off. You’ll start with a video explaining the Standard Error Decision Tree, and then move on to a few practice problems.
A Tour of the Decision Tree
From this point on you’ll make heavy use of the Standard Error Decision Tree. The tree may look intimidating, but it contains almost everything we’ll need to get through the majority of what remains in our course. Let’s start with a tour.
Practice Problems
This section contains three practice problems. Be sure to note any questions you have or trouble you run into. You might also check back with the Topic 11 activity to revisit the videos from Dr. Diez and Dr. McLintock, which discussed approaches similar to the ones required for these problems. The links below contain detailed versions of the general strategies for inference tasks — keep them handy!
- General Strategy for Conducting a Hypothesis Test
- General Strategy for Computing a Confidence Interval
Try It 1a: A poll found that 48% of 331 randomly sampled Americans who decided not to go to college did so because they cannot afford it. Construct a 90% confidence interval for the proportion of non-college-educated Americans who decided not to go to college because they cannot afford it, and interpret the result in context.
Use the code block below for any necessary computations.
What type of inference task is being requested — a confidence interval, a hypothesis test, a probability, or a required sample size?
This is a confidence interval problem. Open the General Strategy for Computing a Confidence Interval document and identify the first piece of information you need.
The first piece you need is the point estimate. What value from the problem serves as your best single estimate for the population proportion?
The point estimate is 0.48. Now consult the Standard Error Decision Tree — what formula gives the standard error for a single proportion?
The standard error formula is \(S_E = \sqrt{p(1-p)/n}\). What are the values of \(p\) and \(n\) here?
se <- sqrt(___*(1 - ___)/___)
With \(p = 0.48\) and \(n = 331\), compute the standard error. Then identify the critical value for a 90% confidence interval from the table at the top of the decision tree.
se <- sqrt(0.48*(1 - 0.48)/331)
Now that you have the point estimate, the critical value, and the standard error, you can construct the lower and upper bounds for the confidence interval.
#Lower bound
___ - (___ * ___)
#Upper bound
___ + (___ * ___)
The point estimate fills the first blank in both cases.
#Lower bound
0.48 - (___ * ___)
#Upper bound
0.48 + (___ * ___)
The critical value and standard error are multiplied together to obtain the margin of error.
se <- sqrt(0.48*(1 - 0.48)/331)
#Lower bound
0.48 - (1.65 * se)
#Upper bound
0.48 + (1.65 * se)
To answer the question as asked, we should:
The point estimate is:
The standard error formula is:
The standard error (rounded to four decimal places) is:
The distribution to be used is:
The desired level of confidence is:
The critical value is:
The correct expression for computing the confidence interval is:
The correct lower and upper bounds for the confidence interval are:
The correct interpretation of this confidence interval is:
Does your interval suggest that a majority of Americans choosing not to go to college have made that choice because they cannot afford it?
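The steps for Try It 1a can be collected into one short R sketch. The helper name `prop_ci()` and its arguments are illustrative assumptions, not part of the course materials. The critical value 1.65 is the rounded value used in this activity; `qnorm(0.95)` returns the unrounded value, so the second interval differs slightly in the later decimal places.

```r
# Hypothetical helper (not from the course materials): confidence interval
# for a single proportion using the normal approximation.
prop_ci <- function(p_hat, n, z_crit) {
  se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error for one proportion
  me <- z_crit * se                     # margin of error
  c(lower = p_hat - me, upper = p_hat + me)
}

# Values from Try It 1a; 1.65 is the 90% critical value used in this activity
ci <- prop_ci(p_hat = 0.48, n = 331, z_crit = 1.65)
round(ci, 4)

# Same interval with the unrounded critical value from qnorm()
round(prop_ci(0.48, 331, qnorm(0.95)), 4)
```

Checking whether 0.50 falls inside the resulting interval is one way to approach the question about a majority.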
Try It 1b: Consider the same scenario. Suppose we wanted the margin of error for the 90% confidence interval to be about 1.5%. How large of a sample would you recommend?
Use the code block below for any necessary computations.
What type of inference task is this: constructing a confidence interval, conducting a hypothesis test, computing a probability, or calculating a required sample size?
This is a sample size problem. Open the Standard Error Decision Tree and look near the top of the document. You should find two sample size formulas — one for means and one for proportions (each one starts with \(n \geq \cdots\)). Which formula applies here?
Since we are working with a proportion, the relevant formula is: \[n \geq \left(\frac{z_{\alpha/2}}{M_E}\right)^2 p \cdot (1 - p)\]
What is the value of \(z_{\alpha/2}\) (the critical value) for a 90% level of confidence?
((___ / ___)^2)*___*(1 - ___)
The critical value for 90% confidence is \(z_{\alpha/2} = 1.65\). What is the desired margin of error \(M_E\)?
((1.65 / ___)^2)*___*(1 - ___)
Be careful about units: the problem states 1.5%, so we should use 0.015.
((1.65 / 0.015)^2)*___*(1 - ___)
What value should you use for \(p\)? We have a prior estimate of \(p = 0.48\) from the sample, so we use that. If no prior estimate existed, we would use \(p = 0.5\) as a conservative choice.
((1.65 / 0.015)^2)*0.48*(1 - 0.48)
Can you sample a fraction of a person? Round up to the nearest whole number: you can never sample a fraction of a person, and rounding down would fail to achieve the desired margin of error.
To answer the question as asked, we should:
The value of \(z_{\alpha/2}\) is:
The value of the margin of error (\(M_E\)) is:
The value of \(p\) (the estimate for the proportion) should be:
Using \(p = 0.50\) is a conservative choice that maximizes the required sample size — it’s appropriate when we have no prior information about the proportion. Here, we have a sample estimate of 0.48, so we use that instead. Using 0.50 would have led us to recommend a slightly larger sample than necessary.
The required sample size is:
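The sample size calculation above can be wrapped in a small R function that also handles the rounding step. This is a minimal sketch: the name `sample_size_prop()` and its arguments are assumptions made for illustration, and the default `p = 0.5` reflects the conservative choice described above.

```r
# Hypothetical helper (not from the course materials): smallest whole-number
# sample size that achieves a target margin of error for a proportion.
sample_size_prop <- function(z_crit, margin, p = 0.5) {
  n_raw <- (z_crit / margin)^2 * p * (1 - p)
  ceiling(n_raw)   # round up: a fractional respondent is not possible
}

# Try It 1b: 90% confidence (z = 1.65), margin of error 1.5%, prior estimate 0.48
n_prior <- sample_size_prop(z_crit = 1.65, margin = 0.015, p = 0.48)
n_prior

# Conservative choice when no prior estimate is available
n_conservative <- sample_size_prop(z_crit = 1.65, margin = 0.015)
n_conservative
```

Using `ceiling()` rather than `round()` guarantees the result never falls below the value the formula requires.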
Try It 2: A USA Today/Gallup poll conducted between 2010 and 2011 asked a group of unemployed and underemployed Americans if they had major problems in their relationships with a spouse or close family member as a result of their employment situation. Of the 1,145 unemployed respondents, 27% said they had experienced major relationship problems, as did 25% of the 675 underemployed respondents. Conduct a test to determine whether an association exists between the presence of relationship problems and employment status (unemployed vs. underemployed). Use \(\alpha = 0.05\).
Use the code block below for any necessary computations.
What type of inference task is being requested?
This is a hypothesis test. Open the General Strategy for Conducting a Hypothesis Test and work through the steps. Start by identifying the parameter of interest.
We are comparing two proportions — the proportion experiencing relationship problems among the unemployed vs. the underemployed. The parameter of interest is the difference between these two proportions: \(p_{\text{unemployed}} - p_{\text{underemployed}}\).
The question asks whether an association exists — it doesn’t specify a direction. What does that suggest about whether the alternative hypothesis is one-sided or two-sided?
Since no direction is specified, the alternative hypothesis is two-sided: \(H_a: p_{\text{unemployed}} - p_{\text{underemployed}} \neq 0\).
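Written out as a pair, with the null hypothesis stating that there is no difference between the two population proportions, the hypotheses for this test are:

\[\begin{align} H_0&: p_{\text{unemployed}} - p_{\text{underemployed}} = 0\\ H_a&: p_{\text{unemployed}} - p_{\text{underemployed}} \neq 0\end{align}\]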
We’re going to need to calculate a test statistic. Doing so requires a point estimate, a null value, and a standard error. Can you find them?
The null value is the number on the right-hand side of the equal sign in the null hypothesis (\(H_0\)). This is the assumed value of the population parameter of interest. Here, the null value is 0.
The point estimate is the sample version of the expression on the left-hand side of your null hypothesis. That is, the point estimate is \(\hat{p}_{\text{unemployed}} - \hat{p}_{\text{underemployed}}\), so
\[\begin{align} \hat{p}_{\text{unemployed}} - \hat{p}_{\text{underemployed}} &= 0.27 - 0.25\\ &= 0.02\end{align}\]
Since we’re working with proportions and comparing two groups, what is the formula for computing the standard error?
Using the tree, the standard error is
\[S_E = \sqrt{\frac{p_1\left(1 - p_1\right)}{n_1} + \frac{p_2\left(1 - p_2\right)}{n_2}}\]
To answer the question as asked, we should:
The hypotheses associated with this test are:
The null value is:
The point estimate is:
How many groups are being compared in this study?
The standard error formula is:
Compute the standard error in the code block below. Be sure to carry at least five decimal places in any intermediate calculations.
From the tree, the standard error formula for a comparison between two population proportions is: \[S_E = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\]
se <- sqrt((___*(1 - ___) / ___) + (___*(1 - ___) / ___))
se
The values \(p_1\) and \(n_1\) are the proportion and sample size from the first group, respectively. This means that \(p_1 = 0.27\) and \(n_1 = 1145\). Can you identify \(p_2\) and \(n_2\) similarly?
se <- sqrt((0.27*(1 - 0.27) / 1145) + (___*(1 - ___) / ___))
se
Similarly, \(p_2 = 0.25\) and \(n_2 = 675\).
se <- sqrt((0.27*(1 - 0.27) / 1145) + (0.25*(1 - 0.25) / 675))
se
Now use the code block below to compute the test statistic.
You’ve got the point estimate, the null value, and the standard error now. Those are all the ingredients required to construct the test statistic.
\[\text{test statistic} = \frac{\left(\text{point estimate}\right) - \left(\text{null value}\right)}{S_E}\]
se <- sqrt((0.27*(1 - 0.27) / 1145) + (0.25*(1 - 0.25) / 675))
se
test_stat <- (___ - ___) / ___
test_stat
The point estimate is 0.02.
test_stat <- (0.02 - ___) / se
test_stat
The null value is 0.
se <- sqrt((0.27*(1 - 0.27) / 1145) + (0.25*(1 - 0.25) / 675))
test_stat <- (0.02 - 0) / se
test_stat
The distribution to be used is:
Now compute the \(p\)-value in the code block below.
Now that we have the test statistic, we need to obtain the \(p\)-value. Remember, the \(p\)-value represents the probability of observing data at least as favorable to \(H_a\) as our sample data, under the assumption that the null hypothesis (\(H_0\)) is true.
se <- sqrt((0.27*(1 - 0.27) / 1145) + (0.25*(1 - 0.25) / 675))
se
test_stat <- (0.02 - 0) / se
test_stat
p_val <- ___
p_val
Our alternative hypothesis used a not-equal-to symbol (\(\neq\)), which indicates that the test is two-tailed. Because of this, we’ll multiply our calculated tail area by 2.
p_val <- 2*(___)
p_val
Because the test statistic is positive, the tail area is in the upper tail. This means that we’ll need to use 1 - pnorm() to find that area.
p_val <- 2*(1 - pnorm(___, ___, ___))
p_val
Evaluating the test statistic formula moves us to the standard normal distribution. That’s the normal distribution with \(\mu = 0\) and \(\sigma = 1\).
p_val <- 2*(1 - pnorm(___, 0, 1))
p_val
The test statistic is the boundary value.
se <- sqrt((0.27*(1 - 0.27) / 1145) + (0.25*(1 - 0.25) / 675))
test_stat <- (0.02 - 0) / se
p_val <- 2*(1 - pnorm(test_stat, 0, 1))
p_val
The result of the test is:
The result of the test stated in context means that:
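The whole of Try It 2 (standard error, test statistic, \(p\)-value, and decision) can be sketched as a single R function. The function name `two_prop_z_test()` and its argument names are assumptions made for illustration. The sketch uses the unpooled standard error formula quoted from the decision tree in this activity, and it wraps the test statistic in `abs()` so the upper-tail calculation works for either sign.

```r
# Hypothetical helper (not from the course materials): two-sided z-test for a
# difference in proportions, using the unpooled standard error from this activity.
two_prop_z_test <- function(p1, n1, p2, n2, null_value = 0, alpha = 0.05) {
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z <- ((p1 - p2) - null_value) / se
  # abs() handles either sign of z when taking the upper-tail area
  p_val <- 2 * (1 - pnorm(abs(z), mean = 0, sd = 1))
  list(se = se, z = z, p_value = p_val, reject_null = p_val < alpha)
}

# Values from Try It 2: unemployed (group 1) vs. underemployed (group 2)
result <- two_prop_z_test(p1 = 0.27, n1 = 1145, p2 = 0.25, n2 = 675)
result
```

The `reject_null` element simply compares the \(p\)-value with \(\alpha\); stating the conclusion in the context of the poll is still up to you.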
Good work on these practice problems! Remember to download or print copies of the Standard Error Decision Tree, the General Strategy for Conducting a Hypothesis Test, and the General Strategy for Computing a Confidence Interval. You’ll refer back to these documents often as you gain familiarity and confidence with statistical inference.
Summary
- The Standard Error Decision Tree is your primary reference for identifying which standard error formula applies to a given inference problem. The key questions are: Are you working with means or proportions? One group or two?
- Confidence intervals and hypothesis tests follow a structured process. The general strategy documents walk you through each step — use them until the process becomes second nature.
- Sample size planning allows you to determine how large a sample is needed to achieve a desired margin of error before data collection begins. Using a prior estimate of the proportion (when available) gives a more efficient answer than defaulting to \(p = 0.50\).
- Two-proportion hypothesis tests compare the difference in proportions across two groups. The null hypothesis is typically that the difference is zero, and the standard error formula accounts for both groups’ sample sizes and proportions.
- When a \(p\)-value exceeds \(\alpha\), we do not have sufficient evidence to reject the null hypothesis — but this is not the same as proving the null is true. Absence of evidence is not evidence of absence.
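The two questions from the first summary point (means or proportions, one group or two) can be mirrored in code. The sketch below covers only the proportion branches practiced in this activity; the function name `se_proportion()` is an assumption for illustration, and the branches for means are deliberately left out.

```r
# Hypothetical helper (not from the course materials). Only the proportion
# branches used in this activity are sketched; the branches for means are omitted.
se_proportion <- function(p, n) {
  stopifnot(length(p) == length(n), length(p) %in% c(1, 2))
  # One group: sqrt(p(1-p)/n). Two groups: add the two variance terms.
  sqrt(sum(p * (1 - p) / n))
}

se_proportion(0.48, 331)                     # one group (Try It 1a)
se_proportion(c(0.27, 0.25), c(1145, 675))   # two groups (Try It 2)
```

Passing one value per group keeps the one-group and two-group cases in a single expression, since the two-group formula is the one-group variance term summed over both groups.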
The inference framework you’ve been practicing here — identify the parameter, state hypotheses, compute a test statistic or confidence interval, draw a conclusion — will carry forward unchanged throughout the rest of the course. What will change is the specific standard error formula and distribution used, depending on whether you’re working with proportions or means, one group or two. The decision tree will guide those choices. In the coming activities, we’ll extend these ideas to numerical data and the \(t\)-distribution.