Topic 16: Inference Practice (Part C)
This is the third and final notebook in the Topic 16 inference practice series. It contains Problems 9 through 12. No new content is introduced.
Inference Practice
This activity concludes the Topic 16 practice series. For convenience, the reference documents are again linked here:
- Standard Error Decision Tree
- General Strategy for Confidence Intervals
- General Strategy for Hypothesis Tests
For any problems involving hypothesis tests, assume \(\alpha = 0.05\) unless otherwise stated.
As with the previous two activities, there are no hints included here. Use the reference documents linked above and your previous notebooks to guide you.
Problem 9
A market researcher wants to evaluate car insurance savings at a competing company. Based on past studies, the standard deviation of savings is assumed to be $100. The researcher wants to collect data such that the margin of error is no more than $10 at a 95% confidence level. How large of a sample should be collected?
To answer the question as asked, we should:
The parameter the market researcher is attempting to measure is a:
Use the code block below to input the desired margin of error (omit the $ sign).
10
Use the code block below to compute the critical value (\(z_{\alpha/2}\)) for a 95% confidence level.
qnorm(0.975)
Use the code block below to input the assumed value of \(\sigma\) (omit the $ sign).
100
Use the code block below to compute the minimum required sample size.
ceiling((qnorm(0.975) * 100 / 10)^2)
Did you remember to round up to the nearest whole number? Rounding down violates either the confidence level or the margin of error requirement.
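The same calculation can be written with named inputs, which makes it easier to see why rounding up matters. This is a minimal sketch; the variable names are illustrative and not part of the activity.

```r
# Inputs taken from Problem 9 (variable names are illustrative)
me         <- 10     # desired margin of error, in dollars
sigma      <- 100    # assumed population standard deviation, in dollars
conf_level <- 0.95

# Critical value z_{alpha/2}: upper-tail cutoff of the standard normal
alpha  <- 1 - conf_level
z_crit <- qnorm(1 - alpha / 2)

# Solve ME = z * sigma / sqrt(n) for n, then round up
n_raw <- (z_crit * sigma / me)^2
n_min <- ceiling(n_raw)

# Margin of error achieved when rounding up versus rounding down
me_up   <- z_crit * sigma / sqrt(n_min)
me_down <- z_crit * sigma / sqrt(floor(n_raw))

c(n_min = n_min, me_up = me_up, me_down = me_down)
```

Comparing `me_up` and `me_down` against the target of 10 shows which rounding direction keeps the margin of error within the requirement.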
Problem 10
A Washington Post article from 2009 reported that “support for a government-run health-care plan to compete with private insurers has rebounded from its summertime lows and wins clear majority support from the public.” More specifically, the article says “seven in 10 Democrats back the plan, while almost nine in 10 Republicans oppose it. Independents divide 52 percent against, 42 percent in favor of the legislation.” (6% responded with “other”.) There were 819 Democrats, 566 Republicans and 783 Independents surveyed. A political pundit on TV claims that a majority of Independents oppose the health care public option plan. Do these data provide significant evidence to support this statement?
To answer the question as asked, we should:
What is the level of significance associated with this test?
Does this hypothesis test involve testing a statement about a mean (\(\mu\)), a proportion (\(p\)), or something else?
How many groups are being compared in this test?
The article mentions Democrats, Republicans, and Independents — but the claim being tested concerns only Independents. That’s a single group.
Let \(p\) denote the proportion of independents opposed to the public option. Which of the following are the hypotheses associated with this test?
Which standard error formula should be used?
Which distribution does the test statistic follow?
Use the code block below to compute the point estimate.
0.52
Use the code block below to compute the null value.
0.5
Use the code block below to compute the standard error.
sqrt(0.5 * (1 - 0.5) / 783)
Use the code block below to compute the test statistic.
(0.52 - 0.5) / sqrt(0.5 * 0.5 / 783)
Use the code block below to compute the \(p\)-value.
ts <- (0.52 - 0.5) / sqrt(0.5 * 0.5 / 783)
1 - pnorm(ts)
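The individual steps for Problem 10 can also be collected into one script. This is a sketch using the values from the problem statement; the variable names are illustrative, and `lower.tail = FALSE` is simply another way of writing `1 - pnorm(...)`.

```r
# Values from Problem 10 (variable names are illustrative)
p_hat <- 0.52   # sample proportion of Independents opposed
p_0   <- 0.5    # null value: no majority
n     <- 783    # number of Independents surveyed
alpha <- 0.05   # significance level

# Standard error uses the null value, not the sample proportion
se <- sqrt(p_0 * (1 - p_0) / n)

# Test statistic and upper-tail p-value (alternative: p > 0.5)
z       <- (p_hat - p_0) / se
p_value <- pnorm(z, lower.tail = FALSE)

# Decision rule: reject the null hypothesis when p_value < alpha
reject_null <- p_value < alpha

c(z = z, p_value = p_value)
reject_null
```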
What is the result of the test?
The result of the test means that:
Problem 11
Is there strong evidence of global warming? Let’s consider a small-scale example comparing temperatures in the US from 1968 to 2008. The daily high temperature on January 1 was collected in 1968 and 2008 for 51 randomly selected locations in the continental US. The difference between readings (2008 temperature minus 1968 temperature) was calculated for each location. The average of these 51 differences was 1.1 degrees with a standard deviation of 4.9 degrees. Conduct a hypothesis test to determine whether there is significant evidence of temperature warming.
To answer the question as asked, we should:
What is the level of significance associated with this test?
Does this hypothesis test involve testing a statement about a mean (\(\mu\)), a proportion (\(p\)), or something else?
How many groups are being compared in this test?
Are the 1968 and 2008 temperature readings paired?
The groups here are the temperatures in 1968 and the temperatures in 2008. Once we compute the temperature difference at each location (e.g., \(\text{Boston}_{2008} - \text{Boston}_{1968}\)), we’ll have a single value per location. The analysis then treats these 51 differences as one sample; this is the paired data approach.
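To see the paired approach in action, here is a small sketch with made-up readings for five locations. These numbers are hypothetical and are not the data from the problem; they only illustrate that a one-sample test on the differences matches R's paired test on the original columns.

```r
# Hypothetical readings for five locations (NOT the data from the problem)
temp_1968 <- c(30, 45, 28, 51, 39)
temp_2008 <- c(33, 44, 31, 55, 38)

# One difference per location: 2008 minus 1968
diffs <- temp_2008 - temp_1968

# One-sample t-test on the differences
one_sample <- t.test(diffs, mu = 0, alternative = "greater")

# Paired t-test on the two original columns
paired <- t.test(temp_2008, temp_1968, paired = TRUE, alternative = "greater")

# Both approaches give the same test statistic and p-value
all.equal(unname(one_sample$statistic), unname(paired$statistic))
all.equal(one_sample$p.value, paired$p.value)
```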
Which of the following are the hypotheses associated with this test?
Do we know the population standard deviation (\(\sigma\)) for the temperature differences?
Which standard error formula should be used?
Which distribution does the test statistic follow?
Use the code block below to compute the point estimate.
1.1
Use the code block below to compute the null value.
0
Use the code block below to compute the standard error.
4.9 / sqrt(51)
Use the code block below to compute the test statistic.
(1.1 - 0) / (4.9 / sqrt(51))
Use the code block below to compute the \(p\)-value.
ts <- (1.1 - 0) / (4.9 / sqrt(51))
1 - pt(ts, df = 50)
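The Problem 11 steps can be collected the same way, working from the summary statistics given in the problem. This is a sketch with illustrative variable names.

```r
# Summary statistics from Problem 11 (variable names are illustrative)
x_bar_diff <- 1.1   # mean of the 51 differences (2008 minus 1968)
s_diff     <- 4.9   # standard deviation of the differences
n_diff     <- 51    # number of locations
mu_0       <- 0     # null value: no change in temperature
alpha      <- 0.05  # significance level

# Standard error and degrees of freedom for the paired differences
se <- s_diff / sqrt(n_diff)
df <- n_diff - 1

# Test statistic and upper-tail p-value (alternative: mean difference > 0)
t_stat  <- (x_bar_diff - mu_0) / se
p_value <- pt(t_stat, df = df, lower.tail = FALSE)

# Decision rule: reject the null hypothesis when p_value < alpha
reject_null <- p_value < alpha

c(t_stat = t_stat, df = df, p_value = p_value)
reject_null
```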
What is the result of the test?
The result of the test means that:
Problem 12
A recent claim from Irving states that their average gasoline prices across the nation are $3.26 per gallon. A random sample of 15 Irving stations throughout Manchester, NH reveals an average price per gallon of $3.19 with a standard deviation of $0.035. Construct a 90% confidence interval for the average price per gallon at Irving stations in Manchester, NH and comment on your result.
To answer the question as asked, we should:
What is the desired level of confidence?
Is your confidence interval being built to capture a mean (\(\mu\)), a proportion (\(p\)), or something else?
Does the population parameter belong to a single group or is it a comparison of multiple groups?
Is the population standard deviation (\(\sigma\)) for gas prices known?
Which standard error formula should be used?
Which distribution should be used to identify the critical value?
Use the code block below to compute the critical value.
qt(0.95, df = 14)
Use the code block below to compute the point estimate.
3.19
Use the code block below to compute the standard error.
0.035 / sqrt(15)
Use the code block below to compute the lower bound of the confidence interval.
3.19 - qt(0.95, df = 14) * (0.035 / sqrt(15))
Use the code block below to compute the upper bound of the confidence interval.
3.19 + qt(0.95, df = 14) * (0.035 / sqrt(15))
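Both bounds can be computed in a single step and then compared against the claimed national average of $3.26. This is a sketch with illustrative variable names; the comparison is one possible way to approach the "comment on your result" part of the problem.

```r
# Values from Problem 12 (variable names are illustrative)
x_bar      <- 3.19    # sample mean price per gallon
s          <- 0.035   # sample standard deviation
n          <- 15      # number of stations sampled
conf_level <- 0.90
claimed    <- 3.26    # Irving's claimed national average

# Critical value from the t-distribution with n - 1 degrees of freedom
alpha  <- 1 - conf_level
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Standard error and both bounds at once
se <- s / sqrt(n)
ci <- x_bar + c(-1, 1) * t_crit * se

# Does the interval capture the claimed value?
claimed_inside <- ci[1] <= claimed && claimed <= ci[2]

ci
claimed_inside
```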
The correct interpretation of this confidence interval is:
Submit
If you are part of a course with an instructor who is grading your work on these activities, please copy and submit both of the hashes below using the method your instructor has requested.
The hash below encodes your responses to the multiple choice questions in this activity.
Click the button below to generate your exercise submission code. This hash encodes your work on the graded code exercises in this activity.
You must have attempted the graded exercises before clicking — clicking generates a snapshot of your current results. If you have completed the activity over multiple sessions, please go back through and hit the Run Code button on each graded exercise before generating the hash below, to ensure your most recent results are recorded.
Summary
A few key ideas from Problems 9–12 worth carrying forward:
- Sample size for a mean uses the formula \(n \geq (z_{\alpha/2} \cdot \sigma / ME)^2\), which requires a known or assumed population standard deviation \(\sigma\). This is in contrast to the sample size formula for a proportion, which uses \(p(1-p)\) in place of \(\sigma^2\).
- Hypothesis tests about a single proportion use the null value \(p_0\) in the standard error formula: \(S_E = \sqrt{p_0(1-p_0)/n}\). The null value, not the sample proportion, goes into the SE for a hypothesis test.
- Paired data reduces a two-sample problem to a one-sample problem — once you compute the differences, you analyze a single set of values with \(S_E = s_{\text{diff}}/\sqrt{n_{\text{diff}}}\) and df \(= n_{\text{diff}} - 1\).
- Scope of inference matters. Confidence intervals based on a sample from Manchester, NH can only be generalized to Irving stations in Manchester, NH — not all Irving stations nationally.
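The contrast between the two sample size formulas in the first bullet can be sketched as a pair of small helper functions. The function names `n_for_mean` and `n_for_prop` are hypothetical (they do not come from the course materials or any package). The proportion example uses an assumed margin of error of 0.03 and an assumed planning value of \(p = 0.5\), purely for illustration.

```r
# Hypothetical helpers; names are illustrative, not from the course materials

# Sample size for estimating a mean: n >= (z * sigma / ME)^2
n_for_mean <- function(me, sigma, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  ceiling((z * sigma / me)^2)
}

# Sample size for estimating a proportion: p(1 - p) takes the place of sigma^2
n_for_prop <- function(me, p = 0.5, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  ceiling(z^2 * p * (1 - p) / me^2)
}

# Problem 9 inputs
n_for_mean(me = 10, sigma = 100)

# Assumed inputs for illustration only: ME = 0.03 with p = 0.5
n_for_prop(me = 0.03)
```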
You’ve now completed all twelve inference practice problems across Parts A, B, and C. The next activity is a lab on inference with raw data. We’ll revisit the inference() function from the {statsr} package and focus on using it for inference on numerical variables.