Topic 15: Hypothesis Tests and Confidence Intervals for Numerical Data

About

This activity introduces techniques for inference on a single population mean and for comparing two population means. The \(t\)-distributions are introduced, historical context is provided, and we work through two applications using federal sentencing data from the Southern District of New York State.

Hypothesis Testing and Confidence Intervals for Numerical Data

In this activity, we continue our exploration of statistical inference. Through the past few activities you became more comfortable with hypothesis testing and confidence intervals for categorical variables — here we extend those ideas to numerical variables. We’ll first be formally introduced to the family of \(t\)-distributions, and then work through a pair of applications to real sentencing data.

As a reminder, the Standard Error Decision Tree, General Strategy for Conducting Hypothesis Tests, and General Strategy for Constructing Confidence Intervals are all available for reference. The walkthrough video for the decision tree is below.

While watching the walkthrough video you probably noticed that the side of the decision tree corresponding to numerical data (inference for the mean \(\mu\)) is more involved than the side for proportions. Much of this stems from the fact that using a sample standard deviation as an approximation for the population standard deviation adds uncertainty to our approach. To account for this added uncertainty, we utilize a class of penalized normal distributions called the \(t\)-distributions. Watch at least one of the introductory videos below.

A Detailed Introduction:

A Shorter Introduction:

What Does the \(t\)-Distribution Look Like?

We’ve identified scenarios where we should utilize a \(t\)-distribution instead of the normal (\(z\)) distribution. The simple rule to follow: any time we use a sample standard deviation as a proxy for the population standard deviation in the standard error estimate, we should use a \(t\)-distribution.

The plot below shows a standard normal distribution in black, a \(t\)-distribution with 3 degrees of freedom in red, and a \(t\)-distribution with 12 degrees of freedom in blue.

Notice that all three distributions are bell-shaped, but the \(t\)-distributions have fatter tails than the normal distribution. Also notice that the \(t\)-distribution with 12 degrees of freedom is more similar to the normal than the one with 3 degrees of freedom. As degrees of freedom increase, the \(t\)-distribution approaches (becomes more like) the normal distribution.
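The fatter tails can also be checked numerically. The following is a minimal sketch, not part of the original activity: it compares the area to the left of -2 under the standard normal curve and under the two \(t\)-distributions from the plot. The boundary -2 and the variable names are arbitrary choices for illustration.

```r
# Area to the left of -2 under each curve; heavier tails hold more area.
tail_normal <- pnorm(-2)
tail_t3     <- pt(-2, df = 3)
tail_t12    <- pt(-2, df = 12)

round(c(normal = tail_normal, t_df3 = tail_t3, t_df12 = tail_t12), 4)

# With many degrees of freedom the t tail area is nearly the normal tail area.
abs(pt(-2, df = 1000) - pnorm(-2))
```

The ordering of the three areas mirrors the picture: 3 degrees of freedom gives the heaviest tail, 12 sits in between, and the normal has the lightest tail.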

Using the \(t\)-Distributions

When we introduced the normal distribution, we identified two helper functions: pnorm() to find probabilities (areas to the left of a boundary value) and qnorm() to find cutoff values (percentiles). We have analogous functions for the \(t\)-distribution:

pt(q, df) — finds the probability of falling to the left of the boundary value q in a \(t\)-distribution with df degrees of freedom.
qt(p, df) — finds the cutoff value for which the area to the left of that cutoff in a \(t\)-distribution with df degrees of freedom is p.

Note that these functions have no mean or sd parameters: we always work with standardized values (computed with the test statistic formula) when using the \(t\)-distributions.
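As a quick illustration of how the two helpers relate, here is a small sketch using made-up values (a boundary of 1.5 and 10 degrees of freedom, neither of which comes from the activity). It shows a left-tail area, the matching right-tail area, and qt() recovering the original boundary.

```r
# Hypothetical values chosen only for illustration.
boundary   <- 1.5
df_example <- 10

left_area  <- pt(boundary, df = df_example)      # P[t < 1.5]
right_area <- 1 - pt(boundary, df = df_example)  # P[t > 1.5]

# lower.tail = FALSE asks pt() for the upper-tail area directly.
right_area_alt <- pt(boundary, df = df_example, lower.tail = FALSE)

# qt() undoes pt(): feeding the left area back in recovers the boundary.
recovered <- qt(left_area, df = df_example)

c(left = left_area, right = right_area, recovered = recovered)
```

Subtracting from 1 and using lower.tail = FALSE are two routes to the same upper-tail area; which one to use is a matter of preference.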

Let’s practice. Don’t forget to draw pictures: you are much more likely to make errors if you skip this step.

Question 1: Find \(\mathbb{P}\left[t < \text{boundary1}\right]\) in a \(t\)-distribution with df1 degrees of freedom, where boundary1 and df1 are the values supplied by the activity.


pt(boundary1, df = df1)

Question 2: Find \(\mathbb{P}\left[t > \text{boundary2}\right]\) in a \(t\)-distribution with df2 degrees of freedom, where boundary2 and df2 are the values supplied by the activity.


1 - pt(boundary2, df = df2)

Question 3: Find the cutoff value in a \(t\)-distribution with df3 degrees of freedom for which the area to the left of the cutoff is area3, where df3 and area3 are the values supplied by the activity.


qt(area3, df = df3)

Question 4: Find the critical value associated with a clevel4% confidence interval using a \(t\)-distribution with df4 degrees of freedom, where clevel4 and df4 are the values supplied by the activity.

Hint 5

The area to the left of the critical value is the lower tail area plus the middle area. Here that is (1 - clevel4/100)/2 + clevel4/100, where clevel4 is the confidence level in percent. Use qt() with this area and the degrees of freedom.

qt(___, df = ___)


qt(1 - (1 - clevel4/100)/2, df = df4)
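The same calculation can be wrapped in a small helper. This is only a sketch: the function name t_critical and the choice of 20 degrees of freedom are assumptions made for illustration and do not appear in the activity.

```r
# Hypothetical helper: critical value for a confidence level given in percent.
t_critical <- function(clevel, df) {
  tail_area <- (1 - clevel / 100) / 2  # area in each tail
  qt(1 - tail_area, df = df)           # area to the left of the upper cutoff
}

# Higher confidence pushes the cutoff further from 0 (df = 20 is arbitrary).
crit_values <- sapply(c(90, 95, 99), t_critical, df = 20)
crit_values
```

Writing the tail area on its own line keeps the picture in view: the two tails share whatever area the confidence level leaves over.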

Applications to Criminal Sentencing

We’ll work with a dataset on Federal Sentencing from the Southern District of New York State. A subset of the data, consisting only of drug-related charges, has been loaded for you as SDNYdrug.

Question 1: Confidence Interval for Average Sentence Length

Compute a 95% confidence interval for the average sentence length for a drug-related charge in the Southern District of New York State.

Check Your Understanding: Sentencing CI, Part I

To answer the question as asked, we should:

mutable ok_response = (response, n) => { return html`Loading...` };
viewof q1 = Inputs.radio(
  new Map([
    ["Compute a probability.", 1],
    ["Construct a confidence interval.", 2],
    ["Conduct a hypothesis test.", 3],
    ["Calculate a required sample size.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q1_selected") ?? "null")}
);

{
  localStorage.setItem("q1_selected", JSON.stringify(q1));
  localStorage.setItem("q1_correct", "2");
  localStorage.setItem("q1_result", q1 === null ? "unattempted" : (q1 == 2 ? "correct" : "incorrect"));
}

ok_response(q1, "2");

Use the code block below to compute the point estimate for average sentence length (SentenceMonths).


SDNYdrug |>
  summarize(avg_sentence = mean(SentenceMonths))

Check Your Understanding: Sentencing CI, Part II

The standard error formula is:

viewof q2 = Inputs.radio(
  new Map([
    ["SE = sqrt(p(1-p)/n)", 1],
    ["SE = sigma/sqrt(n)", 2],
    ["SE = s/sqrt(n)", 3],
    ["SE = sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)", 4],
    ["SE = sqrt(s1²/n1 + s2²/n2)", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q2_selected") ?? "null")}
);

{
  localStorage.setItem("q2_selected", JSON.stringify(q2));
  localStorage.setItem("q2_correct", "3");
  localStorage.setItem("q2_result", q2 === null ? "unattempted" : (q2 == 3 ? "correct" : "incorrect"));
}

ok_response(q2, "3");

Use the code block below to compute the standard error.

Hint 4

One way is to start by piping SDNYdrug into summarize() again. We need the standard deviation and the number of observations.

SDNYdrug |>
  summarize(
    sd = ___,
    n = ___)

Hint 5

One way is to start by piping SDNYdrug into summarize() again. We need the standard deviation and the number of observations.

We can compute the standard deviation using the sd() function. We’ll need to pass the column whose standard deviation we want into the sd() function.
Similarly, we can count the number of rows with the n() function. The n() function takes no arguments.

SDNYdrug |>
  summarize(
    sd = sd(___),
    n = n())

Hint 6

One way is to start by piping SDNYdrug into summarize() again. We need the standard deviation and the number of observations.

We can compute the standard deviation using the sd() function. We’ll need to pass SentenceMonths to sd() in order to calculate its standard deviation.
Similarly, we can count the number of rows with the n() function. The n() function takes no arguments.

SDNYdrug |>
  summarize(
    sd = sd(SentenceMonths),
    n = n())

Hint 7 (Solved)

Now that you have the standard deviation and number of observations, you can compute the standard error. Carry out the arithmetic on a new line, dividing the standard deviation you found by the square root of the sample size.

SDNYdrug |>
  summarize(
    sd = sd(SentenceMonths),
    n = n())

___ / sqrt(___)


sd(SDNYdrug$SentenceMonths) / sqrt(nrow(SDNYdrug))

Check Your Understanding: Sentencing CI, Part III

The distribution to be used is:

viewof q3 = Inputs.radio(
  new Map([
    ["Normal.", 1],
    ["t-distribution with df = n - 1.", 2],
    ["t-distribution with df = min(n1-1, n2-1).", 3],
    ["t-distribution with df = n_diff - 1.", 4],
    ["We should not use either distribution.", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q3_selected") ?? "null")}
);

{
  localStorage.setItem("q3_selected", JSON.stringify(q3));
  localStorage.setItem("q3_correct", "2");
  localStorage.setItem("q3_result", q3 === null ? "unattempted" : (q3 == 2 ? "correct" : "incorrect"));
}

ok_response(q3, "2");

Check Your Understanding: Sentencing CI, Part IV

The desired level of confidence is:

viewof q4 = Inputs.radio(
  new Map([
    ["90%", 1],
    ["0.95%", 2],
    ["95%", 3],
    ["98%", 4],
    ["99%", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q4_selected") ?? "null")}
);

{
  localStorage.setItem("q4_selected", JSON.stringify(q4));
  localStorage.setItem("q4_correct", "3");
  localStorage.setItem("q4_result", q4 === null ? "unattempted" : (q4 == 3 ? "correct" : "incorrect"));
}

ok_response(q4, "3");

Use the code block below to compute the critical value for the confidence interval.

Hint 3

For a 95% confidence interval, the two tails together hold 5% of the area, so each tail holds 2.5%. The area to the left of the upper critical value is \(1 - 0.025 = 0.975\).

qt(___, df = ___)

Hint 5

The critical value is the boundary/cutoff for the upper tail. How much of the area in the distribution falls below that upper tail?

The total area to the left includes the lower-tail area (0.025) plus the middle area (0.95).

qt(___, df = ___)

Hint 6

The critical value is the boundary/cutoff for the upper tail. How much of the area in the distribution falls below that upper tail?

The total area to the left includes the lower-tail area (0.025) plus the middle area (0.95). Replace the first blank with 0.025 + 0.95 or, more simply, 0.975.

qt(0.025 + 0.95, df = ___)

Hint 7

The critical value is the boundary/cutoff for the upper tail. How much of the area in the distribution falls below that upper tail?

The second blank is filled in by the degrees of freedom. Here, that’s one less than the number of observations we have.

qt(0.025 + 0.95, df = ___)

Hint 8 (Solved)

The critical value is the boundary/cutoff for the upper tail. How much of the area in the distribution falls below that upper tail?

The second blank is filled in by the degrees of freedom. Here, that’s one less than the number of observations we have. We have 280 observations, so fill the second blank with 280 - 1 or, more simply, with 279.

qt(0.025 + 0.95, df = 280 - 1)


qt(0.975, df = nrow(SDNYdrug) - 1)

A Note on the Critical Value

With so many observations, the correct critical value differed very little from 1.96 (the value used with the normal distribution). Remember that the critical values on the Standard Error Decision Tree are for the normal distribution only. Any time we use a sample standard deviation as a stand-in for the population standard deviation, we should use qt() — this will make a real difference when sample sizes are smaller.
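To see how much the sample size matters, the sketch below compares \(t\) critical values for a 95% confidence interval against the normal critical value. The sample sizes 5, 15, and 30 are arbitrary illustrations; 280 is the number of observations quoted in the hints above.

```r
# Sample sizes to compare; only 280 comes from the activity itself.
sample_sizes <- c(5, 15, 30, 280)

t_crit <- qt(0.975, df = sample_sizes - 1)  # t critical values, df = n - 1
z_crit <- qnorm(0.975)                      # normal critical value (about 1.96)

data.frame(
  n             = sample_sizes,
  t_critical    = round(t_crit, 3),
  excess_over_z = round(t_crit - z_crit, 3)
)
```

The excess over the normal value shrinks as the sample grows, which is why the two approaches nearly agree for this dataset but not for small samples.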

Use the code block below to compute the lower bound of the 95% confidence interval.

Hint 3

You calculated all of these values in earlier code cells.

The point estimate (the sample average sentence length) was about 42.75 months.
The critical value is about 1.97.
The standard error is about 3.22 months.

___ - (___ * ___)

Hint 4

You calculated all of these values in earlier code cells.

The point estimate (the sample average sentence length) was about 42.75 months.
The critical value is about 1.97.
The standard error is about 3.22 months.

Fill the first blank with your point estimate. It is your best guess at the location of the population parameter.

42.75 - (___ * ___)

Hint 5

You calculated all of these values in earlier code cells.

The point estimate (the sample average sentence length) was about 42.75 months.
The critical value is about 1.97.
The standard error is about 3.22 months.

Fill the second blank with your critical value. This is the number of standard errors above or below the point estimate you’ll need to extend in order to capture the parameter with your desired level of confidence.

42.75 - (1.97 * ___)

Hint 6 (Solved)

You calculated all of these values in earlier code cells.

The point estimate (the sample average sentence length) was about 42.75 months.
The critical value is about 1.97.
The standard error is about 3.22 months.

Fill the final blank with your standard error. This measures the typical amount of sampling variability we should expect in our point estimates from one sample to the next.

42.75 - (1.97 * 3.22)


mean(SDNYdrug$SentenceMonths) - 
  qt(0.975, df = nrow(SDNYdrug) - 1) * 
  (sd(SDNYdrug$SentenceMonths) / sqrt(nrow(SDNYdrug)))

Use the code block below to compute the upper bound of the 95% confidence interval.


mean(SDNYdrug$SentenceMonths) + 
  qt(0.975, df = nrow(SDNYdrug) - 1) * 
  (sd(SDNYdrug$SentenceMonths) / sqrt(nrow(SDNYdrug)))
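Both bounds can also be produced together. The sketch below assumes only the rounded values quoted in the hints (a point estimate of 42.75 months, a standard error of 3.22 months, and 280 observations), so its output will differ slightly from what the full SDNYdrug data gives. The variable names are illustrative.

```r
# Rounded summary values quoted in the hints above.
point_estimate <- 42.75  # sample mean sentence length, in months
std_error      <- 3.22   # s / sqrt(n), in months
n_obs          <- 280    # number of observations

crit   <- qt(0.975, df = n_obs - 1)  # critical value for 95% confidence
margin <- crit * std_error           # margin of error

ci <- c(lower = point_estimate - margin, upper = point_estimate + margin)
round(ci, 2)
```

Storing the margin of error once and then subtracting and adding it avoids retyping the critical value and standard error for each bound.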

Check Your Understanding: Sentencing CI, Part V

Which of the following is the correct interpretation of this confidence interval?

viewof q5 = Inputs.radio(
  new Map([
    ["95% of sentence lengths for drug-related charges in the Southern District of New York State will fall between the lower and upper bounds.", 1],
    ["We are 95% confident that the average sentence length for drug-related crimes in this sample falls between the lower and upper bounds.", 2],
    ["There is a 95% chance a criminal charged with a drug-related offense will receive a sentence between the lower and upper bounds.", 3],
    ["We are 95% confident that the true average sentence length for drug-related offenders in the Southern District of New York State is between the lower and upper bounds calculated.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q5_selected") ?? "null")}
);

{
  localStorage.setItem("q5_selected", JSON.stringify(q5));
  localStorage.setItem("q5_correct", "4");
  localStorage.setItem("q5_result", q5 === null ? "unattempted" : (q5 == 4 ? "correct" : "incorrect"));
}

ok_response(q5, "4");

Check Your Understanding: Sentencing CI, Part VI

Does this sample provide evidence to suggest that the average sentence length for drug-related charges in the Southern District of New York State exceeds three years (36 months)?

viewof q6 = Inputs.radio(
  new Map([
    ["No. The confidence interval shows that an average sentence below 36 months is plausible.", 1],
    ["Yes. The confidence interval for the average sentence length includes only values exceeding 36 months.", 2],
    ["No. Some sentence lengths are below 36 months while others exceed 36 months.", 3],
    ["It is impossible to say, since a different sample would result in a different confidence interval.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q6_selected") ?? "null")}
);

{
  localStorage.setItem("q6_selected", JSON.stringify(q6));
  localStorage.setItem("q6_correct", "2");
  localStorage.setItem("q6_result", q6 === null ? "unattempted" : (q6 == 2 ? "correct" : "incorrect"));
}

ok_response(q6, "2");

Question 2: Hypothesis Test for Difference in Sentence Lengths

Conduct a hypothesis test at the \(\alpha = 0.10\) level of significance to determine whether the sample data provide significant evidence that the average sentence lengths for white offenders and for non-white offenders differ for drug-related cases in the Southern District of New York State.

Check Your Understanding: Sentencing HT, Part I

To answer the question as asked, we should:

viewof q7 = Inputs.radio(
  new Map([
    ["Compute a probability.", 1],
    ["Construct a confidence interval.", 2],
    ["Conduct a hypothesis test.", 3],
    ["Calculate a required sample size.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q7_selected") ?? "null")}
);

{
  localStorage.setItem("q7_selected", JSON.stringify(q7));
  localStorage.setItem("q7_correct", "3");
  localStorage.setItem("q7_result", q7 === null ? "unattempted" : (q7 == 3 ? "correct" : "incorrect"));
}

ok_response(q7, "3");

Check Your Understanding: Sentencing HT, Part II

What is the level of significance associated with this test?

viewof q8 = Inputs.radio(
  new Map([
    ["α = 0.01", 1],
    ["α = 0.05", 2],
    ["α = 0.10", 3],
    ["The p-value.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q8_selected") ?? "null")}
);

{
  localStorage.setItem("q8_selected", JSON.stringify(q8));
  localStorage.setItem("q8_correct", "3");
  localStorage.setItem("q8_result", q8 === null ? "unattempted" : (q8 == 3 ? "correct" : "incorrect"));
}

ok_response(q8, "3");

Check Your Understanding: Sentencing HT, Part III

Does this hypothesis test involve testing a statement about a mean (\(\mu\)), a proportion (\(p\)), or something else?

viewof q9 = Inputs.radio(
  new Map([
    ["One or more means.", 1],
    ["One or more proportions.", 2],
    ["Something else altogether.", 3]
  ]),
  {value: JSON.parse(localStorage.getItem("q9_selected") ?? "null")}
);

{
  localStorage.setItem("q9_selected", JSON.stringify(q9));
  localStorage.setItem("q9_correct", "1");
  localStorage.setItem("q9_result", q9 === null ? "unattempted" : (q9 == 1 ? "correct" : "incorrect"));
}

ok_response(q9, "1");

Check Your Understanding: Sentencing HT, Part IV

How many groups are being compared in this test?

viewof q10 = Inputs.radio(
  new Map([
    ["The test involves only a single group.", 1],
    ["The test compares two groups.", 2],
    ["The test compares more than two groups.", 3]
  ]),
  {value: JSON.parse(localStorage.getItem("q10_selected") ?? "null")}
);

{
  localStorage.setItem("q10_selected", JSON.stringify(q10));
  localStorage.setItem("q10_correct", "2");
  localStorage.setItem("q10_result", q10 === null ? "unattempted" : (q10 == 2 ? "correct" : "incorrect"));
}

ok_response(q10, "2");

Check Your Understanding: Sentencing HT, Part V

Which of the following are the hypotheses associated with this test?

viewof q11 = Inputs.radio(
  new Map([
    ["H₀: μ_white − μ_nonwhite = 0; Hₐ: μ_white − μ_nonwhite ≠ 0", 1],
    ["H₀: μ_white − μ_nonwhite = 0; Hₐ: μ_white − μ_nonwhite > 0", 2],
    ["H₀: μ_white − μ_nonwhite = 0; Hₐ: μ_white − μ_nonwhite < 0", 3],
    ["H₀: μ_white ≠ μ_nonwhite; Hₐ: μ_white = μ_nonwhite", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q11_selected") ?? "null")}
);

{
  localStorage.setItem("q11_selected", JSON.stringify(q11));
  localStorage.setItem("q11_correct", "1");
  localStorage.setItem("q11_result", q11 === null ? "unattempted" : (q11 == 1 ? "correct" : "incorrect"));
}

ok_response(q11, "1");

Check Your Understanding: Sentencing HT, Part VI

Do we know the population standard deviations (\(\sigma\)) for sentence lengths in each group?

viewof q12 = Inputs.radio(
  new Map([
    ["Yes, we can compute the population standard deviation from the data using sd().", 1],
    ["Yes, we are provided the population standard deviation.", 2],
    ["No — sd() gives us sample standard deviations, not population standard deviations.", 3]
  ]),
  {value: JSON.parse(localStorage.getItem("q12_selected") ?? "null")}
);

{
  localStorage.setItem("q12_selected", JSON.stringify(q12));
  localStorage.setItem("q12_correct", "3");
  localStorage.setItem("q12_result", q12 === null ? "unattempted" : (q12 == 3 ? "correct" : "incorrect"));
}

ok_response(q12, "3");

Check Your Understanding: Sentencing HT, Part VII

Are the observations in the two groups (sentences handed to white offenders and sentences handed to non-white offenders) paired?

viewof q13 = Inputs.radio(
  new Map([
    ["No. There is no reason to suggest that sentences are paired.", 1],
    ["Yes. Every sentence handed to a white offender can be naturally paired with a sentence handed to a non-white offender.", 2]
  ]),
  {value: JSON.parse(localStorage.getItem("q13_selected") ?? "null")}
);

{
  localStorage.setItem("q13_selected", JSON.stringify(q13));
  localStorage.setItem("q13_correct", "1");
  localStorage.setItem("q13_result", q13 === null ? "unattempted" : (q13 == 1 ? "correct" : "incorrect"));
}

ok_response(q13, "1");

Check Your Understanding: Sentencing HT, Part VIII

Which standard error formula should be used?

viewof q14 = Inputs.radio(
  new Map([
    ["SE = s/sqrt(n)", 1],
    ["SE = sqrt(s1²/n1 + s2²/n2)", 2],
    ["SE = sqrt(sigma1²/n1 + sigma2²/n2)", 3],
    ["SE = sqrt(p(1-p)/n)", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q14_selected") ?? "null")}
);

{
  localStorage.setItem("q14_selected", JSON.stringify(q14));
  localStorage.setItem("q14_correct", "2");
  localStorage.setItem("q14_result", q14 === null ? "unattempted" : (q14 == 2 ? "correct" : "incorrect"));
}

ok_response(q14, "2");

Check Your Understanding: Sentencing HT, Part IX

Which distribution does the test statistic follow?

viewof q15 = Inputs.radio(
  new Map([
    ["The normal distribution.", 1],
    ["The t-distribution with df = n - 1.", 2],
    ["The t-distribution with df = min(n1 - 1, n2 - 1).", 3],
    ["The t-distribution with df = n_diff - 1.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q15_selected") ?? "null")}
);

{
  localStorage.setItem("q15_selected", JSON.stringify(q15));
  localStorage.setItem("q15_correct", "3");
  localStorage.setItem("q15_result", q15 === null ? "unattempted" : (q15 == 3 ? "correct" : "incorrect"));
}

ok_response(q15, "3");

The sentence lengths for white offenders are stored in the vector whiteSentences and for non-white offenders in the vector nonWhiteSentences. Since these objects are not data frames, you can use functionality like mean(), sd(), and length() directly on them.

Use the code blocks below to compute the necessary quantities.

Number of white offenders:


length(whiteSentences)

Number of non-white offenders:


length(nonWhiteSentences)

Average sentence length — white offenders:


mean(whiteSentences)

Average sentence length — non-white offenders:

Hint 1 (Solved)

Just as in the previous scenario, use the mean() function to calculate the average sentence length, this time applying it to nonWhiteSentences.

mean(nonWhiteSentences)


mean(nonWhiteSentences)

Standard deviation in sentence lengths — white offenders:


sd(whiteSentences)

Standard deviation in sentence lengths — non-white offenders:


sd(nonWhiteSentences)

Now let’s put the pieces together.

Check Your Understanding: Sentencing HT, Part X

The population parameter in question for this hypothesis test is:

viewof q16 = Inputs.radio(
  new Map([
    ["μ_white", 1],
    ["μ_nonwhite", 2],
    ["μ_white − μ_nonwhite", 3],
    ["x̄_white − x̄_nonwhite", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q16_selected") ?? "null")}
);

{
  localStorage.setItem("q16_selected", JSON.stringify(q16));
  localStorage.setItem("q16_correct", "3");
  localStorage.setItem("q16_result", q16 === null ? "unattempted" : (q16 == 3 ? "correct" : "incorrect"));
}

ok_response(q16, "3");

Check Your Understanding: Sentencing HT, Part XI

The null value is:

viewof q17 = Inputs.radio(
  new Map([
    ["0", 1],
    ["0.5", 2],
    ["14", 3],
    ["44.79", 4],
    ["39.25", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q17_selected") ?? "null")}
);

{
  localStorage.setItem("q17_selected", JSON.stringify(q17));
  localStorage.setItem("q17_correct", "1");
  localStorage.setItem("q17_result", q17 === null ? "unattempted" : (q17 == 1 ? "correct" : "incorrect"));
}

ok_response(q17, "1");

Point estimate:

Hint 2 (Solved)

The point estimate is the sample version of the parameter — subtract the mean sentence for non-white offenders from the mean sentence for white offenders.

mean(whiteSentences) - mean(nonWhiteSentences)


mean(whiteSentences) - mean(nonWhiteSentences)

Standard error:

Hint 3

The first grouping corresponds to the whiteSentences. We’ll fill in the first blank with the standard deviation for that group and the second blank with the number of observations from that group.

sqrt((36.53^2 / 37) + (___^2 / ___))

Hint 4 (Solved)

Similarly, we’ll fill in the remaining blanks with the standard deviation and number of observations from the nonWhiteSentences.

sqrt((36.53^2 / 37) + (55.88^2 / 243))


sqrt((sd(whiteSentences)^2 / length(whiteSentences)) + 
     (sd(nonWhiteSentences)^2 / length(nonWhiteSentences)))

Test statistic:

Hint 1

The test statistic formula is \(\displaystyle{\text{test statistic} = \frac{(\text{point estimate}) - (\text{null value})}{S_E}}\). You have all three components.

(___ - ___) / ___

Hint 2 (Solved)

The test statistic formula is \(\displaystyle{\text{test statistic} = \frac{(\text{point estimate}) - (\text{null value})}{S_E}}\). You have all three components.

From the earlier exercises, we know that:

The point estimate is about -14.92 months, the difference in average sentence lengths.
The null value is 0, the expected difference in average sentence lengths if no difference in the population average sentences exists.
The standard error is about 6.99 months.

(-14.92 - 0)/6.99


se <- sqrt((sd(whiteSentences)^2 / length(whiteSentences)) + 
           (sd(nonWhiteSentences)^2 / length(nonWhiteSentences)))

(mean(whiteSentences) - mean(nonWhiteSentences)) / se

Degrees of freedom:


min(length(whiteSentences), length(nonWhiteSentences)) - 1

\(p\)-value:

Hint 5

Because the test statistic is negative, pt() will calculate the area in the lower tail. Doubling this area will give us our \(p\)-value, as mentioned earlier.

2*pt(___, df = ___)


se <- sqrt((sd(whiteSentences)^2 / length(whiteSentences)) + 
           (sd(nonWhiteSentences)^2 / length(nonWhiteSentences)))
ts <- (mean(whiteSentences) - mean(nonWhiteSentences)) / se
df <- min(length(whiteSentences), length(nonWhiteSentences)) - 1

2 * (1 - pt(abs(ts), df = df))
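The hint doubles the lower-tail area, while the solution doubles the upper-tail area beyond the absolute value of the test statistic. The sketch below, which assumes only the rounded summary values quoted in the hints (standard deviations 36.53 and 55.88, group sizes 37 and 243, point estimate -14.92), suggests that the two routes agree. The variable names are illustrative, and results from the full data may differ slightly because of rounding.

```r
# Rounded summary values quoted in the hints above.
s_white    <- 36.53; n_white    <- 37
s_nonwhite <- 55.88; n_nonwhite <- 243
point_estimate <- -14.92  # mean(white) - mean(non-white), in months

se <- sqrt(s_white^2 / n_white + s_nonwhite^2 / n_nonwhite)
ts <- (point_estimate - 0) / se        # null value is 0
df <- min(n_white, n_nonwhite) - 1     # min(n1 - 1, n2 - 1)

# Two equivalent ways to get the two-sided p-value.
p_lower_doubled <- 2 * pt(-abs(ts), df = df)
p_upper_doubled <- 2 * (1 - pt(abs(ts), df = df))

c(se = se, ts = ts, df = df, p_value = p_lower_doubled)
```

Using abs(ts) makes the calculation work whether the test statistic turns out negative or positive, so the same line can be reused if the groups are subtracted in the other order.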

Check Your Understanding: Sentencing HT, Part XII

What is the result of the test?

viewof q18 = Inputs.radio(
  new Map([
    ["Since p ≥ α, we do not have enough evidence to reject the null hypothesis.", 1],
    ["Since p ≥ α, we accept the null hypothesis.", 2],
    ["Since p < α, we reject the null hypothesis and accept the alternative hypothesis.", 3],
    ["Since p < α, we fail to reject the null hypothesis.", 4],
    ["It is impossible to determine.", 5]
  ]),
  {value: JSON.parse(localStorage.getItem("q18_selected") ?? "null")}
);

{
  localStorage.setItem("q18_selected", JSON.stringify(q18));
  localStorage.setItem("q18_correct", "3");
  localStorage.setItem("q18_result", q18 === null ? "unattempted" : (q18 == 3 ? "correct" : "incorrect"));
}

ok_response(q18, "3");

Check Your Understanding: Sentencing HT, Part XIII

The result of the test means that:

viewof q19 = Inputs.radio(
  new Map([
    ["The sample data did not provide significant evidence to suggest a difference in average sentence length for white versus non-white defendants.", 1],
    ["The sample data proved there is no difference in average sentence length for white versus non-white defendants.", 2],
    ["The sample data provided significant evidence to suggest a difference in average sentence length for white versus non-white defendants for drug-related cases in the Southern District of New York State.", 3],
    ["Sample data cannot be used to test claims about a population.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q19_selected") ?? "null")}
);

{
  localStorage.setItem("q19_selected", JSON.stringify(q19));
  localStorage.setItem("q19_correct", "3");
  localStorage.setItem("q19_result", q19 === null ? "unattempted" : (q19 == 3 ? "correct" : "incorrect"));
}

ok_response(q19, "3");

Check Your Understanding: Sentencing HT, Part XIV

Which of the following is an appropriate implication of our result?

viewof q20 = Inputs.radio(
  new Map([
    ["We have proved that racial bias exists in sentencing for drug-related cases in the Southern District of New York State.", 1],
    ["We have found evidence to support the hypothesis that racial bias exists in sentencing for drug-related cases in the SDNY. Our result suggests that a more formal audit of sentencing recommendations should be undertaken.", 2],
    ["We have found evidence to support the hypothesis that racial bias exists in the courts across all case types.", 3],
    ["We have found evidence to suggest that racial minorities receive more lengthy sentences in drug-related cases in the Southern District of New York State.", 4]
  ]),
  {value: JSON.parse(localStorage.getItem("q20_selected") ?? "null")}
);

{
  localStorage.setItem("q20_selected", JSON.stringify(q20));
  localStorage.setItem("q20_correct", "2");
  localStorage.setItem("q20_result", q20 === null ? "unattempted" : (q20 == 2 ? "correct" : "incorrect"));
}

ok_response(q20, "2");

Submit

If you are part of a course with an instructor who is grading your work on these activities, please copy and submit both of the hashes below using the method your instructor has requested.

Question Hash

The hash below encodes your responses to the multiple choice questions in this activity.

function buildQuestionResults() {
  return {
    notebook: "Topic 15: Hypothesis Tests and Confidence Intervals for Numerical Data",
    type: "questions",
    timestamp: new Date().toISOString(),
    questions: {
      q1_sentencing_ci_inference_type: {
        selected: q1,
        correct_answer: "2",
        result: q1 === null ? "unattempted" : (q1 == 2 ? "correct" : "incorrect")
      },
      q2_sentencing_ci_se_formula: {
        selected: q2,
        correct_answer: "3",
        result: q2 === null ? "unattempted" : (q2 == 3 ? "correct" : "incorrect")
      },
      q3_sentencing_ci_distribution: {
        selected: q3,
        correct_answer: "2",
        result: q3 === null ? "unattempted" : (q3 == 2 ? "correct" : "incorrect")
      },
      q4_sentencing_ci_confidence_level: {
        selected: q4,
        correct_answer: "3",
        result: q4 === null ? "unattempted" : (q4 == 3 ? "correct" : "incorrect")
      },
      q5_sentencing_ci_interpretation: {
        selected: q5,
        correct_answer: "4",
        result: q5 === null ? "unattempted" : (q5 == 4 ? "correct" : "incorrect")
      },
      q6_sentencing_ci_exceeds_36_months: {
        selected: q6,
        correct_answer: "2",
        result: q6 === null ? "unattempted" : (q6 == 2 ? "correct" : "incorrect")
      },
      q7_sentencing_ht_inference_type: {
        selected: q7,
        correct_answer: "3",
        result: q7 === null ? "unattempted" : (q7 == 3 ? "correct" : "incorrect")
      },
      q8_sentencing_ht_significance_level: {
        selected: q8,
        correct_answer: "3",
        result: q8 === null ? "unattempted" : (q8 == 3 ? "correct" : "incorrect")
      },
      q9_sentencing_ht_parameter_type: {
        selected: q9,
        correct_answer: "1",
        result: q9 === null ? "unattempted" : (q9 == 1 ? "correct" : "incorrect")
      },
      q10_sentencing_ht_num_groups: {
        selected: q10,
        correct_answer: "2",
        result: q10 === null ? "unattempted" : (q10 == 2 ? "correct" : "incorrect")
      },
      q11_sentencing_ht_hypotheses: {
        selected: q11,
        correct_answer: "1",
        result: q11 === null ? "unattempted" : (q11 == 1 ? "correct" : "incorrect")
      },
      q12_sentencing_ht_pop_sd_known: {
        selected: q12,
        correct_answer: "3",
        result: q12 === null ? "unattempted" : (q12 == 3 ? "correct" : "incorrect")
      },
      q13_sentencing_ht_paired: {
        selected: q13,
        correct_answer: "1",
        result: q13 === null ? "unattempted" : (q13 == 1 ? "correct" : "incorrect")
      },
      q14_sentencing_ht_se_formula: {
        selected: q14,
        correct_answer: "2",
        result: q14 === null ? "unattempted" : (q14 == 2 ? "correct" : "incorrect")
      },
      q15_sentencing_ht_distribution: {
        selected: q15,
        correct_answer: "3",
        result: q15 === null ? "unattempted" : (q15 == 3 ? "correct" : "incorrect")
      },
      q16_sentencing_ht_parameter: {
        selected: q16,
        correct_answer: "3",
        result: q16 === null ? "unattempted" : (q16 == 3 ? "correct" : "incorrect")
      },
      q17_sentencing_ht_null_value: {
        selected: q17,
        correct_answer: "1",
        result: q17 === null ? "unattempted" : (q17 == 1 ? "correct" : "incorrect")
      },
      q18_sentencing_ht_test_result: {
        selected: q18,
        correct_answer: "3",
        result: q18 === null ? "unattempted" : (q18 == 3 ? "correct" : "incorrect")
      },
      q19_sentencing_ht_conclusion_context: {
        selected: q19,
        correct_answer: "3",
        result: q19 === null ? "unattempted" : (q19 == 3 ? "correct" : "incorrect")
      },
      q20_sentencing_ht_implication: {
        selected: q20,
        correct_answer: "2",
        result: q20 === null ? "unattempted" : (q20 == 2 ? "correct" : "incorrect")
      }
    }
  };
}

function toBase64(str) {
  return btoa(unescape(encodeURIComponent(str)));
}

question_hash = {
  q1; q2; q3; q4; q5; q6; q7; q8; q9; q10;
  q11; q12; q13; q14; q15; q16; q17; q18; q19; q20;
  return toBase64(JSON.stringify(buildQuestionResults()));
}

html`<div style="font-family: monospace; font-size: 0.85em; background: #f5f5f5; padding: 12px; border-radius: 6px; word-break: break-all; border: 1px solid #ddd; user-select: all; cursor: pointer;" onclick="navigator.clipboard.writeText(this.innerText)">
  ${question_hash}
</div>
<p style="margin-top: 8px; font-size: 0.9em; color: #555;">
  Click the box to copy to clipboard.
</p>`

Exercise Hash

Click the button below to generate your exercise submission code. This hash encodes your work on the graded code exercises in this activity.

You must have attempted the graded exercises before clicking — clicking generates a snapshot of your current results. If you have completed the activity over multiple sessions, please go back through and hit the Run Code button on each graded exercise before generating the hash below, to ensure your most recent results are recorded.

Summary

Great work through all of that — and I hope you found the application to sentencing data in the SDNY both interesting and thought-provoking. The problems were scaffolded step-by-step deliberately, to help you see how each individual piece connects to the larger process of conducting inference with numerical data.

Main Takeaways

The \(t\)-distributions are a family of penalized normal distributions parameterized by degrees of freedom. As degrees of freedom increase, the \(t\)-distribution becomes closer to the normal distribution.
Use the \(t\)-distribution any time you use a sample standard deviation (\(s\)) as a substitute for the population standard deviation (\(\sigma\)) in the standard error formula. The functions pt() and qt() work exactly like pnorm() and qnorm(), but for the \(t\)-distribution.
For a single mean, the standard error is \(S_E = s/\sqrt{n}\) and the degrees of freedom is \(n - 1\).
For comparing two independent means, the standard error is \(S_E = \sqrt{s_1^2/n_1 + s_2^2/n_2}\) and the degrees of freedom is \(\min\{n_1 - 1, n_2 - 1\}\).
The confidence interval and hypothesis test formulas are the same as before — what changes is the standard error formula and the distribution used for the critical value or \(p\)-value.
Statistical significance does not imply proof. Our finding of a significant difference in sentence lengths is evidence that warrants further investigation — it does not prove racial bias, nor does it tell us the direction of the difference without examining the point estimate.

Looking Ahead

You’ve now completed the core statistical inference toolkit — confidence intervals and hypothesis tests for proportions, differences in proportions, means, and differences in means. In the coming activities, you’ll have the opportunity to practice applying these tools across a variety of mixed scenarios, building fluency in selecting the right method for each situation. The decision tree and general strategy documents will continue to be your primary guides.