From the Prior to the Posterior (Context-Free)

Authors: Adam Gilbert (Southern New Hampshire University) and Laura Lambert (James Madison University)

Published: September 13, 2024

Modified: December 11, 2024

Context-Free Activity

This activity is a context-free scaffold. You’ll almost surely find it to be incomplete, and that’s on purpose! We are designing this series of activities to have a domain-specific context laid over them. Throughout the activity, you’ll see purple text, which should be replaced with context-specific content. A rough draft of a completed activity is available here.

Goals and Objectives

Statistics Goals: The statistics-focused goals of this activity are as follows:

  • Introduce students to the foundational aspects of Bayesian Inference as an alternative to Frequentist methods.
    • In particular, students discover and examine what a parameter distribution is.
  • Students discover the notion and impact of the choice of a prior distribution on the resulting posterior parameter distribution.
  • Students discover the impact of the amount of data on the resulting posterior parameter distribution.
  • Students experiment with how the choice of prior parameter distribution and data interact to influence the resulting posterior parameter distribution.

Course Objectives: This activity would map to course-level objectives similar to the following. This list is not exhaustive, and objectives need not be phrased identically; it is meant to illustrate the wide variety of contexts in which this activity might be placed.

  • Students will evaluate a research question using appropriate statistical techniques
  • Students will correctly identify the type of data they are working with
  • Students will evaluate literature and/or prior research to generate hypotheses for a research question
  • Students will learn about different statistical models and approaches
  • Students will interpret coefficients from a statistical model
  • Students will evaluate the underlying assumptions of a statistical approach
  • Students will consider the ethical implications of statistical approaches
  • Students will gather data using methodologies appropriate to the context

Subject-Area Objectives: This section will be utilized to identify objectives/outcomes specific to the course/subject to which the activity context is linked. This allows adopters to cover objectives associated with their course while embedding Bayesian thinking.

Background Information

The following subsections outline the background information for this activity from both a statistics and domain-specific lens.

Data Analysis and Bayesian Thinking

There are many statistical tools which can be used to investigate population parameters. Broadly speaking, these tools fall into three categories:

  • Classical/Frequentist methods
  • Simulation-based methods
  • Bayesian methods

Perhaps you’ve encountered frequentist methods previously. These methods depend on distribution assumptions and the Central Limit Theorem. In this notebook, we’ll introduce Bayesian methods. In particular, you’ll explore how your prior belief (controlled via your choice of prior distribution) and the strength of your observed data work together to produce updated beliefs (a posterior distribution).

In Bayesian inference, we approach our tasks with some prior belief about the value of our population parameter. This is natural, because it matches our lived experience as humans. We use that prior belief, in conjunction with our data, to produce an updated version of our beliefs, which again mirrors how we, as individuals, interact with the world around us.

In this interactive notebook, you’ll see the foundations of the Bayesian approach to inference on a [choose the parameter relevant to the context: (population proportion / population mean)] in action.

About the Context

This subsection includes background on the domain-specific context for the activity.

Purpose

Let’s try to estimate the [population parameter in context].

Prior Assumptions

A paragraph indicating the assumptions we might come with if we had no prior expectations about our parameter. The end result here should be that we land on a uniform prior. The code chunk below sets up an uninformative uniform prior for [the population proportion being estimated]. (Note: The code to generate the distribution is shown below because you’ll have opportunities to make small updates to it in the activity on the next tab.)

#Load the tidyverse, which provides tibble() and ggplot()
library(tidyverse)

alpha <- 1
beta <- 1
#Fewer points result in more jagged pictures
grid_pts <- 500

#Create prior distribution 
my_dists_prop <- tibble(
  pi = seq(0, 1, length.out = grid_pts), #possible proportion values
  prior_prob = dbeta(pi, alpha, beta) #prior probability
)

#Plot prior distribution
my_dists_prop %>%
  ggplot() + 
  geom_area(aes(x = pi, y = prior_prob), 
            fill = "purple", alpha = 0.4) + 
  geom_line(aes(x = pi, y = prior_prob),
            linetype = "dashed") + 
  labs(
    title = "Prior Distribution",
    x = "π",
    y = ""
  ) + 
  ylim(c(-0.2, 3))

Shape of Prior

The Beta distribution we use here is commonly used with binomial data. It is determined by two shape parameters, alpha (\(\alpha\)) and beta (\(\beta\)). We can think of \(\alpha\) as the number of previously observed successes and \(\beta\) as the number of previously observed failures. Choosing \(\alpha = 1\) and \(\beta = 1\) results in a prior that allows for both successes and failures (\(\pi \neq 0\) and \(\pi \neq 1\)) but has no other certainty about the true value of \(\pi\).
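To build intuition for how these shape parameters behave, the short sketch below (parameter values chosen purely for illustration) overlays a few Beta densities. Note that the Beta(\(\alpha\), \(\beta\)) density has mean \(\alpha/(\alpha + \beta)\), so priors with the same ratio of successes to failures center in the same place, while a larger total \(\alpha + \beta\) concentrates more belief around that center.

#Compare a few illustrative Beta priors (values chosen for intuition only)
tibble(pi = seq(0, 1, length.out = 500)) %>%
  mutate(
    `Beta(1, 1)` = dbeta(pi, 1, 1),     #uniform: no prior information
    `Beta(6, 4)` = dbeta(pi, 6, 4),     #as if 6 successes and 4 failures
    `Beta(60, 40)` = dbeta(pi, 60, 40)  #same mean, much more certainty
  ) %>%
  pivot_longer(-pi, names_to = "prior", values_to = "density") %>%
  ggplot() +
  geom_line(aes(x = pi, y = density, color = prior)) +
  labs(
    title = "Illustrative Beta Priors",
    x = "π",
    y = "",
    color = "Prior"
  )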

Notice that the prior assumption we’ve made above (via the purple distribution) is that all proportions for the population parameter are equally plausible. In the interactive activity on the next tab, we’ll explore how newly observed data updates that prior assumption to obtain a more informed posterior understanding of the population proportion.

We’ll start with a reminder of where our pre-reading left off.

Purpose

We are trying to estimate the [population parameter in context].

Prior Assumptions

A paragraph indicating the assumptions we might come with if we had no prior expectations about our parameter. The end result here should be that we land on a uniform prior. The code chunk below sets up an uninformative uniform prior for [the population proportion being estimated].

alpha <- 1
beta <- 1
#Fewer points result in more jagged pictures
grid_pts <- 500

#Create prior distribution 
my_dists_prop <- tibble(
  pi = seq(0, 1, length.out = grid_pts), #possible proportion values
  prior_prob = dbeta(pi, alpha, beta) #prior probability
)

#Plot prior distribution
my_dists_prop %>%
  ggplot() + 
  geom_area(aes(x = pi, y = prior_prob), 
            fill = "purple", alpha = 0.4) + 
  geom_line(aes(x = pi, y = prior_prob),
            linetype = "dashed") + 
  labs(
    title = "Prior Distribution",
    x = "π",
    y = ""
  ) + 
  ylim(c(-0.2, 3))

Shape of Prior

The Beta distribution we use here is commonly used with binomial data. It is determined by two shape parameters, alpha (\(\alpha\)) and beta (\(\beta\)). We can think of \(\alpha\) as the number of previously observed successes and \(\beta\) as the number of previously observed failures. Choosing \(\alpha = 1\) and \(\beta = 1\) results in a prior that allows for both successes and failures (\(\pi \neq 0\) and \(\pi \neq 1\)) but has no other certainty about the true value of \(\pi\).

Observed Data

In this section, we either generate, read, or collect data which we’ll use to update our chosen prior distribution. The code chunk below is a placeholder which simulates new observed data from our population. As mentioned, there are several options for how this section on Observed Data can be treated.

  • Observed data can be simulated, as in the code chunk below.
  • Observed data can be read in from a location on the web (for example, a GitHub repository). This could be data from a study, a publication, your own research, etc.
  • Observed data can be collected in class and manually input into the status column of the tibble below.

nobs <- 12
set.seed(071524)
my_data <- tibble(
  obs_id = 1:nobs,
  status = sample(c("yes", "no"), 
                  size = nobs,
                  prob = c(0.7, 0.3), 
                  replace = TRUE)
)

num_yes <- my_data %>%
  filter(status == "yes") %>%
  nrow()

print(paste0("Of the ",
             nobs,
             " observations, the number of positive responses was ",
             num_yes,
             "."))
[1] "Of the 12 observations, the number of positive responses was 8."

Obtaining the Posterior

Now we’ll use our data to update the prior distribution and obtain the posterior distribution for our population proportion. We obtain the posterior distribution by multiplying the prior by the likelihood function and then dividing by a normalizing factor to ensure that the result is a probability density (that is, the total probability is 1). The likelihood measures the probability of observing our data at each possible value of the population proportion.

\[\underbrace{\mathbb{P}\left[\text{parameter value} | \text{data}\right]}_{\text{posterior}} = \frac{\overbrace{\mathbb{P}\left[\text{data} | \text{parameter value}\right]}^{\text{likelihood}}\cdot \overbrace{\mathbb{P}\left[\text{parameter value}\right]}^{\text{prior}}}{\underbrace{\mathbb{P}\left[\text{data}\right]}_{\text{normalizing factor}}}\]

The code chunk below constructs the posterior distribution according to the Bayesian updating procedure above.

#Construct the posterior
my_dists_prop <- my_dists_prop %>%
  mutate(
    #Compute likelihood using binomial distribution
    likelihood = choose(nobs, num_yes)*pi^(num_yes)*(1 - pi)^(nobs - num_yes), 
    #Compute posterior as likelihood*prior
    post_prob = likelihood*prior_prob,
    #Normalize posterior so total probability over the grid is 1
    post_prob_normalized = post_prob/(sum(post_prob)/grid_pts))

#Plot prior and posterior
my_dists_prop %>%
  ggplot() + 
  geom_area(aes(x = pi, y = prior_prob, fill = "prior"), 
            alpha = 0.4) + 
  geom_line(aes(x = pi, y = prior_prob),
            linetype = "dashed") + 
  geom_area(aes(x = pi, y = post_prob_normalized, fill = "posterior"), 
            alpha = 0.7) + 
  geom_line(aes(x = pi, y = post_prob_normalized),
            linetype = "dashed") +
  labs(
    title = "Prior and Posterior Distributions",
    x = "π",
    y = "",
    fill = "Distribution"
  ) +
  scale_fill_manual(values = c("prior" = "purple", "posterior" = "orange"))

Notice that, after seeing the data, our posterior estimate for the proportion of [insert context here] has been updated. [We’ll summarize what we are seeing here, or even better…ask the students to do that via a question in this location.]
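As an optional aside, the Beta prior is conjugate to the binomial likelihood, so the exact posterior is also available in closed form as Beta(\(\alpha + \text{num\_yes}\), \(\beta + \text{nobs} - \text{num\_yes}\)). The quick sketch below overlays that exact density on our grid approximation as a check that the updating procedure behaved as expected.

#Optional check: compare the grid posterior to the exact conjugate posterior
my_dists_prop %>%
  mutate(exact_post = dbeta(pi, alpha + num_yes, beta + nobs - num_yes)) %>%
  ggplot() +
  geom_area(aes(x = pi, y = post_prob_normalized),
            fill = "orange", alpha = 0.4) +
  geom_line(aes(x = pi, y = exact_post),
            linetype = "dashed") +
  labs(
    title = "Grid Posterior vs. Exact Beta Posterior",
    x = "π",
    y = ""
  )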

Investigating Further

Now that you’ve seen one Bayesian analysis to estimate [insert population parameter here], let’s reproduce the analysis and explore the impact of (i) choice of prior, (ii) strength of data, and (iii) the combination of data and prior on the resulting posterior.

In this section, you’ll have an opportunity to reproduce the analysis above, but with a different choice of prior distribution on \(\pi\).

Update from an Expert

This box will give students an update containing valuable information they can use to choose a more informative prior.

Recall that, in the Beta distribution, the two parameters alpha (\(\alpha\)) and beta (\(\beta\)) can be thought of as the number of prior successes and prior failures, respectively. Update the code chunk below to choose an alpha and beta that reflect the new information provided in the box above.

Once you’ve made the update, run the code to construct and visualize the prior.
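A sketch of that starter chunk appears below. It mirrors the earlier prior-construction code, with comments marking the placeholder values students should edit to match the expert update; the fixed y-axis limit from the original chunk is dropped so a more concentrated prior isn't clipped.

#Update alpha and beta to reflect the expert information above
alpha <- 1 #replace with the number of prior successes
beta <- 1  #replace with the number of prior failures

#Construct and plot the prior distribution
my_dists_prop <- tibble(
  pi = seq(0, 1, length.out = grid_pts), #possible proportion values
  prior_prob = dbeta(pi, alpha, beta)    #prior probability
)

my_dists_prop %>%
  ggplot() + 
  geom_area(aes(x = pi, y = prior_prob), 
            fill = "purple", alpha = 0.4) + 
  geom_line(aes(x = pi, y = prior_prob),
            linetype = "dashed") + 
  labs(
    title = "Prior Distribution",
    x = "π",
    y = ""
  )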

Reflection Question

How does this prior compare to our original prior? Use your understanding of alpha and beta to justify what you are seeing.

Now that you’ve built the new prior, run the code chunk below to generate or read in your new data. Again, for the code chunk, we have the following options.

  • Observed data can be simulated, as in the code chunk below.
  • Observed data can be read in from a location on the web (for example, a GitHub repository). This could be data from a study, a publication, your own research, etc.
  • Observed data can be collected in class and manually input into the status column of the tibble below.
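As a placeholder, a simulation in the same spirit as the original chunk might look like the following; the seed, sample size, and true proportion are illustrative (here identical to the original data, so that only the choice of prior changes between analyses).

#Placeholder simulation -- replace to fit the chosen context
nobs <- 12
set.seed(071524)
my_data <- tibble(
  obs_id = 1:nobs,
  status = sample(c("yes", "no"), 
                  size = nobs,
                  prob = c(0.7, 0.3), 
                  replace = TRUE)
)

#Count the number of positive responses
num_yes <- my_data %>%
  filter(status == "yes") %>%
  nrow()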

Now run the code chunk below to use the observed data to calculate and plot the updated posterior.
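A sketch of that chunk appears below. It mirrors the earlier posterior construction; dbinom() computes the same binomial likelihood as the earlier choose() expression.

#Update the new prior with the observed data
my_dists_prop <- my_dists_prop %>%
  mutate(
    likelihood = dbinom(num_yes, size = nobs, prob = pi),
    post_prob = likelihood*prior_prob,
    post_prob_normalized = post_prob/(sum(post_prob)/grid_pts))

#Plot prior and posterior
my_dists_prop %>%
  ggplot() + 
  geom_area(aes(x = pi, y = prior_prob, fill = "prior"), 
            alpha = 0.4) + 
  geom_area(aes(x = pi, y = post_prob_normalized, fill = "posterior"), 
            alpha = 0.7) + 
  labs(
    title = "Prior and Posterior Distributions",
    x = "π",
    y = "",
    fill = "Distribution"
  ) +
  scale_fill_manual(values = c("prior" = "purple", "posterior" = "orange"))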

Reflection Question

How does this new posterior compare to the original posterior distribution we estimated? What effect has our choice of a stronger prior had on the range of proportions you think our population parameter may live within? Does this result match your expectations?

In this section, you’ll have an opportunity to reproduce the earlier analysis, but with a different data source for updating. In particular, you’ll explore the impact that the number of collected observations can have on the resulting posterior distribution.

Update from an Expert

This box will give students an update containing a description or location of newly collected data. The new data source should have many more observations than the original data source. For example, the original data source could just have been a small random sample from an existing, larger data source.

The code chunk below is set up to create a weakly informative prior that incorporates the additional information we received in the previous scenario, but as though it came from a small sample. For now, we set a prior based on 10 observations with 6 positive results, for illustration purposes.
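Under the interpretation above (with \(\alpha\) as prior successes and \(\beta\) as prior failures), that corresponds to a chunk like the following.

#Weakly informative prior: as if we'd already seen 6 successes and 4 failures
alpha <- 6
beta <- 4

my_dists_prop <- tibble(
  pi = seq(0, 1, length.out = grid_pts), #possible proportion values
  prior_prob = dbeta(pi, alpha, beta)    #prior probability
)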

Now that we have this prior, run the code cell below to generate the observed data. Don’t make any changes to the code just yet. As a reminder, we can set up this code chunk so that the data is

  • simulated, as in the code chunk below.

  • read in from a location on the web (for example, a GitHub repository). This could be data from a study, a publication, your own research, etc.

  • collected in class and manually input into the status column of the tibble below.

    • This is an unlikely choice due to the larger sample size and the time required to input data manually. This method could be accommodated if data is collected via a digital form and then read from a spreadsheet (e.g., Google Forms/Sheets).

The code below will be replaced based on the choice made above.
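For instance, a simulated placeholder with a much larger sample might look like this; the sample size, seed, and true proportion below are illustrative only.

#Placeholder: a much larger simulated sample (illustrative values)
nobs <- 250
set.seed(071524)
my_data <- tibble(
  obs_id = 1:nobs,
  status = sample(c("yes", "no"),
                  size = nobs,
                  prob = c(0.7, 0.3),
                  replace = TRUE)
)

#Count the number of positive responses
num_yes <- my_data %>%
  filter(status == "yes") %>%
  nrow()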

Now run the code chunk below to use the observed data to calculate and plot the updated posterior.
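This is the same updating step as before; a compact sketch follows (reuse the earlier prior-and-posterior plotting code to visualize the result).

#Update the weakly informative prior with the larger sample
my_dists_prop <- my_dists_prop %>%
  mutate(
    likelihood = dbinom(num_yes, size = nobs, prob = pi),
    post_prob = likelihood*prior_prob,
    post_prob_normalized = post_prob/(sum(post_prob)/grid_pts))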

Reflection Question

How does this new posterior compare to the original posterior distribution we estimated? What effect has our larger collection of observations (larger sample size) had on the range of proportions you think our population parameter may live within? Does this result match your expectations?

Now that you’ve seen how different priors and data (separately) result in different inferences, return to the previous section and examine how stronger priors (for example, alpha = 600 and beta = 400) interact with our data to form the resulting posterior.

Reflection Question

What conclusions can you draw about whether the prior or the observed data has greater influence over the resulting posterior?