Topic 5: Discrete Probability Distributions (Dynamic)

About

This activity provides an introduction to discrete probability, including basic probability through counting outcomes, and calculating probabilities associated with outcomes of binomial experiments using the binomial distribution.

Discrete Probability Distributions

Throughout this activity you’ll be introduced to the notion of probability and will explore applications of probability and discrete random variables. After developing some intuition using foundational probability ideas, we’ll focus on binomial experiments and using the binomial distribution to find probabilities of prescribed outcomes.

Limitations

There are entire courses devoted to probability – we will only cover probability to the extent that it is necessary for use in this course. If you are interested in a more detailed treatment of probability, seek out one of the many great courses available.

Objectives

Activity Objectives: After completing this workbook you should be able to:

  • Define, discuss, and interpret the probability of an event as its likelihood.
  • Apply fundamental counting principles and the notion of independence to compute the probability associated with the occurrence of a sequence of events.
  • Use the definition of binomial experiments to identify scenarios to which the binomial distribution can be applied.
  • Apply the binomial distribution in appropriate scenarios to find probabilities associated with specified outcomes.
  • Given a binomial experiment, compute the expected number of successful outcomes as well as the standard deviation for number of successes.

Basic Probability

Definition of Probability (frequentist): For a given random process, the probability of an event \(A\) is the proportion of times we would observe outcome \(A\) if the random process were repeated an infinite number of times.

Example: Given a fair coin, the probability of a flip turning up heads is \(0.5\) (or 50%). Similarly, given a fair six-sided die, the probability of a roll resulting in a number greater than four is \(1/3\) (or about 33.3%) because there are two outcomes satisfying the criteria (rolling a 5 or rolling a 6) out of the six total possible outcomes.
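The frequentist definition can be explored with a quick simulation. The sketch below is only an illustration: the seed, the number of rolls, and the variable names are arbitrary choices rather than part of the activity. It rolls a simulated fair die many times and reports the proportion of rolls greater than four, which should land close to \(1/3\).

```r
# Simulate many rolls of a fair six-sided die.
# The seed and the number of rolls are arbitrary, illustrative choices.
set.seed(1)
n_rolls <- 100000
rolls <- sample(1:6, size = n_rolls, replace = TRUE)

# Proportion of simulated rolls that came up greater than four (a 5 or a 6)
prop_greater_than_four <- mean(rolls > 4)
prop_greater_than_four
```

Increasing n_rolls should generally pull the simulated proportion closer to the theoretical value.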

Try It! Now it is your turn. Try the next few problems. Be sure to note any questions you have as you work through them.

Check Your Understanding: Basic Probability I

Given one fair, six-sided die, what is the probability of rolling a three?

Check Your Understanding: Basic Probability II

Given one fair, six-sided die, what is the probability of rolling a two, four, or six?

Check Your Understanding: Basic Probability III

Given two fair, six-sided dice, which is larger: the probability of rolling a total of five or the probability of rolling a total of two?

Good work on that last set of questions. In those problems you could find the probability by counting the number of ways the desired outcome could occur and then dividing that number by the total number of outcomes possible. In the last question, there were simply more ways to roll a five (four ways to do it) than to roll a two (just one way). What if we try doing something a bit more complicated? Say we wanted to know the probability of rolling at least a two on a single roll of a die and then flipping a “tails” on a single flip of a coin?
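The counting argument for the two-dice question can be checked by listing every outcome. This is a minimal sketch that assumes "rolling a five" means the two dice sum to five; the object names are made up for illustration.

```r
# All 36 equally likely outcomes for two fair six-sided dice
outcomes <- expand.grid(die1 = 1:6, die2 = 1:6)
totals <- outcomes$die1 + outcomes$die2

# Count the outcomes that produce each total of interest
ways_five <- sum(totals == 5)  # (1,4), (2,3), (3,2), (4,1)
ways_two <- sum(totals == 2)   # (1,1)

# Probability = favorable outcomes / total outcomes
p_five <- ways_five / nrow(outcomes)
p_two <- ways_two / nrow(outcomes)
c(five = p_five, two = p_two)
```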

Probability and Independent Events

If \(A\) and \(B\) are independent events (that is, the probability that \(B\) occurs does not depend on whether or not \(A\) occurred, and vice-versa), then the probability of \(A\) and \(B\) occurring is the product of the probability of \(A\) occurring and the probability of \(B\) occurring. Mathematically, we write: \(\mathbb{P}\left[A~\text{and}~B\right] = \mathbb{P}\left[A\right]\cdot\mathbb{P}\left[B\right]\).
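As a rough check of the product rule, the sketch below enumerates every pairing of a die roll with a coin flip and compares the directly counted probability against the product of the individual probabilities. The events used here (an even roll and a flip of heads) are chosen only for illustration and are not the events from the exercises.

```r
# Every (die, coin) pair is equally likely: 6 * 2 = 12 outcomes
outcomes <- expand.grid(die = 1:6, coin = c("H", "T"), stringsAsFactors = FALSE)

is_even <- outcomes$die %% 2 == 0   # event A: the roll is even
is_heads <- outcomes$coin == "H"    # event B: the flip is heads

p_even <- mean(is_even)
p_heads <- mean(is_heads)
p_both <- mean(is_even & is_heads)  # counted directly from the 12 outcomes

# The directly counted probability should match the product
all.equal(p_both, p_even * p_heads)
```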

Check Your Understanding: Probability and Independent Events I

Given a single roll of a fair, six-sided die, what is the probability of rolling at least a two?

Check Your Understanding: Probability and Independent Events II

Given a single flip of a fair coin, what is the probability of the coin landing with tails facing upwards?

Check Your Understanding: Probability and Independent Events III

Use the code block below to compute the probability that in a single roll of a fair die and a flip of a coin we observe a roll of at least two and a flip of tails.

Hint 1

Use the probabilities you identified above.

Hint 2

Use the probabilities you identified above. The flip of a coin and roll of a fair die are independent events.

Hint 3

Since the events are independent, we can multiply the probabilities of the individual outcomes together.

(___) * (___)
Hint 4 (Solved)

Since the events are independent, we can multiply the probabilities of the individual outcomes together.

(5/6) * (1/2)

Good work so far. Let’s say you forgot to study for your chemistry quiz today. It is a four-question multiple-choice quiz with answer options (a) through (e) on each question. You decide that your best option is to guess randomly on each of the questions. Answer the following, using the empty code block below to carry out any necessary computations.

Hint 1 (Guessing, Part I)

There are five answer options possible and only one of them is correct.

Hint 2 (Guessing, Part I)

There are five answer options possible and only one of them is correct. You’re guessing randomly, so no one choice is more likely than any of the others.

Hint 3 (Guessing, Part I)

There are five answer options possible and only one of them is correct. You’re guessing randomly, so no one choice is more likely than any of the others. By choosing randomly, you have a one out of five (20% or 0.20) probability of selecting the correct answer on any one question.

Hint 1 (Guessing, Part II)

The questions are independent events here.

Hint 2 (Guessing, Part II)

The questions are independent events here. Multiply the probability associated with the outcome on each individual question together.

(___)*(___)*(___)*(___)
Hint 3 (Guessing, Part II)

The questions are independent events here. Multiply the probability associated with the outcome on each individual question together. The probability of getting any individual question correct is 0.20.

(0.20)*(0.20)*(0.20)*(0.20)
Hint 1 (Guessing, Part III)

Use the same approach here, but now every question is being answered incorrectly.

(___)*(___)*(___)*(___)
Hint 2 (Guessing, Part III)

If the probability of getting any individual question correct is 0.20, then the probability of getting it wrong must be 0.80.

(___)*(___)*(___)*(___)
Hint 3 (Guessing, Part III)

If the probability of getting any individual question correct is 0.20, then the probability of getting it wrong must be 0.80.

(0.80)*(0.80)*(0.80)*(0.80)
Hint 1 (Guessing, Part IV)

We’ll start with the same setup as for the previous two parts.

(___)*(___)*(___)*(___)
Hint 2 (Guessing, Part IV)

We don’t care about the result to question 4. We could get it right or get it wrong, and it makes no difference to whether or not our event of interest occurs.

(___)*(___)*(___)*(___)
Hint 3 (Guessing, Part IV)

We don’t care about the result to question 4. We could get it right or get it wrong, and it makes no difference to whether or not our event of interest occurs. Since, on the fourth question, our event of interest is “getting the question right or wrong”, the probability of that outcome is 100% (or 1), since no other outcome is possible.

(___)*(___)*(___)*(1)
Hint 4 (Guessing, Part IV)

If the first question that we get right is question 3, what must have been the outcome for each of the first two questions?

(___)*(___)*(___)*(1)
Hint 5 (Guessing, Part IV)

If the first question that we get right is question 3, what must have been the outcome for each of the first two questions? We must have gotten both of the first two questions wrong. The probability of getting any individual question wrong was 0.80.

(0.80)*(0.80)*(___)*(1)
Hint 6 (Guessing, Part IV)

We must have gotten the third question right. The probability of guessing correctly was 0.20.

(0.80)*(0.80)*(0.20)*(1)
Hint 1 (Guessing, Part V)

Will the approach we’ve taken to each of the previous three questions work here? Why or why not?

(___)*(___)*(___)*(___)
Check Your Understanding: Guessing on a Quiz I

For a single question, what is the probability that you get that question correct?

Check Your Understanding: Guessing on a Quiz II

What is the probability that you get every one of the questions correct?

Check Your Understanding: Guessing on a Quiz III

What is the probability that you get every one of the questions wrong?

Check Your Understanding: Guessing on a Quiz IV

What is the probability that the first question you get right is question three?

Check Your Understanding: Guessing on a Quiz V

What is the probability that you get exactly two questions right?

So in the last question, none of the choices were correct – but why? There are lots of ways that we could get two of the questions right. We could get the first two right, the first and last right, the middle two right, and more! We need to account for all of these possibilities.
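To see how many possibilities there are, the arrangements can be listed explicitly. The sketch below uses base R's combn() to enumerate which two of the four questions are answered correctly; the variable names are illustrative only.

```r
# Each column is one choice of which two questions (out of 4) are answered correctly
arrangements <- combn(4, 2)
arrangements

# Number of distinct arrangements with exactly two correct answers
n_arrangements <- ncol(arrangements)
n_arrangements

# Any one specific arrangement (two right, two wrong) has the same probability
p_one_arrangement <- 0.20 * 0.20 * 0.80 * 0.80
p_one_arrangement
```

Every arrangement is equally likely, so what remains is to account for all of them at once.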

Binomial Experiments and the Binomial Distribution

Binomial Experiments: A binomial experiment satisfies each of the following three criteria:

  • There are \(n\) repeated trials.
  • Each trial has two possible outcomes (usually called success and failure for convenience).
  • The trials are independent of one another, and the probability of success on each trial is \(p\) (which remains constant).

Binomial Distribution: Let \(X\) be the number of successes resulting from a binomial experiment with \(n\) trials. We can compute the following probabilities:

  • The probability of exactly \(k\) successes is given by
    \(\displaystyle{\mathbb{P}\left[X = k\right] = \binom{n}{k}\cdot p^k\left(1 - p\right)^{n-k} \approx \tt{dbinom(k, n, p)}}\)
  • The probability of at most \(k\) successes is given by
    \(\displaystyle{\mathbb{P}\left[X \leq k\right] = \sum_{i=0}^{k}{\binom{n}{i}\cdot p^i\left(1 - p\right)^{n-i}} \approx \tt{pbinom(k, n, p)}}\)

In the equations above, \(\binom{n}{k} = \frac{n!}{k!\left(n-k\right)!}\) counts the number of ways to arrange the \(k\) successes amongst the \(n\) trials. That being said, the R functions dbinom() and pbinom() allow us to bypass the messy formulas, but you’ll still need to know what these functions do in order to use them correctly!
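One way to build trust in these functions is to compare them with the formulas directly. The sketch below uses illustrative values (seven flips of a fair coin, two heads); the variable names are assumptions made for this example.

```r
# Illustrative values: n = 7 trials, k = 2 successes, p = 0.5
n <- 7
k <- 2
p <- 0.5

# Exactly k successes: the formula versus dbinom()
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
by_dbinom <- dbinom(k, n, p)
all.equal(by_formula, by_dbinom)

# At most k successes: summing the exact probabilities versus pbinom()
at_most_by_sum <- sum(dbinom(0:k, n, p))
at_most_by_pbinom <- pbinom(k, n, p)
all.equal(at_most_by_sum, at_most_by_pbinom)
```

Here choose(n, k) plays the role of \(\binom{n}{k}\), and pbinom() is simply a running total of dbinom() from 0 up to k.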

Tip: Binomial Distribution

We need to use the binomial distribution to find probabilities associated with numbers of successful (or failing) outcomes in which we do not know for certain the trials on which the successes (or failures) occur.

The code block below is set up to find the probability of exactly two flips of a coin landing heads-up out of seven total flips. Edit the code block so that it finds the probability that you got exactly two of the four questions on your chemistry quiz from earlier correct. As a reminder, there were five answer options for each question and you were guessing randomly.

Hint 1

The arguments to dbinom() are, in order:

  • number of successful outcomes
  • total number of trials
  • probability of a successful outcome on one trial
Hint 2

You are interested in two successful outcomes, so the first argument doesn’t need changing. The other two will need to be changed though.

Hint 3

For the second argument, how many questions are on the quiz?

dbinom(2, ___, ___)
Hint 4

For the third argument, what is the probability of guessing correctly on a single question?

dbinom(2, 4, ___)
Hint 5 (Solved)

Since each question has five possible choices, one of which is correct, the probability of guessing a correct response on one question is 0.2.

dbinom(2, 4, 0.2)

Good work. Now you’ll get to try a few more problems! As you work through the next set of questions, you may want to check out this example and solution. Note that in that document, I mention that drawing a simple picture for each problem will help you decide which function(s) you might use and whether you might need to make multiple computations. This is a really important strategy that will help you develop an approach to solve each problem.

Practice: For each of the following, consider a scenario in which a random sample of students is asked (in private) whether they’ve failed to hand in at least one assignment this semester. We assume that a fixed proportion of students, stored as p_assign, fail to hand in at least one assignment, and that the sample contains n_students students.

  1. Given a single, randomly chosen student, what is the probability that the student will have failed to hand in at least one assignment this semester?
Hint 1

If we had 100 randomly chosen students, how many might you expect failed to hand in at least one assignment? How did you know that? What, mathematically, did you do?

Hint 2

The percentage given in the problem statement directly tells you the probability for a single randomly selected student. Convert that percentage to a decimal.

Hint 3 (Solved)

Written as a decimal, the percentage is the value stored in p_assign.

p_assign

  1. Find the probability that exactly 7 of the students have failed to hand in at least one assignment.
Hint 1

We don’t know exactly which of the students are failing to hand in at least one assignment here. We’ll need a special function to account for the different combinations of students.

Hint 2

Which function is more appropriate, dbinom() or pbinom()?

Hint 3

We’ll use dbinom() since we want the probability of exactly 7 students failing to hand in an assignment.

Hint 4

Fill in the blanks to calculate the desired probability.

dbinom(___, ___, ___)
Hint 5

The first argument is the number of successes (students not handing in an assignment) we are interested in — that’s 7 here.

dbinom(7, ___, ___)
Hint 6

The second argument is the total number of trials. There are n_students students in the sample, so fill in the second blank with n_students.

dbinom(7, ___, ___)
Hint 7 (Solved)

The third argument is the probability of a single student failing to hand in at least one assignment, which is p_assign. So the full call has the second and third blanks filled with n_students and p_assign, respectively.

dbinom(7, ___, ___)
dbinom(7, n_students, p_assign)

  1. Find the probability that at most thresh_atmost of the students have failed to hand in at least one assignment.
Hint 1

Similar to the previous question, we don’t know exactly which of the students are failing to hand in at least one assignment here. We’ll need a special function to account for the different combinations of students.

Hint 2

Which function is more appropriate here, dbinom() or pbinom()?

Hint 3

We’ll use pbinom() since we want the probability of at most a certain number of students failing to hand in an assignment.

Hint 4

Fill in the blanks to calculate the desired probability.

pbinom(___, ___, ___)
Hint 5

For pbinom(), the first argument is the maximum number of successes you are willing to consider — that’s thresh_atmost here. The second and third arguments are the number of trials and the probability of success.

pbinom(___, ___, ___)
Hint 6 (Solved)

Fill in thresh_atmost for the maximum number of successes, n_students for the number of trials, and p_assign for the probability of success.

pbinom(___, ___, ___)
pbinom(thresh_atmost, n_students, p_assign)

  1. Find the probability that at least thresh_atleast of the students have failed to hand in at least one assignment.
Hint 1

Again, we’ll need a special function because we don’t know which of the students will have failed to hand in an assignment.

Hint 2

Unfortunately, neither dbinom() nor pbinom() is a perfect fit for this scenario. Is either one better-suited to it, though?

Hint 3

The pbinom() function can handle multiple possible outcomes, while dbinom() is most useful when we are interested in exactly one outcome.

Hint 4

Is there a way that we could utilize the pbinom() function here?

Hint 5

The challenge is that pbinom() will find the probability of at most some number of successes, not the probability of at least some number of successes.

Hint 6

The approach below, filling the blanks with thresh_atleast, n_students, and p_assign, results in the wrong probability. Why?

pbinom(___, ___, ___)
Hint 7

Filling the blanks with thresh_atleast, n_students, and p_assign calculates the probability of 0 through thresh_atleast successes. That’s not what we want. We want thresh_atleast or more.

pbinom(___, ___, ___)
Hint 8

Could we start with the probability of any (all) outcomes, and then just remove the probability of the events we don’t want?

Hint 9

Since we only have n_students students, it must be the case that somewhere between 0 and n_students students fail to hand in at least one assignment. This probability is 1 (or 100%).

Hint 10

Fill in the blanks to answer the question.

1 - pbinom(___, ___, ___)
Hint 11

Something is still wrong if we fill in the blanks with thresh_atleast, n_students, and p_assign. What needs to be adjusted?

1 - pbinom(___, ___, ___)
Hint 12

That call removes the probability of at most thresh_atleast successes, leaving only the probability of more than thresh_atleast. We want at least thresh_atleast, which includes thresh_atleast itself. Replace the first argument with thresh_atleast - 1.

1 - pbinom(___, ___, ___)
Hint 13 (Solved)

Use thresh_atleast - 1 as the first argument, n_students as the second, and p_assign as the third.

1 - pbinom(___, ___, ___)
1 - pbinom(thresh_atleast - 1, n_students, p_assign)

  1. Find the probability that between a minimum of thresh_between_lo and a maximum of thresh_between_hi out of the students have failed to hand in at least one assignment.
Hint 1

Similar to the previous question, neither dbinom() nor pbinom() is a perfect fit here. We should prefer pbinom() though, since we are interested in a collection of outcomes rather than exactly a single outcome.

Hint 2

The strategy we used in the previous question won’t exactly work either. Maybe we can use a similar idea, though!

Hint 3

Which of the following gives a probability that is definitely too big: filling the first argument of pbinom() with thresh_between_lo or with thresh_between_hi?

Hint 4

Filling the first argument with thresh_between_hi results in a probability that is too large because it includes outcomes below our minimum of thresh_between_lo. Can we subtract out some of the probability to get only the outcomes we care about?

Hint 5

Fill in the blanks:

pbinom(___, ___, ___) - ___(___, ___, ___)

The first pbinom() call uses thresh_between_hi as its first argument, n_students as its second, and p_assign as its third.

Hint 6

We want to remove a collection of outcomes (not just one), so use pbinom() for the subtracted term as well.

pbinom(___, ___, ___) - pbinom(___, ___, ___)
Hint 7

Filling the blanks in the first call to pbinom() with thresh_between_hi, n_students, and p_assign is a good start. If we fill in the blanks of the second call to pbinom() with thresh_between_lo, n_students, and p_assign, we obtain the wrong probability though. Why is that?

pbinom(___, ___, ___) - pbinom(___, ___, ___)
Hint 8

Filling in the blanks of the second call to pbinom() with thresh_between_lo, n_students, and p_assign subtracts the probability of at most thresh_between_lo successes, which excludes thresh_between_lo itself from the answer. We want to include thresh_between_lo, so replace the first argument of that second call to pbinom() with thresh_between_lo - 1.

pbinom(___, ___, ___) - pbinom(___, ___, ___)
Hint 9 (Solved)

Use thresh_between_hi and thresh_between_lo - 1 as the first arguments of the two pbinom() calls, with n_students and p_assign as the remaining arguments in both.

pbinom(___, ___, ___) - pbinom(___, ___, ___)
pbinom(thresh_between_hi, n_students, p_assign) - pbinom(thresh_between_lo - 1, n_students, p_assign)

Don’t Memorize Approaches

In several of the previous scenarios, we needed to think about the correct “first argument” being passed to pbinom(). Don’t try to memorize when to subtract one, when to add one, when to leave the number the same as it appeared in the problem, etc. The language is what matters, and there are lots of ways to express which outcomes we are most interested in. If you insist on memorizing, you’ll become frustrated quickly.

Instead of memorizing, take the time to draw a picture to help you. Examples of what these pictures might look like can be seen in the example and solution document, which I pointed you to earlier.
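A numeric stand-in for the picture is to list every possible outcome with its probability and add up only the outcomes the wording asks for. The sketch below does this with made-up values (10 trials, success probability 0.3, thresholds of 4, 2, and 5) that are not the values from the practice problems, and compares the direct sums with the pbinom() shortcuts.

```r
# Hypothetical values chosen only for illustration
n <- 10
p <- 0.3

# Every possible number of successes, with its probability
outcomes <- 0:n
probs <- dbinom(outcomes, n, p)

# "At least 4": add up the outcomes 4, 5, ..., 10
at_least_4_direct <- sum(probs[outcomes >= 4])
at_least_4_pbinom <- 1 - pbinom(3, n, p)
all.equal(at_least_4_direct, at_least_4_pbinom)

# "Between 2 and 5" (inclusive): add up the outcomes 2, 3, 4, 5
between_2_and_5_direct <- sum(probs[outcomes >= 2 & outcomes <= 5])
between_2_and_5_pbinom <- pbinom(5, n, p) - pbinom(1, n, p)
all.equal(between_2_and_5_direct, between_2_and_5_pbinom)
```

Writing out which outcomes are included makes it easier to see why the first argument to pbinom() shifts by one in some cases and not in others.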

  1. The expected number of successes in a binomial experiment is sometimes denoted by \(\mathbb{E}\left[X\right]\) and can be computed as \(\mathbb{E}\left[X\right] = n\cdot p\), where \(n\) denotes the number of trials run and \(p\) denotes the probability of success on a single trial. Sometimes it is convenient to think of the expected number of successes as “the mean”. Use the code block below to compute the expected number of students (out of n_students) who have failed to hand in at least one assignment:
Hint 1

Use the formula \(\mathbb{E}\left[X\right] = n\cdot p\) from the problem statement to compute the expected value.

Hint 2

\(n\) is the number of trials. Here that’s n_students students.

Hint 3

\(p\) is the probability of success on a single trial, which is p_assign in this scenario.

Hint 4 (Solved)

\(n \cdot p\) = n_students * p_assign

___ * ___
n_students * p_assign

  1. The standard deviation in the number of successes for a binomial experiment can also be computed. The quantity \(\displaystyle{s_X = \sqrt{n\cdot p\left(1 - p\right)}}\), where \(n\) denotes the number of trials run and \(p\) denotes the probability of success on a single trial, is the standard deviation in number of successes. Use the code block below to compute the standard deviation in number of students who have failed to hand in at least one assignment from random samples of n_students students:
Hint 1

Use the formula from the statement of the question to compute the answer.

Hint 2

Recall that you can compute the square root in R using the sqrt() function.

Hint 3

For this problem, \(n\) = n_students and \(p\) = p_assign.

Hint 4 (Solved)

Fill in the blanks with n_students, p_assign, and p_assign again:

sqrt(___ * ___ * (1 - ___))
sqrt(n_students * p_assign * (1 - p_assign))
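The two formulas can be sanity-checked by simulation. The sketch below uses hypothetical values (20 trials, success probability 0.25), an arbitrary seed, and illustrative variable names, none of which come from the activity. It simulates many binomial experiments with rbinom() and compares the sample mean and standard deviation of the success counts with \(n\cdot p\) and \(\sqrt{n\cdot p\left(1 - p\right)}\).

```r
# Hypothetical values chosen only for illustration
n_trials <- 20
p_success <- 0.25

# Values predicted by the formulas
expected_successes <- n_trials * p_success
sd_successes <- sqrt(n_trials * p_success * (1 - p_success))

# Simulate many repetitions of the binomial experiment (arbitrary seed)
set.seed(42)
simulated_counts <- rbinom(100000, size = n_trials, prob = p_success)

# The simulated summaries should land close to the formula values
c(formula_mean = expected_successes, simulated_mean = mean(simulated_counts))
c(formula_sd = sd_successes, simulated_sd = sd(simulated_counts))
```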

Be sure to write down what questions you had as you worked through these problems and to have a teacher, colleague, or tutor help clarify things for you.

Submit

If you are part of a course with an instructor who is grading your work on these activities, please copy and submit both of the hashes below using the method your instructor has requested.

Question Hash

The hash below encodes your responses to the multiple choice questions in this activity.

Exercise Hash

Click the button below to generate your exercise submission code. This hash encodes your work on the graded code exercises in this activity.

You must have attempted the graded exercises before clicking — clicking generates a snapshot of your current results. If you have completed the activity over multiple sessions, please go back through and hit the Run Code button on each graded exercise before generating the hash below, to ensure your most recent results are recorded.

Summary

Main Takeaways
  • The probability of an event \(A\) is a measure of its likelihood and is denoted \(\mathbb{P}[A]\). Every probability must be between 0 and 1.
  • If \(A\) and \(B\) are independent events, then \(\mathbb{P}[A \text{ and } B] = \mathbb{P}[A] \cdot \mathbb{P}[B]\).
  • A binomial experiment satisfies: (1) \(n\) repeated trials, (2) each trial has two possible outcomes, and (3) trials are independent with constant probability of success \(p\).
  • If \(X\) counts successes in a binomial experiment with \(n\) trials and success probability \(p\):
    • \(\mathbb{P}[X = k] \approx\) dbinom(k, n, p) — for exactly \(k\) successes
    • \(\mathbb{P}[X \leq k] \approx\) pbinom(k, n, p) — for at most \(k\) successes
    • Draw a picture to help you see how to use pbinom() and/or dbinom() to calculate probabilities. These two functions are sufficient to handle any binomial probability scenario; the challenge is identifying how to combine them.
  • The expected number of successes is \(\mathbb{E}[X] = n \cdot p\).
  • The standard deviation of number of successes is \(s_X = \sqrt{n \cdot p \cdot (1 - p)}\).
Looking Ahead

The next activity introduces the normal distribution — a continuous probability distribution that underpins much of classical statistical inference. Our focus will be on learning to compute probabilities and percentiles from this important distribution.