Data Viz and ggplot() Intro

Dr. Gilbert

January 14, 2025

The Highlights

  • Sample Viz
  • Bad Viz
  • Suggested Viz Choices
  • Structure of a ggplot()
  • Suggested Viz Choices, Revisited
  • On Plot Labels
  • Organizing plots with {patchwork}

A Good Viz Tells a Clear Story

A Bad Viz Doesn’t

Suggested Viz Choices

  • Single Numerical Variable

    • Histogram, Boxplot, or Density
  • Single Categorical Variable

    • Bar Graph
  • Two Numerical Variables

    • Scatterplot or Heatmap
  • Two Categorical Variables

    • Bar Graph with Fill Color or Heatmap
  • One Numerical and One Categorical Variable

    • Side-by-Side boxplots, overlayed or faceted histograms/densities

Structure of a ggplot()

data %>% #the data to be plotted
  ggplot() + #initialize a plot
  geom_PLOT_TYPE(aes(...)) + #plot type and required mappings
  labs() #labels for plot
  • Note we can pipe (%>%) a data frame into a plot

    • We don’t need a data frame though
  • Once we use ggplot() we use + to add layers instead of piping

  • geom_*() layers require aesthetics to map variables to plot features

    • Different geoms have different required/permitted aesthetics
  • Can add multiple geoms to a single plot

  • Every plot should include labels

  • Find a ggplot cheatsheet here

Let’s Do This!

  1. Open RStudio

  2. Check the top-right corner, next to the translucent blue box icon to verify that you are working in your MAT434 project space that is managing your GitHub Repo.

    • If you see None there instead of your project name, open your project by navigating to File -> Open Project or by using the dropdown menu near the project box
  3. Open your Quarto Notebook on the MLB batted balls and home runs from last time

  4. Add a section on data visualizations

Suggested Viz Choices, Revisited

  • Single Numerical Variable

    • geom_boxplot(), geom_histogram(), or geom_density()

      • Require x or y aesthetic (but not both!)
      • For example, geom_density(aes(x = hwy))
  • Single Categorical Variable

    • geom_bar()

      • Requires x or y aesthetic (but not both!)
      • For example, geom_bar(aes(x = class))
    • geom_col()

      • Can have both x and y aesthetic
      • Example, geom_col(aes(x = class, y = n))

Suggested Viz Choices, Revisited

  • Two Numerical Variables

    • geom_point() or geom_hexbin()

      • Require both x and y aesthetic
      • For example, geom_point(aes(x = cty, y = hwy))
  • Two Categorical Variables

    • geom_bar()

      • Use x and fill aesthetics
      • For example, geom_bar(aes(x = class, fill = drv))

Suggested Viz Choices, Revisited

  • One Numerical and One Categorical Variable

    • geom_boxplot()

      • Use both x and y aesthetics
      • For example, geom_boxplot(aes(x = hwy, y = class))
    • geom_density() or geom_histogram()

      • Use only x aesthetic
      • Add layer facet_wrap(~ VAR_NAME)
  • Other available aesthetics include color, size, shape, and alpha (transparency)

    • Remember, specific geoms permit only specific aesthetics

Try It!

  • Start with a few single-variable plots

    • Plot the distribution of home-runs versus non-home-runs (is_home_run)
    • Plot the distribution of pitch_mph
    • Plot the distribution of launch_speed
    • Plot the distribution of launch_angle
  • Ask ChatGPT (or your favorite LLM) to help you troubleshoot your plots or to “trick out” at least one

Try It!

  • Now build some multivariable plots

    • you might focus on searching for features in your data set that seem associated with whether or not a batted ball was a home run.
    • if you are interested in potential other associations, investigate those as well
  • Again, ask your favorite LLM for help troubleshooting code or “tricking out” your plots.

On Labels

  • The labs() layer permits global plot labels and labels for any mapped aesthetic

    • title
    • subtitle
    • caption
    • alt (for alt-text)
    • x
    • y
    • color
    • fill
    • etc.

Organizing Plots with {patchwork}

  • Often you’ll want to arrange plots together, rather than printing them out one at a time
  • The {patchwork} package provides very easy and intuitive framework for doing this.
  1. Create each of your plots, but store them into variables p1, p2, …

  2. Use + to organize plots side-by-side, and / to organize plots over/under one another.

    • For example, (p1 + p2) / p3 will arrange plots p1 and p2 side-by-side, with plot p3 underneath them.

Try It!

  • Try storing some of your earlier plots and printing them out in a combined plot by using {patchwork}
  • Keep playing around – post your best/favorite plot (and the code) to the #how-do-i channel on Slack