Data Viz and ggplot() Intro

Dr. Gilbert

September 13, 2024

The Highlights

  • Sample Viz
  • Bad Viz
  • Suggested Viz Choices
  • Structure of a ggplot()
  • Suggested Viz Choices, Revisited
  • On Plot Labels
  • Organizing plots with {patchwork}

A Good Viz Tells a Clear Story

A Bad Viz Doesn’t

Suggested Viz Choices

  • Single Numerical Variable

    • Histogram, Boxplot, or Density
  • Single Categorical Variable

    • Bar Graph
  • Two Numerical Variables

    • Scatterplot or Heatmap
  • Two Categorical Variables

    • Bar Graph with Fill Color or Heatmap
  • One Numerical and One Categorical Variable

    • Side-by-Side boxplots, overlayed or faceted histograms/densities

Structure of a ggplot()

data %>% #the data to be plotted
  ggplot() + #initialize a plot
  geom_PLOT_TYPE(aes(...)) + #plot type and required mappings
  labs() #labels for plot
  • Note we can pipe (%>%) a data frame into a plot

    • We don’t need a data frame though
  • Once we use ggplot() we use + to add layers instead of piping

  • geom_*() layers require aesthetics to map variables to plot features

    • Different geoms have different required/permitted aesthetics
  • Can add multiple geoms to a single plot

  • Every plot should include labels

  • Find a ggplot cheatsheet here

Let’s Do This!

  1. Open RStudio

  2. Check the top-right corner, next to the translucent blue box icon to verify that you are working in your MAT300 project space

    • If you see None there instead of your project name, open your project by navigating to File -> Open Project or by using the dropdown menu near the project box
  3. Open your Quarto Notebook from last time

  4. Add some questions that you think could be answered using a data visualization and describe the relevant viz type

Suggested Viz Choices, Revisited

  • Single Numerical Variable

    • geom_boxplot(), geom_histogram(), or geom_density()

      • Require x or y aesthetic (but not both!)
      • For example, geom_density(aes(x = hwy))
  • Single Categorical Variable

    • geom_bar()

      • Requires x or y aesthetic (but not both!)
      • For example, geom_bar(aes(x = class))
    • geom_col()

      • Can have both x and y aesthetic
      • Example, geom_col(aes(x = class, y = n))

Suggested Viz Choices, Revisited

  • Two Numerical Variables

    • geom_point() or geom_hexbin()

      • Require both x and y aesthetic
      • For example, geom_point(aes(x = cty, y = hwy))
  • Two Categorical Variables

    • geom_bar()

      • Use x and fill aesthetics
      • For example, geom_bar(aes(x = class, fill = drv))

Suggested Viz Choices, Revisited

  • One Numerical and One Categorical Variable

    • geom_boxplot()

      • Use both x and y aesthetics
      • For example, geom_boxplot(aes(x = hwy, y = class))
    • geom_density() or geom_histogram()

      • Use only x aesthetic
      • Add layer facet_wrap(~ VAR_NAME)
  • Other available aesthetics include color, size, shape, and alpha (transparency)

    • Remember, specific geoms permit only specific aesthetics

Try It!


\(\bigstar\) Now that you know about data visualization types, build basic plots to answer at least two of the questions you wrote out earlier

03:00

Try It!


\(\bigstar\) Now that you know about data visualization types, build basic plots to answer at least two of the questions you wrote out earlier

05:00

Try It!


\(\bigstar\) Now that you know about data visualization types, build basic plots to answer at least two of the questions you wrote out earlier

05:00

On Labels

  • The labs() layer permits global plot labels and labels for any mapped aesthetic

    • title
    • subtitle
    • caption
    • alt (for alt-text)
    • x
    • y
    • color
    • fill
    • etc.

Try It!


\(\bigstar\) Now that you know about labeling options in the labs() layer, update your plots with meaningful labels

05:00

Organizing Plots with {patchwork}

  • Often you’ll want to arrange plots together, rather than printing them out one at a time
  • The {patchwork} package provides very easy and intuitive framework for doing this.
  1. Create each of your plots, but store them into variables p1, p2, …

  2. Use + to organize plots side-by-side, and / to organize plots over/under one another.

    • For example, (p1 + p2) / p3 will arrange plots p1 and p2 side-by-side, with plot p3 underneath them.

Try It!


\(\bigstar\) Use {patchwork} to experiment with different arrangements of your plots

05:00

Additional Practice

  1. Ask at least four more questions that can be answered using data visualization

    • At least one univiariate question and at least one multivariate question
    • Add them to your notebook and describe why they are interesting questions
  2. Construct relevant visuals (including meaningful labels) to answer your questions

    • Experiment with plot arrangements using {patchwork} if you like
    • Render your notebook to see the results when {patchwork} is used versus when it is not
    • Decide what you like better in this case
  3. Provide interpretations of the plots you are seeing

Reminder: You have a fully complete notebook using the penguins data on the class webpage.

In that notebook, I split the available data into training and testing sets – we’ll talk about why later on.

Next Time…


A Workshop Day on Quarto and R