Data Viz and ggplot() Intro

Dr. Gilbert

September 18, 2024

The Highlights

  • Sample Viz
  • Bad Viz
  • Suggested Viz Choices
  • Structure of a ggplot()
  • Suggested Viz Choices, Revisited
  • On Plot Labels
  • Organizing plots with {patchwork}

A Good Viz Tells a Clear Story

A Bad Viz Doesn’t

Suggested Viz Choices

  • Single Numerical Variable

    • Histogram, Boxplot, or Density
  • Single Categorical Variable

    • Bar Graph
  • Two Numerical Variables

    • Scatterplot or Heatmap
  • Two Categorical Variables

    • Bar Graph with Fill Color or Heatmap
  • One Numerical and One Categorical Variable

    • Side-by-Side boxplots, overlayed or faceted histograms/densities

Let’s Do This!

  1. Open your MAT241 workspace in Posit Cloud and the Day2to5_AustinHousingData.qmd file that we’ve been working within
  2. Navigate to the Visualizing the Austin Zillow Data subsection
  3. Add the following questions to this data visualization section
    • What is the average lot size?
    • Is Austin the only city in the dataset?
    • What is the distribution of price range?
    • Is there an association between number of bedrooms and price range?
  4. Identify the most appropriate visual type(s) to investigate the questions above
  5. Identify and add some additional single- and multiple-variable questions to answer with plots
03:00

Structure of a ggplot()

data %>% #the data to be plotted
  ggplot() + #initialize a plot
  geom_PLOT_TYPE(aes(...)) + #plot type and required mappings
  labs() #labels for plot
  • Note we can pipe (%>%) a data frame into a plot

    • We don’t need a data frame though
  • Once we use ggplot() we use + to add layers instead of piping

  • geom_*() layers require aesthetics to map variables to plot features

    • Different geoms have different required/permitted aesthetics
  • Can add multiple geoms to a single plot

  • Every plot should include labels

  • Find a ggplot cheatsheet here

Suggested Viz Choices, Revisited

  • Single Numerical Variable

    • geom_boxplot(), geom_histogram(), or geom_density()

      • Require x or y aesthetic (but not both!)
      • For example, geom_density(aes(x = hwy))
  • Single Categorical Variable

    • geom_bar()

      • Requires x or y aesthetic (but not both!)
      • For example, geom_bar(aes(x = class))
    • geom_col()

      • Can have both x and y aesthetic
      • Example, geom_col(aes(x = class, y = n))

Suggested Viz Choices, Revisited

  • Two Numerical Variables

    • geom_point() or geom_hexbin()

      • Require both x and y aesthetic
      • For example, geom_point(aes(x = cty, y = hwy))
  • Two Categorical Variables

    • geom_bar()

      • Use x and fill aesthetics
      • For example, geom_bar(aes(x = class, fill = drv))

Suggested Viz Choices, Revisited

  • One Numerical and One Categorical Variable

    • geom_boxplot()

      • Use both x and y aesthetics
      • For example, geom_boxplot(aes(x = hwy, y = class))
    • geom_density() or geom_histogram()

      • Use only x aesthetic
      • Add layer facet_wrap(~ VAR_NAME)
  • Other available aesthetics include color, size, shape, and alpha (transparency)

    • Remember, specific geoms permit only specific aesthetics

Try It!


\(\bigstar\) Build plot(s) to help answer give us insight into the distribution of the lot size variable. If you’re successful (or if you just want to try something else), move on to plots that show the distribution of city or price range, or even to try building a plot to examine a potential association between number of bedrooms and price range. Take five minutes and then we’ll debrief together.

05:00

Try It!


\(\bigstar\) Use your knowledge of data visualization and ggplot to build basic plots to answer at least two of the questions you wrote out earlier. If you write broken code, troubleshoot with a neighbor, pull me over, or even post to Slack for help. See what you can do in 7 minutes and then debrief together for an additional three.

Made a plot you are proud of? Post it and the code to the # r-questions channel in Slack!
07:00

Try It!


\(\bigstar\) Use your knowledge of data visualization and ggplot to build basic plots to answer at least two of the questions you wrote out earlier. If you write broken code, troubleshoot with a neighbor, pull me over, or even post to Slack for help. We’ll see what we can do in 7 minutes and then debrief together for an additional three.

Made a plot you are proud of? Post it and the code to the # r-questions channel in Slack!

03:00

On Labels

  • The labs() layer permits global plot labels and labels for any mapped aesthetic

    • title
    • subtitle
    • caption
    • alt (for alt-text)
    • x
    • y
    • color
    • fill
    • etc.

Try It!


\(\bigstar\) Now that you know about labeling options in the labs() layer, update your plots with meaningful labels

05:00

Organizing Plots with {patchwork}

  • Often you’ll want to arrange plots together, rather than printing them out one at a time
  • The {patchwork} package provides very easy and intuitive framework for doing this.
  1. Create each of your plots, but store them into variables p1, p2, …

  2. Use + to organize plots side-by-side, and / to organize plots over/under one another.

    • For example, (p1 + p2) / p3 will arrange plots p1 and p2 side-by-side, with plot p3 underneath them.

Try It!


\(\bigstar\) Use {patchwork} to experiment with different arrangements of your plots

05:00

Additional Practice

  1. Ask at several more questions that can be answered using data visualization

    • At least one univiariate question and at least one multivariate question
    • Add them to your notebook and describe why they are interesting questions
  2. Construct relevant visuals (including meaningful labels) to answer your questions

    • Experiment with plot arrangements using {patchwork} if you like
    • Render your notebook to see the results when {patchwork} is used versus when it is not
    • Decide what you like better in this case
  3. Provide interpretations of the plots you are seeing

Next Time…


A Transition Into Probability