Things to Know for Week 7 (Chapter 6)

GREAT LIBRARY HEIST: I had some trouble with the Pride and Prejudice text. This book, from Project Gutenberg, includes a table of contents and has leading whitespace in the text column. Here’s how I got around the issue.

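#Load the packages used below (dplyr for %>% and bind_rows, gutenbergr for the downloads)
library(dplyr)
library(gutenbergr)

#Read in the three other texts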
titles <- c("Twenty Thousand Leagues under the Sea",
        "The War of the Worlds",
        "Great Expectations")

books <- gutenberg_works(title %in% titles) %>%
  gutenberg_download(meta_fields = "title", mirror = "http://mirrors.xmission.com/gutenberg/")

#Read in Pride and Prejudice separately
pride <- gutenberg_download(gutenberg_id = 1342)

#Add in the title column -- rep() repeats a value a given number of times,
#so we are just making a column of the appropriate length here.
pride$title <- rep("Pride and Prejudice", nrow(pride))

#Cut out the rows corresponding to the table of contents.
pride <- pride[c(1:12, 139:nrow(pride)), ]

#Remove the leading whitespace in the text column -- we'll *apply* the
#trimws function to every cell in the text column of our
#pride data frame. The result of lapply() is a list, so we will just
#unlist() the result to get back to a plain character vector. (There are
#probably better ways to do this.)
pride$text <- unlist(lapply(pride$text, trimws))

#Combine the original three books with the Pride and Prejudice text.
books <- bind_rows(books, pride)
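
On the "probably better ways" note above: trimws() is vectorized, so it can be called on the whole text column directly, without lapply() and unlist(). Here is a minimal sketch of that idea using base R only. The toy data frame and the names trimmed_vectorized and trimmed_lapply are made up for illustration; they stand in for the real pride data frame, which is not downloaded here.

```r
# Toy stand-in for the downloaded data frame (hypothetical rows, not the real text).
toy <- data.frame(
  gutenberg_id = c(1342, 1342, 1342),
  text = c("      PRIDE AND PREJUDICE",
           "  Chapter 1",
           "It is a truth universally acknowledged,"),
  stringsAsFactors = FALSE
)

# A single value is recycled to fill the whole column, so rep() is optional here.
toy$title <- "Pride and Prejudice"

# trimws() works on an entire character vector at once.
trimmed_vectorized <- trimws(toy$text)

# The lapply()/unlist() approach used above, for comparison.
trimmed_lapply <- unlist(lapply(toy$text, trimws))

# Check whether the two approaches agree on the toy data.
same_result <- identical(trimmed_vectorized, trimmed_lapply)

# Store the trimmed text back in the data frame.
toy$text <- trimmed_vectorized
```

If this holds up on the real data, the equivalent one-liner would be pride$text <- trimws(pride$text). Note that trimws() trims both ends of each string by default; passing which = "left" restricts it to leading whitespace only.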