Here are a few reminders and some things to be on the look-out for in Chapter 6.
NEW PACKAGES: The following packages are new this week and will need to be installed prior to use: topicmodels
GREAT LIBRARY HEIST: I had some trouble with the Pride and Prejudice text. This book, from Project Gutenberg includes a table of contents, and leading whitespaces in the text column. Here’s how I got around the issue.
#Read in the thre other texts
titles <- c("Twenty Thousand Leagues under the Sea",
"The War of the Worlds",
"Great Expectations")
books <- gutenberg_works(title %in% titles) %>%
gutenberg_download(meta_fields = "title", mirror = "http://mirrors.xmission.com/gutenberg/")
#Read in Pride and Prejudice separately
pride <- gutenberg_download(gutenberg_id = 1342)
#Add in the title column -- rep() stands for repeat, so we are just making
#a column of the appropriate length here.
pride$title <- rep("Pride and Prejudice", nrow(pride))
#Cut out the rows corresponding to the table of contents.
pride <- pride[c(1:12, 139:nrow(pride)), ]
#Remove the leading whitespace in the text column -- we'll *apply* the
#trimws function to every cell in the text column of our
#pride data frame. The result of lapply() is a list, so we will just
#unlist() the result to get back to a simple string value. (There are
#probably beter ways to do this).
pride$text <- unlist(lapply(pride$text, trimws))
#Combine the original three books with the Pride and Prejudice text.
books <- bind_rows(books, pride)That’s all for now. Post questions you have in Slack and/or bring them with you to our Thursday live meeting.