Here are a few reminders and some things to look out for in Chapter 6.
NEW PACKAGES: The following package is new this week and must be installed before use: topicmodels
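If you haven't installed topicmodels yet, a one-time install from CRAN should be enough. The snippet below is a minimal sketch that installs the package only when it is missing and then loads it. On some Linux systems, building topicmodels from source may also require the GNU Scientific Library (GSL) to be installed at the system level.

```r
# Install topicmodels from CRAN only if it is not already available,
# then load it for this session.
if (!requireNamespace("topicmodels", quietly = TRUE)) {
  install.packages("topicmodels")
}
library(topicmodels)
```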
GREAT LIBRARY HEIST: I had some trouble with the Pride and Prejudice text. This book, from Project Gutenberg, includes a table of contents and leading whitespace in the text column. Here's how I got around the issue.
#Load the packages used below
library(gutenbergr)
library(dplyr)
#Read in the three other texts
titles <- c("Twenty Thousand Leagues under the Sea",
            "The War of the Worlds",
            "Great Expectations")
books <- gutenberg_works(title %in% titles) %>%
  gutenberg_download(meta_fields = "title", mirror = "http://mirrors.xmission.com/gutenberg/")
#Read in Pride and Prejudice separately
pride <- gutenberg_download(gutenberg_id = 1342)
#Add in the title column -- rep() stands for repeat, so we are just making
#a column of the appropriate length here.
pride$title <- rep("Pride and Prejudice", nrow(pride))
#Cut out the rows corresponding to the table of contents.
pride <- pride[c(1:12, 139:nrow(pride)), ]
#Remove the leading whitespace in the text column -- we'll *apply* the
#trimws function to every cell in the text column of our
#pride data frame. The result of lapply() is a list, so we will just
#unlist() the result to get back to a plain character vector. (There are
#probably better ways to do this.)
pride$text <- unlist(lapply(pride$text, trimws))
#Combine the original three books with the Pride and Prejudice text.
books <- bind_rows(books, pride)
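Because trimws() already works on a whole character vector, the lapply()/unlist() step can likely be replaced by a direct call on the column. The sketch below tries that idea, along with the row-dropping and title steps, on a small made-up data frame (toy, with invented text lines rather than the real Gutenberg file). It uses only base R, so it runs without downloading anything.

```r
# Made-up stand-in for the downloaded book (hypothetical text, not the
# real Project Gutenberg file): leading whitespace plus two
# "table of contents" rows that should be dropped.
toy <- data.frame(
  gutenberg_id = 1342,
  text = c("  PRIDE AND PREJUDICE",
           "  CONTENTS",
           "  Chapter 1 .......... 1",
           "  Chapter 1",
           "  It is a truth universally acknowledged"),
  stringsAsFactors = FALSE
)

# A single string is recycled to every row, so rep() is optional here.
toy$title <- "Pride and Prejudice"

# Keep row 1 and everything from row 4 on, i.e. drop the toy TOC rows 2-3.
toy <- toy[c(1, 4:nrow(toy)), ]

# trimws() is vectorized, so it can trim the whole column in one call.
toy$text <- trimws(toy$text)

toy$text
# -> "PRIDE AND PREJUDICE" "Chapter 1" "It is a truth universally acknowledged"

# Sanity check: the vectorized call matches the lapply()/unlist() version.
x <- c("  leading", " both ", "none")
identical(unlist(lapply(x, trimws)), trimws(x))
# -> TRUE
```

On the real data, the equivalent line would be pride$text <- trimws(pride$text). If you only want to strip leading whitespace, trimws() also accepts which = "left".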
That's all for now. Post any questions you have in Slack and/or bring them with you to our Thursday live meeting.