Let me start with the following fully working code from the "Introduction to tidytext" vignette on CRAN:
library(janeaustenr)
library(dplyr)
library(stringr)
original_books <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE)))) %>%
  ungroup()
original_books
library(tidytext)
tidy_books <- original_books %>%
  unnest_tokens(word, text)
tidy_books
data("stop_words")
cleaned_books <- tidy_books %>%
  anti_join(stop_words)
So far, so good. I now have a data frame of the six Jane Austen novels with the standard stop words removed.
unique(cleaned_books$book)
Which gives me: Sense & Sensibility, Pride & Prejudice, Mansfield Park, Emma, Northanger Abbey, Persuasion.
If I want to make a standard word cloud of the term frequencies across all six, no problem. Like so (color added):
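To make the stop-word step concrete, here is a minimal sketch of what the anti_join() call does. The two small tables are hypothetical stand-ins for tidy_books and stop_words, not data from the packages, and the join column is named explicitly with by = "word".

```r
library(dplyr)

# Hypothetical one-word-per-row tokens, shaped like tidy_books
tokens <- tibble(word = c("it", "is", "a", "truth", "universally", "acknowledged"))

# Hypothetical miniature stop-word list, shaped like stop_words
stops <- tibble(word = c("it", "is", "a"))

# Keep only the rows of tokens whose word has no match in stops
kept <- anti_join(tokens, stops, by = "word")
kept$word
```

Only the rows of tokens without a match in stops survive, which is why cleaned_books is shorter than tidy_books.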
library(wordcloud)
library(RColorBrewer)
dark2 <- brewer.pal(8, "Dark2")
cleaned_books %>%
  count(word) %>%
  with(wordcloud(word, n, colors = dark2, max.words = 100))
That works great. But how can I then make a commonality.cloud() from all six novels, and a comparison.cloud() from the same data?
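Starting from cleaned_books, count the words per book, cast the counts into a word-by-book matrix with acast() from reshape2, and pass that matrix to comparison.cloud() or commonality.cloud():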
set1 <- brewer.pal(8, "Set1")
library(reshape2)
# Words that are distinctive for each novel
cleaned_books %>%
  group_by(book) %>%
  count(word) %>%
  acast(word ~ book, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = dark2, title.size = 1, scale = c(3, 0.3), random.order = FALSE, max.words = 100)
# Words shared by all six novels; title.size is dropped here because
# commonality.cloud() forwards its extra arguments to wordcloud(), which has no titles
cleaned_books %>%
  group_by(book) %>%
  count(word) %>%
  acast(word ~ book, value.var = "n", fill = 0) %>%
  commonality.cloud(colors = set1, scale = c(3, 0.3), random.order = FALSE, max.words = 100)
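To see what the two clouds are built from, here is a base-R sketch on a toy word-by-book matrix. The counts and the book names A, B, C are made up for illustration, and the two calculations reflect my understanding of the wordcloud package rather than its exact code: commonality.cloud() sizes each word by its minimum count across documents, and comparison.cloud() assigns each word to the document where its within-document rate most exceeds the average rate.

```r
# Hypothetical per-book word counts, shaped like the output of count(word)
counts <- data.frame(
  book = c("A", "A", "A", "B", "B", "C", "C", "C"),
  word = c("love", "time", "ship", "love", "time", "love", "time", "abbey"),
  n    = c(10, 4, 6, 4, 8, 2, 6, 12)
)

# Word-by-book matrix with 0 for absent words,
# the same shape that acast(word ~ book, fill = 0) produces
m <- tapply(counts$n, list(counts$word, counts$book), sum)
m[is.na(m)] <- 0

# Commonality: a word only counts as shared if every book contains it
common <- apply(m, 1, min)
common[common > 0]

# Comparison: within-book rates minus the average rate across books
prop <- sweep(m, 2, colSums(m), "/")
dev <- prop - rowMeans(prop)
top_book <- colnames(m)[apply(dev, 1, which.max)]
names(top_book) <- rownames(m)
top_book
```

In this toy data only "love" and "time" appear in all three books, so they are the only candidates for the commonality cloud, while "abbey" and "ship" show up in the comparison cloud under the single book that uses them.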