Row splitting and frequency table generation in R

Question

I have a brand name column in an R data frame that looks something like this:

"ABC Industries" "ABC Enterprises" "123 and 456 Corporation" "XYZ Company"

And so on. I am trying to create a frequency table of the words that appear in this column, for example:

    Industries    10
    Corporation   31
    Enterprise    40
    ABC           30
    XYZ           40

I'm relatively new to R, so I was wondering how to do this. Should I split the rows and put every word in a new column? Is there a way to split a multi-word row into several rows of one word each?

+4

string split r frequency

aesir Dec 30 '11 at 4:26

3 answers

Here is another one-liner. It uses paste() to combine all the column entries into one long text string, which it then splits and tabulates:

    text <- c("ABC Industries", "ABC Enterprises", "123 and 456 Corporation", "XYZ Company")
    table(strsplit(paste(text, collapse=" "), " "))

+6

Josh O'Brien Dec 30 '11 at 7:00

You can use the tidytext and dplyr packages:

    set.seed(42)
    text <- c("ABC Industries", "ABC Enterprises", "123 and 456 Corporation", "XYZ Company")
    data <- data.frame(category = sample(text, 100, replace = TRUE),
                       stringsAsFactors = FALSE)

    library(tidytext)
    library(dplyr)

    data %>%
      unnest_tokens(word, category) %>%
      group_by(word) %>%
      count()
    #> # A tibble: 9 x 2
    #> # Groups:   word [9]
    #>   word            n
    #>   <chr>       <int>
    #> 1 123            29
    #> 2 456            29
    #> 3 abc            45
    #> 4 and            29
    #> 5 company        26
    #> 6 corporation    29
    #> 7 enterprises    21
    #> 8 industries     24
    #> 9 xyz            26
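For comparison, the same one-word-per-row reshaping can be sketched in base R without extra packages. This is only an illustration: the data frame below is a small made-up stand-in, and the names words_per_row, long and counts are hypothetical. The words are lower-cased with tolower() to mimic the lower-case tokens in the output above.

```r
# Hypothetical data frame; the column name `category` mirrors the answer above.
data <- data.frame(
  category = c("ABC Industries", "ABC Enterprises",
               "123 and 456 Corporation", "XYZ Company"),
  stringsAsFactors = FALSE
)

# Split each row on runs of whitespace; the result is a list with one
# character vector per row.
words_per_row <- strsplit(data$category, "[[:space:]]+")

# One word per row, keeping track of the row each word came from.
long <- data.frame(
  row  = rep(seq_along(words_per_row), lengths(words_per_row)),
  word = tolower(unlist(words_per_row)),
  stringsAsFactors = FALSE
)

# Count the words and return the counts as a data frame
# with the columns `word` and `Freq`.
counts <- as.data.frame(table(word = long$word), stringsAsFactors = FALSE)
counts
```

Keeping the row index in long makes it possible to join the words back to the original rows later, which a plain vector of words would not allow.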

0

Filipw Feb 02 '18 at 14:03

Dirk Eddelbuettel · Accepted Answer · 2011-12-30T04:38:35+0000

If you want, you can do it in a one-liner:

    R> text <- c("ABC Industries", "ABC Enterprises",
    +            "123 and 456 Corporation", "XYZ Company")
    R> table(do.call(c, lapply(text, function(x) unlist(strsplit(x, " ")))))

            123         456         ABC         and     Company
              1           1           2           1           1
    Corporation Enterprises  Industries         XYZ
              1           1           1           1
    R>

Here I use strsplit() to break each entry into its component words; this returns a list (within a list). I use do.call() to simply concatenate all the resulting lists into a single vector, which table() then summarizes.
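The same steps can be spelled out one at a time on a data frame column, which is what the question actually has. This is a minimal sketch, not the answerer's code: the data frame brands and its column brand are hypothetical names, and the final sort() is an addition that puts the most frequent words first.

```r
# Hypothetical data frame standing in for the one in the question.
brands <- data.frame(
  brand = c("ABC Industries", "ABC Enterprises",
            "123 and 456 Corporation", "XYZ Company"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row.
pieces <- strsplit(brands$brand, " ", fixed = TRUE)

# unlist() flattens that list into a single character vector of words.
words <- unlist(pieces)

# table() counts each distinct word; sort() puts the most frequent first.
freq <- sort(table(words), decreasing = TRUE)
freq
```

Because strsplit() is vectorized over its first argument, the lapply() and do.call() wrapper is optional here; unlist() on the result is enough to get one flat vector.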
