Character frequency in rows as columns in a data frame using R

Question


I have a data frame, initial, in the following format:

> head(initial)
      Strings
1     A,A,B,C
2       A,B,C
3 A,A,A,A,A,B
4     A,A,B,C
5       A,B,C
6 A,A,A,A,A,B

and I need a data frame, final, with one count column per character:

> head(final)
      Strings A B C
1     A,A,B,C 2 1 1
2       A,B,C 1 1 1
3 A,A,A,A,A,B 5 1 0
4     A,A,B,C 2 1 1
5       A,B,C 1 1 1
6 A,A,A,A,A,B 5 1 0

The following code generates both data frames with a large number of rows:

initial<-data.frame(Strings=rep(c("A,A,B,C","A,B,C","A,A,A,A,A,B"),100))
final<-data.frame(Strings=rep(c("A,A,B,C","A,B,C","A,A,A,A,A,B"),100),A=rep(c(2,1,5),100),B=rep(c(1,1,1),100),C=rep(c(1,1,0),100))

What is the fastest way to achieve this? Any help would be greatly appreciated.

+4

r

Rajarshi bhadra Oct 10 '15 at 14:56


1 answer

akrun · Accepted Answer · 2015-10-10T14:58:49+0000

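We can do this with base R. Split the 'Strings' column on "," (strsplit(...)), set the names of the resulting list to the row numbers, stack it into a data.frame with key/value columns, get the frequencies with table, convert the result to a 'data.frame' and cbind it with the original dataset.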

cbind(df1,
      as.data.frame.matrix(
        table(
          stack(
            setNames(strsplit(as.character(df1$Strings), ','),
                     1:nrow(df1))
          )[2:1])))
#          Strings A B C D
#1         A,B,C,D 1 1 1 1
#2     A,B,B,D,D,D 1 2 0 3
#3 A,A,A,A,B,C,D,D 4 1 1 2
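
The nested call is easier to follow when unrolled into named steps. The sketch below does that. The answer never defines df1, so the definition here is an assumption reconstructed from the printed output, and the intermediate names (parts, long, counts, result) are only illustrative.

```r
# df1 is not defined in the answer; this definition is an assumption
# reconstructed from the output printed above.
df1 <- data.frame(Strings = c("A,B,C,D", "A,B,B,D,D,D", "A,A,A,A,B,C,D,D"),
                  stringsAsFactors = FALSE)

# 1. Split each string on commas: a list with one character vector per row.
parts <- strsplit(as.character(df1$Strings), ",", fixed = TRUE)

# 2. Name each list element after its row number.
parts <- setNames(parts, seq_len(nrow(df1)))

# 3. Stack into a long data frame: 'values' holds the character,
#    'ind' holds the row number it came from.
long <- stack(parts)

# 4. Cross-tabulate row number ('ind') against character ('values').
#    long[2:1] reorders the columns so rows of the table are data rows.
counts <- as.data.frame.matrix(table(long[2:1]))

# 5. Attach the count columns to the original data.
result <- cbind(df1, counts)
print(result)
```

Printing long after step 3 is a quick way to see the key/value layout that table consumes.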

Or we can use mtabulate from qdapTools after splitting the column.

library(qdapTools)
cbind(df1, mtabulate(strsplit(as.character(df1$Strings), ',')))
#          Strings A B C D
#1         A,B,C,D 1 1 1 1
#2     A,B,B,D,D,D 1 2 0 3
#3 A,A,A,A,B,C,D,D 4 1 1 2
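
If installing qdapTools is not an option, the same counts can likely be produced with base R alone. The sketch below is an alternative, not the answer's method: it counts each row against a fixed set of characters with match and tabulate. The df1 definition is again an assumption reconstructed from the printed output, and lev, parts, counts and result are illustrative names.

```r
# df1 is an assumption reconstructed from the output printed above.
df1 <- data.frame(Strings = c("A,B,C,D", "A,B,B,D,D,D", "A,A,A,A,B,C,D,D"),
                  stringsAsFactors = FALSE)

parts <- strsplit(as.character(df1$Strings), ",", fixed = TRUE)

# All distinct characters, in sorted order; these become the column names.
lev <- sort(unique(unlist(parts)))

# For each row, map characters to their position in 'lev' and count them.
# tabulate() with nbins fills in zeros for characters absent from a row.
counts <- do.call(rbind, lapply(parts, function(p) {
  tabulate(match(p, lev), nbins = length(lev))
}))
colnames(counts) <- lev

result <- cbind(df1, counts)
print(result)
```

Because every row is counted against the same lev vector, all rows get the same columns and the original row order is kept.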

Update

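Using the "initial" dataset from the question. With many rows, the row numbers stored in 'ind' can end up sorted as text in the factor levels (1, 10, 100, ...), so the table rows would not line up with the rows of "initial". To keep the original order, convert 'ind' to a factor with levels specified as the unique values of 'ind'.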

df1 <- stack(setNames(strsplit(as.character(initial$Strings), ','),
          seq_len(nrow(initial))))
df1$ind <- factor(df1$ind, levels=unique(df1$ind))
cbind(initial, as.data.frame.matrix(table(df1[2:1])))
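
As a rough check, the sketch below first shows why the level order matters (row numbers stored as text sort alphabetically by default) and then compares the result on "initial" with the A, B and C columns given for "final" in the question. The names long and out are illustrative, not from the answer.

```r
# Row numbers stored as text sort alphabetically when used as factor levels.
print(levels(factor(c("1", "2", "10"))))

# Data from the question.
initial <- data.frame(Strings = rep(c("A,A,B,C", "A,B,C", "A,A,A,A,A,B"), 100))

# Same steps as the update above, with the long table kept in 'long'.
long <- stack(setNames(strsplit(as.character(initial$Strings), ","),
                       seq_len(nrow(initial))))
long$ind <- factor(long$ind, levels = unique(long$ind))
out <- cbind(initial, as.data.frame.matrix(table(long[2:1])))

# Expected counts, taken from the definition of 'final' in the question.
stopifnot(all(out$A == rep(c(2, 1, 5), 100)),
          all(out$B == rep(c(1, 1, 1), 100)),
          all(out$C == rep(c(1, 1, 0), 100)))
print(head(out))
```

If the stopifnot call passes silently, the counts match the question's expected output row by row.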
