How to group similar rows in R

Question

I have a table as follows:

    Rptname Score
    Bebo23      8
    Bebo22      9
    Bebo19     10
    Alt88      12
    Alt67      11
    Jimm        5
    Jimm2       7

and so on. I would like to sum the scores of rows whose names are similar, i.e.:

    Bebo 27
    Alt  23
    Jimm 12

The beginning of each name identifies its group, but the length of that shared prefix varies. I understand that I will have to define the groups, probably with some kind of regular expression, but I'm not sure how to group and summarize on that basis. Thanks in advance for your help.
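For anyone who wants to reproduce this, the table can be rebuilt as a data frame along these lines. This is only a sketch: the object names `df` and `DF` are assumptions chosen to match the names used in the answers, and `prefix` is a hypothetical helper that shows which group each row would fall into once trailing digits are stripped.

```r
# Sample data copied from the table in the question.
# The name `df` is an assumption; the question does not name the object.
df <- data.frame(
  Rptname = c("Bebo23", "Bebo22", "Bebo19", "Alt88", "Alt67", "Jimm", "Jimm2"),
  Score = c(8, 9, 10, 12, 11, 5, 7),
  stringsAsFactors = FALSE
)
DF <- df  # alias, since the dplyr answer calls the data DF

# Hypothetical helper: drop any run of digits at the end of each name,
# leaving the prefix that identifies the group.
prefix <- sub("\\d+$", "", df$Rptname)
print(prefix)
```

Names without a numeric suffix, such as "Jimm", pass through unchanged, so they land in the same group as "Jimm2".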

+6

r grouping

Sebastian zeki Jan 24 '15 at 7:48

2 answers

With dplyr:

    library(dplyr)

    DF %>%
      group_by(Rptname = sub("\\d+$", "", Rptname)) %>%
      summarise(Score = sum(Score))
    #Source: local data frame [3 x 2]
    #
    #  Rptname Score
    #1     Alt    23
    #2    Bebo    27
    #3    Jimm    12

Update:

If you want to group by the first three letters of "Rptname" instead, you can use the following dplyr code:

    DF %>%
      group_by(Rptname = substr(Rptname, 1, 3)) %>%
      summarise(Score = sum(Score))
    #Source: local data frame [3 x 2]
    #
    #  Rptname Score
    #1     Alt    23
    #2     Beb    27
    #3     Jim    12

+3

docendo discimus Jan 24 '15 at 7:56

akrun · Accepted Answer · 2015-01-24T07:51:34+0000

You can strip the trailing digits with sub and then sum by group with aggregate:

    do.call(`data.frame`,
            aggregate(Score ~ cbind(Rptname = sub('\\d+$', '', Rptname)), df, sum))
    #  Rptname Score
    #1     Alt    23
    #2    Bebo    27
    #3    Jimm    12

Or use transform with aggregate (as suggested by @docendo discimus)

 aggregate(Score ~ Rptname, transform(df, Rptname = sub("\\d+$", "", Rptname)), sum)

Or an option with data.table:

    library(data.table)

    setDT(df)[, .(Score = sum(Score)), by = list(Rptname = sub('\\d+$', '', Rptname))]

Or using rowsum (suggested by @alexis_laz):

    with(df, rowsum(Score, sub('\\d+$', '', Rptname)))
    #     [,1]
    #Alt    23
    #Bebo   27
    #Jimm   12
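Note that rowsum returns a one-column matrix with the group names as row names, not a data frame. If a data frame is needed, it could be converted along these lines. This is a sketch only: the sample data is rebuilt from the question, and the names `m` and `res` are hypothetical.

```r
# Sample data rebuilt from the question (object name `df` as in the code above).
df <- data.frame(
  Rptname = c("Bebo23", "Bebo22", "Bebo19", "Alt88", "Alt67", "Jimm", "Jimm2"),
  Score = c(8, 9, 10, 12, 11, 5, 7),
  stringsAsFactors = FALSE
)

# rowsum gives a matrix whose row names are the group labels.
m <- with(df, rowsum(Score, sub("\\d+$", "", Rptname)))

# Move the row names into a regular column and drop them from the result.
res <- data.frame(Rptname = rownames(m), Score = m[, 1], row.names = NULL)
print(res)
```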

Update

If the grouping is based on the first three characters, you can use substr

    aggregate(Score ~ Rptname, transform(df, Rptname = substr(Rptname, 1, 3)), sum)
    #  Rptname Score
    #1     Alt    23
    #2     Beb    27
    #3     Jim    12
