How to group similar rows in R

I have a table as follows:

    Rptname Score
    Bebo23  8
    Bebo22  9
    Bebo19  10
    Alt88   12
    Alt67   11
    Jimm    5
    Jimm2   7

and so on. I would like to summarize the rows with similar names into groups, i.e.:

    Bebo 27
    Alt  23
    Jimm 12

The beginning of each name always identifies its group, but the number of shared characters can vary. I understand that I will have to define the groups and probably use some kind of regular expression, but I'm not sure how to group and summarize on that basis. Thanks in advance for your help.

2 answers

You can remove the trailing digits with sub and then aggregate on the result:

    do.call(`data.frame`,
            aggregate(Score ~ cbind(Rptname = sub('\\d+$', '', Rptname)), df, sum))
    #  Rptname Score
    #1     Alt    23
    #2    Bebo    27
    #3    Jimm    12
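To make the two steps (strip the digits, then sum per group) easier to follow, here is a minimal sketch in base R. It assumes the question's table is a data frame named df with columns Rptname and Score; the intermediate names key and totals are only illustrative.

```r
# Sample data reconstructed from the question (assumed column names)
df <- data.frame(
  Rptname = c("Bebo23", "Bebo22", "Bebo19", "Alt88", "Alt67", "Jimm", "Jimm2"),
  Score   = c(8, 9, 10, 12, 11, 5, 7),
  stringsAsFactors = FALSE
)

# Step 1: build the grouping key by stripping one or more trailing digits.
# Names without trailing digits (e.g. "Jimm") are left unchanged.
key <- sub("\\d+$", "", df$Rptname)
print(unique(key))  # "Bebo" "Alt" "Jimm"

# Step 2: sum Score within each key
totals <- aggregate(Score ~ key, data = data.frame(key = key, Score = df$Score), FUN = sum)
print(totals)
```

Keeping the key as a separate vector makes it easy to inspect the groups before aggregating, which helps when the regular expression needs adjusting.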

Or use transform with aggregate (as suggested by @docendo discimus)

 aggregate(Score ~ Rptname, transform(df, Rptname = sub("\\d+$", "", Rptname)), sum) 

Or option with data.table

    library(data.table)
    setDT(df)[, .(Score = sum(Score)), by = list(Rptname = sub('\\d+$', '', Rptname))]

Or using rowsum (suggested by @alexis_laz):

    with(df, rowsum(Score, sub('\\d+$', '', Rptname)))
    #     [,1]
    #Alt    23
    #Bebo   27
    #Jimm   12
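Note that rowsum returns a matrix with the group names as row names, not a data frame. If a data frame is needed, one possible way to convert it is sketched below; the sample data and the names res and out are assumptions for illustration.

```r
# Sample data reconstructed from the question (assumed column names)
df <- data.frame(
  Rptname = c("Bebo23", "Bebo22", "Bebo19", "Alt88", "Alt67", "Jimm", "Jimm2"),
  Score   = c(8, 9, 10, 12, 11, 5, 7),
  stringsAsFactors = FALSE
)

# rowsum() gives a one-column matrix; group names are stored as row names
res <- rowsum(df$Score, sub("\\d+$", "", df$Rptname))

# Move the row names into a regular column and drop them from the result
out <- data.frame(Rptname = rownames(res), Score = res[, 1], row.names = NULL)
print(out)
```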

Update

If the grouping is based on the first three characters, you can use substr

    aggregate(Score ~ Rptname, transform(df, Rptname = substr(Rptname, 1, 3)), sum)
    #  Rptname Score
    #1     Alt    23
    #2     Beb    27
    #3     Jim    12
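The two keys behave differently: the regular expression keeps the whole name minus trailing digits, while substr cuts at a fixed width. A small sketch that puts both keys side by side on the question's sample values (the vector names here are hypothetical):

```r
# Sample values reconstructed from the question
names_in <- c("Bebo23", "Bebo22", "Bebo19", "Alt88", "Alt67", "Jimm", "Jimm2")
scores   <- c(8, 9, 10, 12, 11, 5, 7)

by_regex  <- sub("\\d+$", "", names_in)  # variable-length key: "Bebo", "Alt", "Jimm"
by_prefix <- substr(names_in, 1, 3)      # fixed-width key:     "Beb",  "Alt", "Jim"

# Compare the two keys row by row
print(data.frame(names_in, by_regex, by_prefix))

# Sum the scores per fixed-width key
prefix_totals <- tapply(scores, by_prefix, sum)
print(prefix_totals)
```

On this data both keys produce the same three groups. The fixed-width key would merge names that share their first three characters but differ afterwards, so it only fits when three characters are enough to tell the groups apart.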

With dplyr:

    library(dplyr)
    DF %>%
      group_by(Rptname = sub("\\d+$", "", Rptname)) %>%
      summarise(Score = sum(Score))
    #Source: local data frame [3 x 2]
    #
    #  Rptname Score
    #1     Alt    23
    #2    Bebo    27
    #3    Jimm    12

Update:

If you want to group by the first three letters of "Rptname", you can use the following dplyr code:

    DF %>%
      group_by(Rptname = substr(Rptname, 1, 3)) %>%
      summarise(Score = sum(Score))
    #Source: local data frame [3 x 2]
    #
    #  Rptname Score
    #1     Alt    23
    #2     Beb    27
    #3     Jim    12

Source: https://habr.com/ru/post/981479/
