Calculate row means based on (partial) column name matches

I start with three large data tables (called A1, A2, A3). Each table has four data columns (V1 to V4), one “Date” column that is identical in all three tables, and thousands of rows.

Here is some dummy data that closely resembles my tables.

    A1.V1 <- c(1, 2, 3, 4)
    A1.V2 <- c(2, 4, 6, 8)
    A1.V3 <- c(1, 3, 5, 7)
    A1.V4 <- c(1, 2, 3, 4)
    A2.V1 <- c(1, 2, 3, 4)
    A2.V2 <- c(2, 4, 6, 8)
    A2.V3 <- c(1, 3, 5, 7)
    A2.V4 <- c(1, 2, 3, 4)
    A3.V1 <- c(1, 2, 3, 4)
    A3.V2 <- c(2, 4, 6, 8)
    A3.V3 <- c(1, 3, 5, 7)
    A3.V4 <- c(1, 2, 3, 4)
    Date <- c(2001, 2002, 2003, 2004)
    DF <- data.frame(Date, A1.V1, A1.V2, A1.V3, A1.V4, A2.V1, A2.V2, A2.V3, A2.V4,
                     A3.V1, A3.V2, A3.V3, A3.V4)

So here is what my data frame looks like:

      Date A1.V1 A1.V2 A1.V3 A1.V4 A2.V1 A2.V2 A2.V3 A2.V4 A3.V1 A3.V2 A3.V3 A3.V4
    1 2001     1     2     1     1     1     2     1     1     1     2     1     1
    2 2002     2     4     3     2     2     4     3     2     2     4     3     2
    3 2003     3     6     5     3     3     6     5     3     3     6     5     3
    4 2004     4     8     7     4     4     8     7     4     4     8     7     4

My goal is to calculate, for each row, the mean of the corresponding columns across the three tables. In this case, I want the row means of all columns ending in V1, of all columns ending in V2, of all columns ending in V3, and of all columns ending in V4.

The end result will look like this:

         V1 V2 V3 V4
    2001  1  2  1  1
    2002  2  4  3  2
    2003  3  6  5  3
    2004  4  8  7  4

So my question is: how can I calculate row means based on a partial match of the column names?

Thanks!

4 answers

I'm sure this can be done more elegantly, but here is one approach that seems to work.

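    # declare the substrings the column names should partially match
    colnames = c("V1", "V2", "V3", "V4")
    # calculate the row means for each group of matching columns
    means = lapply(colnames, function(name) {
      apply(DF[, grep(name, names(DF))], 1, mean)
    })
    # name the list so the combined columns are labelled V1..V4
    names(means) = colnames
    # build the result: one column per substring, one row per date
    result = as.data.frame(do.call(cbind, means))
    rownames(result) = DF$Date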

Here is what the code does.

First, I declared the substrings that the column names should partially match.

Then grep selects the columns of the data frame whose names contain a given substring. apply computes the row means of those columns, and lapply repeats this for each substring.
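To see what grep contributes on its own, here is a small check on a compactly rebuilt copy of the question's dummy data (the construction with v1 to v4 is a rewrite for brevity, not the question's code). It prints which columns match "V1" and then averages them row by row.

```r
# Rebuild the question's dummy data compactly (same values as DF above)
v1 <- c(1, 2, 3, 4); v2 <- c(2, 4, 6, 8); v3 <- c(1, 3, 5, 7); v4 <- c(1, 2, 3, 4)
DF <- data.frame(Date = c(2001, 2002, 2003, 2004),
                 A1.V1 = v1, A1.V2 = v2, A1.V3 = v3, A1.V4 = v4,
                 A2.V1 = v1, A2.V2 = v2, A2.V3 = v3, A2.V4 = v4,
                 A3.V1 = v1, A3.V2 = v2, A3.V3 = v3, A3.V4 = v4)

# positions of the columns whose names contain "V1"
idx <- grep("V1", names(DF))
print(idx)               # 2 6 10
print(names(DF)[idx])    # "A1.V1" "A2.V1" "A3.V1"

# apply(..., 1, mean) averages each row of that column subset
v1_means <- apply(DF[, idx], 1, mean)
print(unname(v1_means))  # 1 2 3 4
```

lapply in the answer simply repeats this for "V2", "V3" and "V4".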

Using do.call and cbind (as suggested by DWin), the individual vectors of means are combined column by column. Finally, the row names are set from the Date column of the original data frame.
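A toy illustration of the do.call step, using made-up vectors rather than the question's data: do.call passes each element of the list as a separate argument to cbind, so the names of the list elements become the column names of the resulting matrix, and an unnamed list yields no column names at all.

```r
# two toy per-group mean vectors, stored in a named list
means <- list(V1 = c(1, 2), V2 = c(2, 4))

# do.call passes each list element as a separate argument,
# i.e. this is the same call as cbind(V1 = c(1, 2), V2 = c(2, 4))
m <- do.call(cbind, means)
print(m)

# without names, the combined matrix has no column names
m_unnamed <- do.call(cbind, unname(means))
print(colnames(m_unnamed))  # NULL
```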

The problem can be solved more concisely and efficiently; see the solutions by DWin and Maiasaura.

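A shorter version using sapply and rowMeans:

    colnames = c("V1", "V2", "V3", "V4")
    # one column of row means per substring; sapply binds them into a matrix
    res <- sapply(colnames, function(x) rowMeans(DF[, grep(x, names(DF))]))
    rownames(res) <- DF$Date
    res
    ##      V1 V2 V3 V4
    ## 2001  1  2  1  1
    ## 2002  2  4  3  2
    ## 2003  3  6  5  3
    ## 2004  4  8  7  4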
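grep(x, names(DF)) matches substrings, which is fine for the question's data but would over-match if there were, say, a column named A1.V10 (a hypothetical name, not in the question). A sketch of the difference, with an anchored pattern that requires a literal dot before the suffix and the end of the string after it:

```r
# hypothetical column names: "A1.V10" does not exist in the question's data
nm <- c("A1.V1", "A1.V10", "A2.V1")

# plain substring match: "V1" also matches "V10"
loose <- grep("V1", nm)
print(loose)   # 1 2 3

# anchored pattern: a literal dot, then V1, then end of string
strict <- grep("\\.V1$", nm)
print(strict)  # 1 3
```

In the answers above, the pattern could be built per suffix with paste0("\\.", x, "$").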

If you need to generate the suffixes automatically:

    > unique(sapply(strsplit(names(DF)[-1], ".", fixed = TRUE), "[", 2))
    [1] "V1" "V2" "V3" "V4"
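As an alternative to strsplit, a regular expression can drop everything up to and including the first dot. To stay self-contained, the sketch rebuilds only the 13 column names of the question's DF (not the data) and checks that both approaches agree.

```r
# the same column names as the question's DF: "Date", "A1.V1", ..., "A3.V4"
nm <- c("Date", paste0(rep(c("A1", "A2", "A3"), each = 4), ".V", 1:4))

# remove the table prefix ("A1.", "A2.", ...) and keep the distinct suffixes
suffixes <- unique(sub("^[^.]*\\.", "", nm[-1]))
print(suffixes)  # "V1" "V2" "V3" "V4"

# the strsplit version from the answer gives the same result
split_version <- unique(sapply(strsplit(nm[-1], ".", fixed = TRUE), "[", 2))
print(identical(suffixes, split_version))  # TRUE
```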
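With plyr and reshape2 (melt and dcast come from reshape2):

    library(plyr)
    library(reshape2)
    ddply(DF, .(Date), function(x) {
      # long format: one row per (Date, original column name)
      foo <- melt(x, id.vars = 1)
      # strip the "A1." prefix, keeping only "V1".."V4"
      foo$variable <- substr(foo$variable, 4, 6)
      # back to wide format, averaging values that share a suffix
      return(dcast(foo, Date ~ variable, mean))
    })
    ##   Date V1 V2 V3 V4
    ## 1 2001  1  2  1  1
    ## 2 2002  2  4  3  2
    ## 3 2003  3  6  5  3
    ## 4 2004  4  8  7  4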
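The same "strip the table prefix, then group by suffix" idea can be sketched in base R without plyr or reshape2. This is a swapped-in technique (split.default plus rowMeans), not the answer's method, and it assumes every data column is named like A1.V1 so that characters 4 to 6 hold the suffix.

```r
# Rebuild the question's dummy data compactly (same values as DF above)
v1 <- c(1, 2, 3, 4); v2 <- c(2, 4, 6, 8); v3 <- c(1, 3, 5, 7); v4 <- c(1, 2, 3, 4)
DF <- data.frame(Date = c(2001, 2002, 2003, 2004),
                 A1.V1 = v1, A1.V2 = v2, A1.V3 = v3, A1.V4 = v4,
                 A2.V1 = v1, A2.V2 = v2, A2.V3 = v3, A2.V4 = v4,
                 A3.V1 = v1, A3.V2 = v2, A3.V3 = v3, A3.V4 = v4)

# the suffix of each data column: characters 4-6 of "A1.V1" are "V1"
suffix <- substr(names(DF)[-1], 4, 6)

# a data.frame is a list of columns, so split.default groups the columns
groups <- split.default(DF[-1], suffix)

# one column of row means per suffix; sapply binds them into a matrix
res <- sapply(groups, rowMeans)
rownames(res) <- DF$Date
print(res)
```

The printed matrix has one row per date and columns V1 to V4, matching the desired result in the question.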

You can use grep with value = TRUE to get the matching column names, and then build an expression to evaluate in the j component of a data.table call.

    library(data.table)
    # convert to a data.table
    DT <- data.table(DF)
    # the indices we wish to group (V1 to V3 here; use 1:4 for all four)
    .index <- paste0('V', 1:3)
    # a list containing the matching column names for each index
    name_list <- mapply(grep, pattern = as.list(.index),
                        MoreArgs = list(x = names(DT), value = TRUE),
                        SIMPLIFY = FALSE)
    # create the expression
    .e <- parse(text = sprintf('list( %s)',
      paste(mapply(sprintf, .index, lapply(name_list, paste, collapse = ', '),
                   MoreArgs = list(fmt = '%s = mean(c(%s), na.rm = T)')),
            collapse = ',')))
    DT[, eval(.e), by = Date]
    ##    Date V1 V2 V3
    ## 1: 2001  1  2  1
    ## 2: 2002  2  4  3
    ## 3: 2003  3  6  5
    ## 4: 2004  4  8  7
    # what .e looks like
    .e
    ## expression(list( V1 = mean(c(A1.V1, A2.V1, A3.V1), na.rm = T),V2 = mean(c(A1.V2, A2.V2, A3.V2), na.rm = T),V3 = mean(c(A1.V3, A2.V3, A3.V3), na.rm = T)))
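Building the expression by pasting strings and calling parse(text = ...) works, but the same call can also be assembled directly from symbols with as.call and as.name. A base R sketch for the V1 group; build_mean_call is a hypothetical helper name, and the per-row evaluation with eval stands in for data.table's grouping. Passing such a call object to DT[, eval(e), by = Date] should work similarly, but only the base R evaluation is shown here.

```r
# Rebuild the question's dummy data compactly (same values as DF above)
v1 <- c(1, 2, 3, 4); v2 <- c(2, 4, 6, 8); v3 <- c(1, 3, 5, 7); v4 <- c(1, 2, 3, 4)
DF <- data.frame(Date = c(2001, 2002, 2003, 2004),
                 A1.V1 = v1, A1.V2 = v2, A1.V3 = v3, A1.V4 = v4,
                 A2.V1 = v1, A2.V2 = v2, A2.V3 = v3, A2.V4 = v4,
                 A3.V1 = v1, A3.V2 = v2, A3.V3 = v3, A3.V4 = v4)

# hypothetical helper: turn column names into mean(c(col1, col2, ...), na.rm = TRUE)
build_mean_call <- function(cols) {
  inner <- as.call(c(list(as.name("c")), lapply(cols, as.name)))
  as.call(list(as.name("mean"), inner, na.rm = TRUE))
}

cols <- grep("V1", names(DF), value = TRUE)
e <- build_mean_call(cols)
print(e)  # mean(c(A1.V1, A2.V1, A3.V1), na.rm = TRUE)

# evaluate the call once per row, using that row's columns as variables
v1_means <- sapply(seq_len(nrow(DF)), function(i) eval(e, DF[i, ]))
print(v1_means)  # 1 2 3 4
```

Assembling calls from symbols avoids quoting problems if a column name ever contains characters that would break the pasted string.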

Source: https://habr.com/ru/post/1258767/

