ddply + summarise: passing a column name programmatically

I am trying to use ddply and summarise from the plyr package, but I am having trouble passing in column names that keep changing. In my example, I would like to pass in X1 programmatically rather than hard-coding X1 in the ddply call.

Example setup:

 require(xts)
 require(plyr)
 require(reshape2)
 require(lubridate)
 t <- xts(matrix(rnorm(10000), ncol=10), Sys.Date()-1000:1)
 t.df <- data.frame(coredata(t))
 t.df <- cbind(day=wday(index(t), label=TRUE, abbr=TRUE), t.df)
 t.df.l <- melt(t.df, id.vars=c("day", colnames(t.df)[2]), measure.vars=colnames(t.df)[3:ncol(t.df)])

This is the bit I'm struggling with:

 cor.vars <- ddply(t.df.l, c("day","variable"), summarise, cor(X1, value)) 

I do not want to hard-code the name X1 and would like to use something like

 cor.vars <- ddply(t.df.l, c("day","variable"), summarise, cor(colnames(t.df)[2], value)) 

but this causes an error: Error in cor(colnames(t.df)[2], value) : 'x' must be numeric
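The error arises because `colnames(t.df)[2]` evaluates to the character string "X1", not to the column it names, so `cor` receives text instead of numbers. A minimal base R sketch of the same failure, using a toy data frame whose values are made up for illustration:

```r
# Toy data frame; the column names mirror the question, the values are invented.
df <- data.frame(X1 = c(1, 2, 3, 4), value = c(2, 4, 6, 9))

col_name <- colnames(df)[1]   # the string "X1", not the numeric column
is_string <- is.character(col_name)

# cor() is handed a character vector and refuses it; capture the message
# instead of stopping the script.
result <- tryCatch(
  cor(col_name, df$value),
  error = function(e) conditionMessage(e)
)
print(result)
```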

I also tried various other combinations that pass vector values for the argument x in cor, but none of them work.

Any ideas?

1 answer

Although summarise is probably not intended to be used this way, and there are likely more efficient approaches to your problem, the direct answer to your question is to use get:

 ddply(t.df.l, c("day","variable"), summarise, cor(get(colnames(t.df)[2]), value)) 
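To see what `get` does here without plyr, the same lookup can be sketched in base R with `with()`, which also evaluates an expression inside a data frame. This is only an analogy under that assumption; the toy data and the variable `col_name` are hypothetical:

```r
# Toy data frame; values are invented for illustration.
df <- data.frame(X1 = c(1, 2, 3, 4), value = c(2, 4, 6, 9))

col_name <- "X1"   # hypothetical variable holding the column name

# get() turns the string into the object of that name, found among df's columns.
by_name <- with(df, cor(get(col_name), value))

# Hard-coded reference for comparison.
hard_coded <- with(df, cor(X1, value))

print(all.equal(by_name, hard_coded))
```

One thing to keep in mind: `get` also searches enclosing environments by default, so if the data has no column with that name, an unrelated object of the same name could be picked up instead.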

Edit: here is, for example, one approach that, in my opinion, is better suited to your problem:

 ddply(t.df.l, c("day", "variable"), function(x)cor(x["X1"], x["value"])) 

Above, "X1" can also be replaced with the column index 2 or with a variable that holds the string "X1". It depends on how you want to access the column programmatically.
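As a rough sketch of the variable-based variant, the column name can be stored in a variable and used with `[[ ]]` inside the function passed to ddply. The toy data, the variable `ref_col`, and the output column name `cor` are assumptions made for illustration; returning a one-row data frame is a choice made here to get a named result column:

```r
library(plyr)

set.seed(1)
# Toy long-format data shaped like t.df.l; the values are random.
toy <- data.frame(
  day      = rep(c("Mon", "Tue"), each = 10),
  variable = rep(c("X2", "X3"), times = 10),
  X1       = rnorm(20),
  value    = rnorm(20)
)

ref_col <- "X1"   # hypothetical variable holding the reference column name

# [[ ]] extracts each column as a plain numeric vector, so cor() returns a scalar.
cor.vars <- ddply(toy, c("day", "variable"), function(piece) {
  data.frame(cor = cor(piece[[ref_col]], piece[["value"]]))
})

print(cor.vars)
```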


Source: https://habr.com/ru/post/1437768/

