I am trying to apply a function to a data frame using ddply from the plyr package, but I am getting some results that I do not understand. I have three questions about the results.
Given:

    mydf <- data.frame(c(12, 34, 9, 3, 22, 55), c(1, 2, 1, 1, 2, 2), c(0, 1, 2, 1, 1, 2))
    colnames(mydf)[1] <- 'n'
    colnames(mydf)[2] <- 'x'
    colnames(mydf)[3] <- 'x1'

mydf looks like this:

       n x x1
    1 12 1  0
    2 34 2  1
    3  9 1  2
    4  3 1  1
    5 22 2  1
    6 55 2  2

Question number 1
If I do this:

    k <- function(x) {
      mydf$z <- ifelse(x == 1, 0, mydf$n)
      return(mydf)
    }
    mydf <- ddply(mydf, c("x"), .fun = k, .inform = TRUE)

I get the following error:

    Error in `$<-.data.frame`(`*tmp*`, "z", value = structure(c(12, 34, 9, :
      replacement has 3 rows, data has 6
    Error: with piece 1:
       n x x1
    1 12 1  0
    2  9 1  2
    3  3 1  1

I get this error regardless of whether I specify the splitting variable as c("x"), "x", or .(x). I do not understand why I am getting this error message.
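For comparison, here is a minimal sketch of a variant that works on the piece handed to the function rather than on the full `mydf`. It assumes that ddply passes each group's rows as a data frame in the first argument, and that `z` should be 0 where `x == 1` and `n` otherwise. The names `k2` and `piece` are hypothetical, chosen only for illustration.

```r
library(plyr)

mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# Hypothetical variant of k: `piece` is the subset of rows for one value of x.
# Every column reference goes through `piece`, so the lengths always match.
k2 <- function(piece) {
  piece$z <- ifelse(piece$x == 1, 0, piece$n)
  piece
}

# Store the result under a new name so the original mydf is left untouched.
result <- ddply(mydf, "x", k2)
print(result)
```

In the original `k`, the argument is a 3-row piece while `mydf$z` belongs to the 6-row data frame, which would be consistent with the "replacement has 3 rows, data has 6" message.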
Question number 2
What I really want, though, is to use an if/else construct, because my dataset has variables x1, x2, x3 and x4, and I want to take these variables into account as well. But when I try something simple, for example:

    j <- function(x) {
      if (x == 1) {
        mydf$z <- 0
      } else {
        mydf$z <- mydf$n
      }
      return(mydf)
    }
    mydf <- ddply(mydf, x, .fun = j, .inform = TRUE)

I get:

    Warning messages:
    1: In if (x == 1) { :
      the condition has length > 1 and only the first element will be used
    2: In if (x == 1) { :
      the condition has length > 1 and only the first element will be used

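To illustrate the difference that the warning points at, the sketch below contrasts a vector-valued comparison with the vectorised `ifelse()`, using base R only. The rule combining `x` and `x1` (set `z` to 0 when `x == 1` and `x1 > 0`) is purely hypothetical and only stands in for whatever the real conditions on x1 to x4 would be.

```r
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# A comparison on a column yields one TRUE/FALSE per row, not a single value,
# which is why if() complains (recent R versions raise an error instead).
cond <- mydf$x == 1
length(cond)

# ifelse() evaluates the condition row by row; several columns can be
# combined with & (and) or | (or). The rule here is a made-up example.
mydf$z <- ifelse(mydf$x == 1 & mydf$x1 > 0, 0, mydf$n)
print(mydf)
```

If a plain `if` is still preferred, the condition has to be reduced to a single value first, for example with `all()` or `any()`.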
Question number 3
I am confused about when to use function() and when to use function(x). Defining j() or k() with function() instead gives me another error:

    Error in .fun(piece, ...) : unused argument (piece)
    Error: with piece 1:
        n x x1  z
    1  12 1  0 12
    2   9 1  2  9
    3   3 1  1  3
    4  12 1  0 12
    5   9 1  2  9
    6   3 1  1  3
    7  12 1  0 12
    8   9 1  2  9
    9   3 1  1  3
    10 12 1  0 12
    11  9 1  2  9
    12  3 1  1  3

where column z is not what I expect. Yet I see many functions written as function().
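The error text `.fun(piece, ...)` suggests that each piece is passed to the function as its first argument. The sketch below mimics that calling pattern with base R's `split()` and `lapply()` as a stand-in for ddply (an assumption, not plyr's actual internals). The names `no_arg` and `one_arg` are hypothetical.

```r
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# One data frame per value of x, similar in spirit to ddply's pieces.
pieces <- split(mydf, mydf$x)

no_arg  <- function() "called"            # accepts nothing
one_arg <- function(piece) nrow(piece)    # accepts the piece

# lapply() calls FUN(piece); a zero-argument function cannot receive it,
# so the call fails. The error message is captured instead of stopping.
res <- tryCatch(lapply(pieces, no_arg),
                error = function(e) conditionMessage(e))
print(res)

# With one parameter, the piece is received and can be used.
counts <- sapply(pieces, one_arg)
print(counts)
```

Under this reading, `function()` suits functions that are called without arguments, while anything handed to ddply as `.fun` needs at least one parameter to receive the piece.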
I sincerely appreciate any comments that can help me with this.