R ddply using if and ifelse functions

I am trying to apply a function to a data frame using ddply from the plyr package, but I am getting some results that I do not understand. I have three questions about the results.

Given:

    mydf <- data.frame(c(12,34,9,3,22,55), c(1,2,1,1,2,2), c(0,1,2,1,1,2))
    colnames(mydf)[1] <- 'n'
    colnames(mydf)[2] <- 'x'
    colnames(mydf)[3] <- 'x1'

mydf looks like this:

       n x x1
    1 12 1  0
    2 34 2  1
    3  9 1  2
    4  3 1  1
    5 22 2  1
    6 55 2  2

Question number 1

If I do this:

    k <- function(x) {
      mydf$z <- ifelse(x == 1, 0, mydf$n)
      return (mydf)
    }
    mydf <- ddply(mydf, c("x"), .fun = k, .inform = TRUE)

I get the following error:

    Error in `$<-.data.frame`(`*tmp*`, "z", value = structure(c(12, 34, 9, :
      replacement has 3 rows, data has 6
    Error: with piece 1:
       n x x1
    1 12 1  0
    2  9 1  2
    3  3 1  1

I get this error regardless of whether I specify the splitting variable as c("x"), "x", or .(x). I do not understand why I am getting this error message.

Question number 2

What I really want to do, though, is use an if/else construct, because my dataset also has variables x1, x2, x3 and x4, and I want to take these into account as well. But when I try something simple, for example:

    j <- function(x) {
      if(x == 1){
        mydf$z <- 0
      } else {
        mydf$z <- mydf$n
      }
      return(mydf)
    }
    mydf <- ddply(mydf, x, .fun = j, .inform = TRUE)

I get:

    Warning messages:
    1: In if (x == 1) { :
      the condition has length > 1 and only the first element will be used
    2: In if (x == 1) { :
      the condition has length > 1 and only the first element will be used

Question number 3

I am confused about when to use function() and when to use function(x). Defining j or k with function() gives me a different error:

    Error in .fun(piece, ...) : unused argument (piece)
    Error: with piece 1:
        n x x1  z
    1  12 1  0 12
    2   9 1  2  9
    3   3 1  1  3
    4  12 1  0 12
    5   9 1  2  9
    6   3 1  1  3
    7  12 1  0 12
    8   9 1  2  9
    9   3 1  1  3
    10 12 1  0 12
    11  9 1  2  9
    12  3 1  1  3

where column z is wrong. Yet I see many functions written as function().

I sincerely appreciate any comments that can help me with this.

1 answer

There is a lot to explain here. Let's start with the simplest case. In your first example, all you need is:

 mydf$z <- with(mydf,ifelse(x == 1,0,n)) 

An equivalent ddply solution might look like this:

 ddply(mydf,.(x),transform,z = ifelse(x == 1,0,n)) 
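As a quick check of the one-liner, here is a small base R sketch (no plyr needed) that rebuilds the data from the question and prints the new column. It works because ifelse evaluates its condition element by element over the whole column, whereas a plain if expects a single TRUE or FALSE. The ddply variant produces the same values, only with the rows grouped by x.

```r
# Same data as in the question, with the column names set directly
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# The condition is a logical vector with one entry per row
length(mydf$x == 1)  # 6

# ifelse picks 0 or n row by row
mydf$z <- with(mydf, ifelse(x == 1, 0, n))
mydf$z  # 0 34 0 0 22 55
```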

Probably your biggest source of confusion is what actually gets passed as the argument to the function you give to ddply.

Consider the first attempt:

    k <- function(x) {
      mydf$z <- ifelse(x == 1, 0, mydf$n)
      return (mydf)
    }

The way ddply works is that it splits mydf into several smaller data frames based on the values in column x. This means that each time ddply calls k, the argument passed to k is a data frame; specifically, a subset of your original data frame.
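The splitting step can be imitated in base R to see what each piece looks like. This is only a sketch of the idea using split(), not a description of how ddply is implemented internally; the variable name pieces is made up for the illustration.

```r
# Same data as in the question
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# split() returns a list of data frames, one per distinct value of x
pieces <- split(mydf, mydf$x)

names(pieces)         # "1" "2"
class(pieces[["1"]])  # "data.frame"

# Each piece has 3 rows, while the full mydf has 6:
# the same mismatch reported in the first error message
nrow(pieces[["1"]])   # 3
nrow(mydf)            # 6
```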

So, inside k, x is a subset of mydf with all of its columns. You should not try to modify mydf within k. Modify x and then return the modified version. (If you must, that is; the options I showed above are better.) Thus, we can rewrite your k as follows:

    k <- function(x) {
      x$z <- ifelse(x$x == 1, 0, x$n)
      return (x)
    }

Note that you have made things somewhat confusing by using x both as the argument of k and as the name of one of your columns.
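One way to avoid that name clash is to give the argument a different name. The sketch below assumes the plyr package is installed; the names add_z and piece are arbitrary choices for illustration, not anything ddply requires.

```r
library(plyr)

# Same data as in the question
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# 'piece' is the sub-data-frame that ddply passes in for each value of x
add_z <- function(piece) {
  piece$z <- ifelse(piece$x == 1, 0, piece$n)
  piece  # return the modified piece, not mydf
}

result <- ddply(mydf, .(x), add_z)
result$z  # 0 0 0 34 22 55 (rows come back grouped by x)
```

The function must accept one argument because ddply always calls it with the current piece; a function defined with function() has nowhere to receive it, which is what the "unused argument (piece)" error in the question reports.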


Source: https://habr.com/ru/post/952784/
