How to convert specific columns of a data.frame to factors in R using apply

I have a data.frame called mydata and a vector ids containing the indices of the columns in the data.frame that I would like to convert to factors. The following code solves the problem:

for(i in ids) mydata[, i]<-as.factor(mydata[, i]) 

Now I wanted to clean up this code by using apply instead of an explicit for loop.

 mydata[, ids]<-apply(mydata[, ids], 2, as.factor) 

However, the last statement gives me a data.frame in which the columns are of type character instead of factor. I do not see the difference between these two lines of code. Why don't they give the same result?

Regards, Michael

2 answers

The result of apply is a vector, an array, or a list of values (see ?apply).

For your problem, you should use lapply instead:

    data(iris)
    iris[, 2:3] <- lapply(iris[, 2:3], as.factor)
    str(iris)
    'data.frame': 150 obs. of 5 variables:
     $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : Factor w/ 23 levels "2","2.2","2.3",..: 15 10 12 11 16 19 14 14 9 11 ...
     $ Petal.Length: Factor w/ 43 levels "1","1.1","1.2",..: 5 5 4 6 5 8 5 6 5 6 ...
     $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Note that this is one place where lapply will be much faster than a for loop. In general, a loop and lapply have similar performance, but the [<-.data.frame operation is very slow. By using lapply, you avoid the assignment into the data frame at each iteration and replace it with a single assignment. That is much faster.
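To check the speed claim on your own machine, something like the following rough sketch could be used. The data frame wide and its size are made up for illustration, and the names loop_version, lapply_version, t_loop and t_lapply are hypothetical. The timings depend on the R version and on the number of columns, so no particular numbers are promised here.

```r
# Hypothetical wide data frame: 200 integer columns, 1000 rows
set.seed(1)
wide <- as.data.frame(matrix(sample(1:5, 200 * 1000, replace = TRUE), ncol = 200))
ids <- seq_along(wide)

# Variant 1: one assignment into the data frame per column
loop_version <- wide
t_loop <- system.time(
  for (i in ids) loop_version[, i] <- as.factor(loop_version[, i])
)

# Variant 2: build all factors with lapply, then assign once
lapply_version <- wide
t_lapply <- system.time(
  lapply_version[, ids] <- lapply(lapply_version[, ids], as.factor)
)

# Both variants should produce the same data frame
all.equal(loop_version, lapply_version)

# Compare the elapsed times
t_loop
t_lapply
```

The two variants differ only in how often the data frame is modified: once per column in the loop, once in total with lapply.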


This is because apply() works in a completely different way. It first runs as.factor on each column in a local environment, collects the results, and then tries to combine them into an array, not a data frame. In your case this array is a matrix. R encounters several different factors and has no way to bind them together other than converting them to character first. This character matrix is then used to fill your data frame.
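The explanation above can be checked directly. The following is a minimal sketch on a small made-up data frame (the contents of mydata and ids here are assumptions for illustration, not the data from the question); it inspects what apply returns before it is assigned back, and compares the column classes with the lapply approach.

```r
# Small illustrative data frame (hypothetical values)
mydata <- data.frame(a = c(1, 2, 2), b = c(3, 3, 4), c = c("x", "y", "z"))
ids <- 1:2

# apply() combines the per-column factors into a single matrix
res <- apply(mydata[, ids], 2, as.factor)
is.matrix(res)   # TRUE
typeof(res)      # "character"

# Assigning that matrix back yields character columns
via_apply <- mydata
via_apply[, ids] <- res
sapply(via_apply[, ids], class)

# lapply() returns a list of factors, so the columns stay factors
via_lapply <- mydata
via_lapply[, ids] <- lapply(mydata[, ids], as.factor)
sapply(via_lapply[, ids], class)
```

The factor information is already gone in res, before anything is written back to the data frame.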

You can use lapply for this (see Andrie's answer) or colwise from the plyr package.

    require(plyr)
    Df[,ids] <- colwise(as.factor)(Df[,ids])

Source: https://habr.com/ru/post/900602/
