While manipulating my data, I found that something went wrong at some point in the process. When I investigated, the problem boiled down to the following behavior of spread() in the tidyr package.
Here is a demo. Say we have a data frame as shown below.
> d <- data.frame(factor1 = rep(LETTERS[1:3], each = 3),
+                 factor2 = rep(paste0("level", c(1, 2, 10)), 3),
+                 num = 1:9
+ )
> d
  factor1 factor2 num
1       A  level1   1
2       A  level2   2
3       A level10   3
4       B  level1   4
5       B  level2   5
6       B level10   6
7       C  level1   7
8       C  level2   8
9       C level10   9
What I wanted to do was convert this long-format data frame to wide format, and I thought spread() was the way to go. The result, however, was not what I expected.
> spread(d, factor2, num)
  factor1 level1 level2 level10
1       A      1      3       2
2       B      4      6       5
3       C      7      9       8
If factor1 is "A" and factor2 is "level2", the value should be 2, but the resulting wide format says 3. Apparently, the num values are placed into the wide format in the alphabetical order of factor2 (level1, level10, level2), while the factor2 column labels keep the order they have in the original data frame (level1, level2, level10). As a result, the values under level2 and level10 are swapped.
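The two orderings can be compared with a minimal base R sketch (no tidyr needed). It assumes that factor2 is stored as a factor, which is forced here with stringsAsFactors = TRUE because the default depends on the R version. The names appearance_order and level_order are only illustrative. This shows that the two orderings differ; it does not by itself confirm what spread() does internally.

```r
# Same data as above; stringsAsFactors = TRUE is an assumption so that
# factor2 is a factor regardless of the R version's default.
d <- data.frame(factor1 = rep(LETTERS[1:3], each = 3),
                factor2 = rep(paste0("level", c(1, 2, 10)), 3),
                num = 1:9,
                stringsAsFactors = TRUE)

# Order in which the factor2 values first appear in the data
appearance_order <- as.character(unique(d$factor2))

# Order of the factor levels (sorted as character strings when the factor is built)
level_order <- levels(d$factor2)

print(appearance_order)
print(level_order)

# TRUE only if both orderings agree
print(identical(appearance_order, level_order))
```

If the two vectors differ, any step that labels columns by one ordering and fills values by the other would produce a mismatch like the one shown above.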
Can someone explain why this is happening (and / or where I can find relevant information)?