How to use the tidyr spread function

Question


How can I reshape the following table, called df_1:

 Type  Name   Answer  n
 TypeA Apple  Yes     5
 TypeA Apple  No     10
 TypeA Apple  DK      8
 TypeA Apple  NA     20
 TypeA Orange Yes     6
 TypeA Orange No     11
 TypeA Orange DK      8
 TypeA Orange NA     23

Change to:

 Type  Name   Yes No DK NA
 TypeA Apple    5 10  8 20
 TypeA Orange   6 11  8 23

I used the following code to get the first table:

 df_1 <- df %>% group_by(Type, Name, Answer) %>% tally()

Then I tried to use spread() to get the second table, but received the following error message: "Error: all columns must be named"

 df_2 <- spread(df_1, Answer)
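For comparison, spread() takes both a key column (whose values become the new column names) and a value column (which fills those columns); the call above passes only the key. Below is a minimal sketch under stated assumptions: it rebuilds df_1 as a plain data.frame from the table above, and it recodes the NA answers to the label "Missing" (an arbitrary, hypothetical choice) so that every new column gets a real name, which sidesteps the NA issue discussed in the answers.

```r
library(tidyr)

# Rebuild the question's df_1 as a plain data.frame (values copied from the table)
df_1 <- data.frame(
  Type   = "TypeA",
  Name   = rep(c("Apple", "Orange"), each = 4),
  Answer = rep(c("Yes", "No", "DK", NA), times = 2),
  n      = c(5, 10, 8, 20, 6, 11, 8, 23),
  stringsAsFactors = FALSE
)

# Hypothetical label: turn missing answers into an explicit "Missing" key
df_1$Answer[is.na(df_1$Answer)] <- "Missing"

# Name both arguments: key = column whose values become new columns,
# value = column that fills those new columns
df_2 <- spread(df_1, key = Answer, value = n)

# Reorder the columns to match the layout asked for in the question
df_2 <- df_2[, c("Type", "Name", "Yes", "No", "DK", "Missing")]
df_2
```

If you really want a column literally called NA, you can rename "Missing" afterwards; keeping a descriptive label is usually easier to work with.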


dplyr tidyr

ayk Jan 08 '16 at 19:47


2 answers

Nicholas g reich · Answer 1 · 2016-01-14T23:12:45+0000

Following ayk's comment, here is an example. It seems that when a data_frame has a factor or character column containing NA values, it cannot be spread without either removing the NAs or converting the data. This is specific to data_frame (note the underscore in the name: this is the dplyr class), because in my example spreading works when the NA values are in a plain data.frame. Here is a slightly modified version of the example above:

Here is the data frame

 library(dplyr)
 library(tidyr)
 df_1 <- data_frame(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
                    Answer = c("Yes", "No", NA, "No"),
                    n = 1:4)
 df_1

This gives a data_frame that looks like this:

 Source: local data frame [4 x 3]

    Type Answer     n
   (chr)  (chr) (int)
 1 TypeA    Yes     1
 2 TypeA     No     2
 3 TypeB     NA     3
 4 TypeB     No     4

Then, when we try to spread it, an error message appears:

 df_1 %>% spread(key=Answer, value=n)
 Error: All columns must be named

But if we remove the rows with NA first, then it "works":

 df_1 %>% filter(!is.na(Answer)) %>% spread(key=Answer, value=n)
 Source: local data frame [2 x 3]

    Type    No   Yes
   (chr) (int) (int)
 1 TypeA     2     1
 2 TypeB     4    NA

However, dropping the NAs may not give you the result you want; for example, you might want to keep them in your table. You can recode the NA values in the data to a more descriptive value. Alternatively, you can convert your data to a data.frame, and then it spreads just fine:

 as.data.frame(df_1) %>% spread(key=Answer, value=n)
    Type No Yes NA
 1 TypeA  2   1 NA
 2 TypeB  4  NA  3
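The recoding option mentioned above can be sketched as follows. This assumes the same toy data, built with tibble() (which has replaced the now-deprecated data_frame()), and uses tidyr's replace_na() to swap the NA answers for an explicit label. The label "Missing" is an arbitrary placeholder, not anything tidyr requires.

```r
library(tibble)
library(tidyr)

# Same toy data as above, as a tibble
df_1 <- tibble(
  Type   = c("TypeA", "TypeA", "TypeB", "TypeB"),
  Answer = c("Yes", "No", NA, "No"),
  n      = 1:4
)

# Recode missing answers to an explicit (hypothetical) label
df_1 <- replace_na(df_1, list(Answer = "Missing"))

# Every key is now a real string, so each new column gets a proper name
df_wide <- spread(df_1, key = Answer, value = n)
df_wide
```

Unlike filtering, this keeps the TypeB count of 3 in the result, now under a "Missing" column.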

wibeasley · Answer 2 · 2016-01-09T19:32:57+0000

I think only tidyr is needed to get from df_1 to df_2.

 library(magrittr)
 df_1 <- read.csv(text="Type,Name,Answer,n\nTypeA,Apple,Yes,5\nTypeA,Apple,No,10\nTypeA,Apple,DK,8\nTypeA,Apple,NA,20\nTypeA,Orange,Yes,6\nTypeA,Orange,No,11\nTypeA,Orange,DK,8\nTypeA,Orange,NA,23", stringsAsFactors=FALSE)
 df_2 <- df_1 %>% tidyr::spread(key=Answer, value=n)

Output:

    Type   Name DK No Yes NA
 1 TypeA  Apple  8 10   5 20
 2 TypeA Orange  8 11   6 23
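Putting the question's two steps together, a sketch of the whole pipeline from raw responses to the wide table might look like this. The raw data frame df is hypothetical (the question does not show it); it is constructed here with one row per response so that the counts match df_1. ungroup() drops the grouping by Type and Name that tally() leaves behind, and replace_na() gives the NA answers an explicit, arbitrarily chosen label before spreading.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw data: one row per response, counts chosen to match df_1
df <- data.frame(
  Type   = "TypeA",
  Name   = rep(c("Apple", "Orange"), times = c(43, 48)),
  Answer = c(rep(c("Yes", "No", "DK", NA), times = c(5, 10, 8, 20)),
             rep(c("Yes", "No", "DK", NA), times = c(6, 11, 8, 23))),
  stringsAsFactors = FALSE
)

df_2 <- df %>%
  group_by(Type, Name, Answer) %>%
  tally() %>%                               # long counts, as in the question
  ungroup() %>%                             # drop leftover Type/Name grouping
  replace_na(list(Answer = "Missing")) %>%  # hypothetical label for NA answers
  spread(key = Answer, value = n)           # one column per answer

df_2
```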

