Spread with duplicate identifiers for rows

Question


This has been asked before here, but I'm still struggling with spread. I would like each state to have its own column of temperature values.

Here is the dput() of my data. I will call it df.

structure(list(date = c("2018-01-21", "2018-01-21", "2018-01-20", 
"2018-01-20", "2018-01-19", "2018-01-19", "2018-01-18", "2018-01-18", 
"2018-01-17", "2018-01-17", "2018-01-16", "2018-01-16", "2018-01-15", 
"2018-01-15", "2018-01-14", "2018-01-14", "2018-01-12", "2018-01-12", 
"2018-01-11", "2018-01-11", "2018-01-10", "2018-01-10", "2018-01-09", 
"2018-01-09", "2018-01-08", "2018-01-08", "2018-01-07", "2018-01-07", 
"2018-01-06", "2018-01-06", "2018-01-05", "2018-01-05", "2018-01-04", 
"2018-01-04", "2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03", 
"2018-01-02", "2018-01-02"), tmin = c(24, 31, 31, 29, 44, 17, 
32, 7, 31, 7, 31, 6, 30, 13, 30, 1, 43, 20, 33, 52, 42, 29, 30, 
29, 26, 32, 33, -2, 29, 0, 23, 3, 19, 11, NA, -3, 22, -3, 24, 
-4), state = c("UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", 
"UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", 
"OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", 
"UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH")), class = "data.frame", row.names = c(NA, 
-40L), .Names = c("date", "tmin", "state"))

The code that I run is

df %>% spread(state,tmin)

which I expected to give me the following format

date UT  OH
... ... ...

but I get an error:

Error: Duplicate identifiers for rows (36, 38), (35, 37)

I tried a few different things. One was grouping by date, since I thought rows with the same date were causing problems for spread. I also tried creating a new column with add_rownames() and then using spread(state, tmin), but that did not solve the problem either.
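A quick way to see which rows spread is complaining about is to look for (date, state) pairs that occur more than once, because spread needs each pair to identify exactly one row. The sketch below is a diagnostic aid, not part of the original question: it uses base R's duplicated() on a hypothetical six-row copy of df (rows 35 to 40 of the dput() output, keeping their original row numbers).

```r
# Hypothetical subset: rows 35-40 of the question's df, original row names kept
df <- data.frame(
  date  = c("2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03",
            "2018-01-02", "2018-01-02"),
  tmin  = c(NA, -3, 22, -3, 24, -4),
  state = c("UT", "OH", "UT", "OH", "UT", "OH"),
  row.names = 35:40,
  stringsAsFactors = FALSE
)

# spread(state, tmin) treats date as the row key and state as the column key
key <- df[c("date", "state")]

# Flag every member of a repeated pair, not only the later occurrences
dup_key <- duplicated(key) | duplicated(key, fromLast = TRUE)
print(df[dup_key, ])
```

On this subset it flags rows 35 to 38, the same rows named in the error. It also suggests why the attempts above did not help: grouping by date leaves the (date, state) pairs unchanged, and a unique row-name column makes every input row its own output row instead of merging UT and OH for the same date.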


r dplyr tidyr spread

Alex Jan 22 '18 at 14:17



jdobres · Accepted Answer · 2018-01-22T14:30:58+0000

As the error message says, spread fails because of duplicate identifiers. Rows 36 and 38 are exact duplicates:

         date tmin state
36 2018-01-03   -3    OH
38 2018-01-03   -3    OH

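tidyr also does not know what to do with the NA value. Rows 35 and 37 are not exact duplicates, but they share the same date and state, so spread cannot tell which tmin to use: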

         date tmin state
35 2018-01-03   NA    UT
37 2018-01-03   22    UT
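The two problems need two different fixes. This base R sketch, again on a hypothetical six-row copy of df (rows 35 to 40), shows that unique() by itself removes the exact copy (row 38) but keeps both UT rows for 2018-01-03, while dropping NA values first leaves exactly one row per (date, state).

```r
# Hypothetical subset: rows 35-40 of the question's df, original row names kept
df <- data.frame(
  date  = c("2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03",
            "2018-01-02", "2018-01-02"),
  tmin  = c(NA, -3, 22, -3, 24, -4),
  state = c("UT", "OH", "UT", "OH", "UT", "OH"),
  row.names = 35:40,
  stringsAsFactors = FALSE
)

# unique() alone drops row 38, but rows 35 and 37 differ in tmin, so both stay
only_unique <- unique(df)

# Removing NA tmin first, then exact duplicates, leaves one row per key
cleaned <- unique(df[!is.na(df$tmin), ])
print(cleaned)
```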

One fix is to remove the NA values and then the duplicated rows before spreading:

library(dplyr) # filter()
library(tidyr) # spread()

df %>% 
  filter(!is.na(tmin)) %>% # remove NA values
  unique %>% # remove duplicated rows
  spread(state, tmin)

         date OH UT
1  2018-01-02 -4 24
2  2018-01-03 -3 22
3  2018-01-04 11 19
4  2018-01-05  3 23
5  2018-01-06  0 29
...
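If dplyr and tidyr are not available, the same reshaping can be sketched in base R with reshape(); this is a swapped-in alternative to spread, not what the answer uses. It assumes the same hypothetical six-row subset of df (rows 35 to 40). Base reshape() names the new columns tmin.OH and tmin.UT and orders them by first appearance rather than alphabetically.

```r
# Hypothetical subset: rows 35-40 of the question's df, original row names kept
df <- data.frame(
  date  = c("2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03",
            "2018-01-02", "2018-01-02"),
  tmin  = c(NA, -3, 22, -3, 24, -4),
  state = c("UT", "OH", "UT", "OH", "UT", "OH"),
  row.names = 35:40,
  stringsAsFactors = FALSE
)

# Same cleaning as the dplyr pipeline: drop NA tmin, then exact duplicate rows
cleaned <- unique(df[!is.na(df$tmin), ])

# Base R long-to-wide: one row per date, one tmin.<state> column per state
wide <- reshape(cleaned, idvar = "date", timevar = "state",
                v.names = "tmin", direction = "wide")
print(wide)
```

Unlike spread, reshape() does not stop on duplicate keys; it warns and keeps the first match, so cleaning the data first still matters. Note that unique() only removes rows that are identical in every column: if one date and state had two different non-NA temperatures, you would still have to decide how to combine them.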
