Collect / melt wide data into different value columns

I have wide-format data that has two different sets of columns of values: those that contain mass (Mass1, Mass2, etc.) and those that contain the corresponding dates (Mass1_date, Mass2_date, etc.).

library(tidyr)
library(dplyr)
library(lubridate)

df <- structure(list(Year = 2004, Nest_no = 21, Mass1 = 2325, Mass1_date = structure(1081987200, class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Mass2 = 2000, Mass2_date = structure(1082851200, class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Mass3 = 1750, Mass3_date = structure(1083715200, class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .Names = c("Year", "Nest_no", "Mass1", 
"Mass1_date", "Mass2", "Mass2_date", "Mass3", "Mass3_date"))

df

## Source: local data frame [1 x 8]
## 
##    Year Nest_no Mass1 Mass1_date Mass2 Mass2_date Mass3 Mass3_date
##   (dbl)   (dbl) (dbl)     (time) (dbl)     (time) (dbl)     (time)
## 1  2004      21  2325 2004-04-15  2000 2004-04-25  1750 2004-05-05

I would like to "sort" the data in a long format, where there are two sets of columns of gathered ( melted) values in two different columns of value, one column containing the values ​​of the "Mass" columns and one with the values ​​of the date columns:

## Source: local data frame [3 x 5]
## 
##    Year Nest_no capture       date weight
##   (dbl)   (dbl)   (dbl)     (date)  (dbl)
## 1  2004      21       1 2004-04-15   2325
## 2  2004      21       2 2004-04-25   2000
## 3  2004      21       3 2004-05-05   1750

At first I thought that I could use tidyrand do it in two steps.

gather(df, capture, date, contains("Date")) %>% 
  gather(capture2, weight, contains("Mass"))

## Source: local data frame [9 x 6]
## 
##    Year Nest_no    capture       date capture2 weight
##   (dbl)   (dbl)      (chr)     (time)    (chr)  (dbl)
## 1  2004      21 Mass1_date 2004-04-15    Mass1   2325
## 2  2004      21 Mass2_date 2004-04-25    Mass1   2325
## 3  2004      21 Mass3_date 2004-05-05    Mass1   2325
## 4  2004      21 Mass1_date 2004-04-15    Mass2   2000
## 5  2004      21 Mass2_date 2004-04-25    Mass2   2000
## 6  2004      21 Mass3_date 2004-05-05    Mass2   2000
## 7  2004      21 Mass1_date 2004-04-15    Mass3   1750
## 8  2004      21 Mass2_date 2004-04-25    Mass3   1750
## 9  2004      21 Mass3_date 2004-05-05    Mass3   1750

However, it did not work as expected. After several attempts, I came up with this solution:

df <- gather(df, capture2, weight, contains("Mass"), convert = T) %>% 
  mutate(capture = extract_numeric(capture2))

## Warning: attributes are not identical across measure variables; they will
## be dropped

df$capture2 <- ifelse(grepl("date", df$capture2), "date", "weight")

df <- spread(df, capture2, weight) %>% 
  mutate(date = as.Date(as.POSIXct(date, origin = "1970-01-01")))

df

## Source: local data frame [3 x 5]
## 
##    Year Nest_no capture       date weight
##   (dbl)   (dbl)   (dbl)     (date)  (dbl)
## 1  2004      21       1 2004-04-15   2325
## 2  2004      21       2 2004-04-25   2000
## 3  2004      21       3 2004-05-05   1750

I was wondering if there is a better way to achieve this?

Thanks Philip

+4
1

melt data.table. measure patterns "" "".

library(data.table)
melt(as.data.table(df), measure=patterns('\\d$', 'date$'),
         variable.name='capture', value.name= c('weight', 'date'))
#   Year Nest_no capture weight       date
#1: 2004      21       1   2325 2004-04-15
#2: 2004      21       2   2000 2004-04-25
#3: 2004      21       3   1750 2004-05-05
+4

Source: https://habr.com/ru/post/1627401/


All Articles