Another approach (and the IMO recommended approach) using dplyr is to first modify or convert your data into a neat data format before summing the values for each wave.
In detail, this process will include:
- Change your data to long format (
tidyr::gather ) - Determine which variables belong to each “wave”
- Sum the values for each wave
- Recover your data back to wide format (
tidyr::spread )
In your example, it will look like this:
library(tidyverse) mat <- matrix(runif(1000, 1, 10), ncol = 100) df <- data.frame(mat) dim(df) df %>% dplyr::mutate(id = dplyr::row_number()) %>% # reshape to "tidy data" or long format tidyr::gather(varname, value, -id) %>% # identify which variables belong to which "wave" dplyr::mutate(varnum = as.integer(stringr::str_extract(varname, pattern = '\\d+')), wave = floor((varnum-1)/10)+1) %>% # summarize your value for each wave dplyr::group_by(id, wave) %>% dplyr::summarise(avg = sum(value)/n()) %>% # reshape back to "wide" format tidyr::spread(wave, avg, sep='_') %>% dplyr::ungroup()
With the following conclusion:
# A tibble: 10 x 11 id wave_1 wave_2 wave_3 wave_4 wave_5 wave_6 wave_7 wave_8 wave_9 wave_10 <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 1 6.24 4.49 5.85 5.43 5.98 6.04 4.83 6.92 5.43 5.52 2 2 5.16 6.82 5.76 6.66 6.21 5.41 4.58 5.06 5.81 6.93 3 3 7.23 6.28 5.40 5.70 5.13 6.27 5.55 5.84 6.74 5.94 4 4 5.27 4.79 4.39 6.85 5.31 6.01 6.15 3.31 5.73 5.63 5 5 6.48 5.16 5.20 4.71 5.87 4.44 6.40 5.00 5.90 3.78 6 6 4.18 4.64 5.49 5.47 5.75 6.35 4.34 5.66 5.34 6.57 7 7 4.97 4.09 6.17 5.78 5.87 6.47 4.96 4.39 5.99 5.35 8 8 5.50 7.21 5.43 5.15 4.56 5.00 4.86 5.72 6.41 5.65 9 9 5.27 5.71 5.23 5.44 5.12 5.40 5.38 6.05 5.41 5.30 10 10 5.95 4.58 6.52 5.46 7.63 5.56 5.82 7.03 5.68 5.38
This can be attached to your source data to match your example (which used mutate ) as follows:
df %>% dplyr::mutate(id = dplyr::row_number()) %>% tidyr::gather(varname, value, -id) %>% dplyr::mutate(varnum = as.integer(stringr::str_extract(varname, pattern = '\\d+')), wave = floor((varnum-1)/10)+1) %>% dplyr::group_by(id, wave) %>% dplyr::summarise(avg = sum(value)/n()) %>% tidyr::spread(wave, avg, sep='_') %>% dplyr::ungroup() %>% dplyr::right_join(df %>%
One of the nice aspects of this approach is that you can check your data to make sure that you assign variables to the wave correctly.
df %>% dplyr::mutate(id = dplyr::row_number()) %>% tidyr::gather(varname, value, -id) %>% dplyr::mutate(varnum = as.integer(stringr::str_extract(varname, pattern = '\\d+')), wave = floor((varnum-1)/10)+1) %>% dplyr::distinct(varname, varnum, wave) %>% head()
which produces:
varname varnum wave 1 X1 1 1 2 X2 2 1 3 X3 3 1 4 X4 4 1 5 X5 5 1 6 X6 6 1