Sequential generation of columns based on several existing columns

I have a data frame that looks like this:

 df <- data.frame(project = c("A", "B"),
                  no_dwellings = c(150, 180),
                  first_occupancy = c(2020, 2019))

  project no_dwellings first_occupancy
1       A          150            2020
2       B          180            2019

project identifies the residential development, no_dwellings is the number of dwellings that will ultimately be built there, and first_occupancy is an estimate of when the first residents will start moving into the newly built dwellings.

I need to incorporate this information into a demographic forecast. The best estimate is that every year, starting from first occupancy, 60 dwellings become occupied. So I need to generate a sequence of columns that combine first_occupancy and no_dwellings to show how many dwellings are likely to be occupied in each year. Since the number of dwellings built is not necessarily divisible by 60, the remainder must go in the last column for the corresponding project.

This is what I expect my data frame to look like for further processing:

  project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
1       A          150            2020         0        60        60        30
2       B          180            2019        60        60        60         0
3 answers

Using the data.table-package, you can approach it as follows:

library(data.table)

setDT(df)[, .(yr = first_occupancy:(first_occupancy + no_dwellings %/% 60),
              dw = c(rep(60, no_dwellings %/% 60), no_dwellings %% 60))
          , by = .(project, no_dwellings, first_occupancy)
          ][, dcast(.SD, project + no_dwellings + first_occupancy ~ paste0('year_',yr), value.var = 'dw', fill = 0)]

which gives:

   project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
1:       A          150            2020         0        60        60        30
2:       B          180            2019        60        60        60         0
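
To see what the grouped expression computes, here is the same arithmetic for a single project in base R. This is only a minimal sketch: the helper name yearly_dwellings and its rate argument are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (illustration only): split a total number of dwellings
# into full yearly batches plus a final remainder.
yearly_dwellings <- function(no_dwellings, first_occupancy, rate = 60) {
  full_years <- no_dwellings %/% rate   # number of years with a full batch
  remainder  <- no_dwellings %% rate    # dwellings left for the final year
  data.frame(
    yr = first_occupancy:(first_occupancy + full_years),
    dw = c(rep(rate, full_years), remainder)
  )
}

yearly_dwellings(150, 2020)   # project A: 60, 60, 30 for 2020 to 2022
yearly_dwellings(180, 2019)   # project B: 60, 60, 60, 0 for 2019 to 2022
```

When the total is an exact multiple of 60, as for project B, the remainder is 0, so the final year appears with a value of 0 rather than being absent.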

The same logic with the tidyverse:

library(dplyr)
library(tidyr)

df %>% 
  group_by(project) %>% 
  do(data.frame(no_dwellings = .$no_dwellings, first_occupancy = .$first_occupancy,
                yr = paste0('year_',.$first_occupancy:(.$first_occupancy + .$no_dwellings %/% 60)),
                dw = c(rep(60, .$no_dwellings %/% 60), .$no_dwellings %% 60))) %>% 
  spread(yr, dw, fill = 0)

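First, define a helper function, make_pop_df. Then use mutate to store its result in a list column, expand it with unnest, and reshape to wide format with tidyr::spread.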

library(tidyverse)

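make_pop_df <- function(no_dwellings, first_year, decay = -60) {
    # dwellings not yet occupied at the start of each year, e.g. 150, 90, 30
    remaining <- seq(from = no_dwellings, to = 0, by = decay)
    tibble(pop  = pmin(remaining, -decay),  # at most 60 are occupied per year
           year = paste0("year_", first_year + seq_along(remaining) - 1)
    )
}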

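df %>%
    group_by(project) %>% 
    # one nested tibble of (pop, year) rows per project
    mutate(pop_df = list(make_pop_df(no_dwellings, first_occupancy))) %>% 
    unnest(pop_df) %>%
    # fill = 0 so years outside a project's range show 0 instead of NA
    spread(key = year, value = pop, fill = 0)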
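
The sequence that make_pop_df is built on can be inspected on its own. The sketch below uses base R only and walks through the values for project A: the dwellings not yet occupied at the start of each year, and those values capped at 60 with pmin to give yearly counts. The variable names are illustrative.

```r
# Dwellings not yet occupied at the start of each year for project A
remaining <- seq(from = 150, to = 0, by = -60)
remaining   # 150 90 30

# Cap at 60 to get the dwellings occupied in each year
per_year <- pmin(remaining, 60)
per_year    # 60 60 30

# Matching calendar years, starting at first_occupancy = 2020
years <- 2020 + seq_along(remaining) - 1
years       # 2020 2021 2022
```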

Another tidyverse approach, using complete to create all the years and then fill in the numbers.

library(dplyr)
library(tidyr)

df2 <- df %>%
  mutate(year = first_occupancy) %>%
  group_by(project) %>%
  complete(nesting(no_dwellings, first_occupancy), 
         year = full_seq(c(year, min(year) + unique(no_dwellings) %/% 60), period = 1)) %>%
  mutate(number = c(rep(60, unique(no_dwellings) %/% 60), unique(no_dwellings) %% 60),
         year = paste("year", year, sep = "_")) %>%
  spread(year, number, fill = 0) %>%
  ungroup()
df2
# # A tibble: 2 x 7
#   project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
#   <fct>          <dbl>           <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
# 1 A               150.           2020.        0.       60.       60.       30.
# 2 B               180.           2019.       60.       60.       60.        0.

Source: https://habr.com/ru/post/1694459/

