How to expand a large data frame in R

I have a dataframe

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4), 
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01", 
           "1985-08-01", "1990-06-19", "1990-06-19", "1990-06-19", 
           "1990-06-19", "2000-05-12"), 
  spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
  y = rpois(10, 5))

   id       date spp y
1   1 1985-06-19   a 6
2   1 1985-06-19   b 3
3   1 1985-06-19   c 7
4   2 1985-08-01   c 7
5   2 1985-08-01   d 6
6   3 1990-06-19   b 5
7   3 1990-06-19   c 4
8   3 1990-06-19   d 4
9   3 1990-06-19   a 6
10  4 2000-05-12   b 6

I want to expand it so that there is any combination of id and spp and has y = 0for each combination that is not currently in the data frame. Currently, the data frame contains about 100,000 rows and 15 columns. When expanded, this will be about 300,000 columns (there are 17 unique values ​​in my actual dataset spp).

For each value, the idvalue is the datesame (for example, when id = 2, date always = 1985-08-01). In my real dataset, all columns except sppand ycan be specified id.

I want to get something like:

   id       date spp y
   1 1985-06-19   a 6
   1 1985-06-19   b 3
   1 1985-06-19   c 7
   1 1985-06-19   d 0*
   2 1985-08-01   a 0*
   2 1985-08-01   b 0*
   2 1985-08-01   c 7
   2 1985-08-01   d 6
   3 1990-06-19   b 5
   3 1990-06-19   c 4
   3 1990-06-19   d 4
   3 1990-06-19   a 6
   4 2000-05-12   a 0*
   4 2000-05-12   b 6
   4 2000-05-12   c 0*
   4 2000-05-12   d 0*
  • Specify Added Rows

, , , , ( ) , . , dplyr, data.table reshape, . , id, spp y, left_join() merge() ( ) id

+2
3

expand.grid ,

mergedData <- merge(
    expand.grid(id = unique(df$id), spp = unique(df$spp)),
    df, by = c("id", "spp"), all =T)

mergedData[is.na(mergedData$y), ]$y <- 0

mergedData$date <- rep(levels(df$date),
                       each = length(levels(df$spp)))

, , plyr , , data.table.

+5

, ,

x<-unique(df$id)
y<-unique(df$spp)
newdf<-data.frame(x=rep(x,each=length(y)),y=rep(y, length(x)))
merged<-merge(newdf, df, by.x=c(x,y), by.y=c("id","spp"), all=T)
+2

There tidyris a new feature in the development version completethat does this. Of course, completeuses expand.gridinternally.

# get new version of tidyr
devtools::install_github("hadley/tidyr")
# load package
require(tidyr)
# calculations
complete(df, c(id, date), spp, fill = list(y = 0))
##    id       date spp y
## 1   1 1985-06-19   a 5
## 2   1 1985-06-19   b 3
## 3   1 1985-06-19   c 5
## 4   1 1985-06-19   d 0
## 5   2 1985-08-01   a 0
## 6   2 1985-08-01   b 0
## 7   2 1985-08-01   c 4
## 8   2 1985-08-01   d 9
## 9   3 1990-06-19   a 8
## 10  3 1990-06-19   b 3
## 11  3 1990-06-19   c 5
## 12  3 1990-06-19   d 6
## 13  4 2000-05-12   a 0
## 14  4 2000-05-12   b 3
## 15  4 2000-05-12   c 0
## 16  4 2000-05-12   d 0
+2
source

Source: https://habr.com/ru/post/1589058/


All Articles