How to expand a large data frame in R

Question

I have a data frame:

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4), 
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01", 
           "1985-08-01", "1990-06-19", "1990-06-19", "1990-06-19", 
           "1990-06-19", "2000-05-12"), 
  spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
  y = rpois(10, 5))

   id       date spp y
1   1 1985-06-19   a 6
2   1 1985-06-19   b 3
3   1 1985-06-19   c 7
4   2 1985-08-01   c 7
5   2 1985-08-01   d 6
6   3 1990-06-19   b 5
7   3 1990-06-19   c 4
8   3 1990-06-19   d 4
9   3 1990-06-19   a 6
10  4 2000-05-12   b 6

I want to expand it so that every combination of id and spp is present, with y = 0 for each combination that is not currently in the data frame. The data frame currently has about 100,000 rows and 15 columns. When expanded, it will have about 300,000 rows (there are 17 unique values of spp in my actual data set).

For each value of id, the date is the same (for example, when id = 2, date is always 1985-08-01). In my real data set, all columns except spp and y are determined by id.
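The claim that id determines every column other than spp and y can be checked before expanding. A minimal sketch in base R, assuming the toy df from above with fixed y values in place of rpois(10, 5); id_cols and id_info are illustrative names, not part of the original code:

```r
# Toy data from the question, with fixed y values instead of rpois(10, 5)
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01",
           "1985-08-01", "1990-06-19", "1990-06-19", "1990-06-19",
           "1990-06-19", "2000-05-12"),
  spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
  y = c(6, 3, 7, 7, 6, 5, 4, 4, 6, 6),
  stringsAsFactors = FALSE)

# Drop spp and y, then keep the distinct rows that remain
id_cols <- setdiff(names(df), c("spp", "y"))
id_info <- unique(df[, id_cols, drop = FALSE])

# If id determines the other columns, each id appears exactly once
stopifnot(!anyDuplicated(id_info$id))
```

If the check passes, id_info is a one-row-per-id lookup table that can later be joined back by id.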

I want to get something like:

   id       date spp y
   1 1985-06-19   a 6
   1 1985-06-19   b 3
   1 1985-06-19   c 7
   1 1985-06-19   d 0*
   2 1985-08-01   a 0*
   2 1985-08-01   b 0*
   2 1985-08-01   c 7
   2 1985-08-01   d 6
   3 1990-06-19   b 5
   3 1990-06-19   c 4
   3 1990-06-19   d 4
   3 1990-06-19   a 6
   4 2000-05-12   a 0*
   4 2000-05-12   b 6
   4 2000-05-12   c 0*
   4 2000-05-12   d 0*

* indicates added rows

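(Part of this paragraph was lost in extraction. The surviving fragments mention dplyr, data.table, and reshape, and building the combinations of id, spp, and y and then using left_join() or merge() by id.)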

r dplyr plyr reshape expand

djhocking Feb 27 '14 at 4:51

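ids <- unique(df$id)
spps <- unique(df$spp)
# every id/spp combination: each id is repeated once per species
newdf <- data.frame(id = rep(ids, each = length(spps)),
                    spp = rep(spps, times = length(ids)))
# all = TRUE keeps combinations that are missing from df (their y becomes NA)
merged <- merge(newdf, df, by = c("id", "spp"), all = TRUE)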
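After a merge like this, the added combinations carry NA in y and in date. One way to finish the job, sketched in base R under the question's assumption that date depends only on id. The toy data uses fixed y values, the key columns are named id and spp, and grid is an illustrative name:

```r
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01",
           "1985-08-01", "1990-06-19", "1990-06-19", "1990-06-19",
           "1990-06-19", "2000-05-12"),
  spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
  y = c(6, 3, 7, 7, 6, 5, 4, 4, 6, 6),
  stringsAsFactors = FALSE)

# Full grid of id/spp combinations, merged against the observed rows
ids <- unique(df$id)
spps <- unique(df$spp)
grid <- data.frame(id = rep(ids, each = length(spps)),
                   spp = rep(spps, times = length(ids)),
                   stringsAsFactors = FALSE)
merged <- merge(grid, df, by = c("id", "spp"), all.x = TRUE)

# Added combinations have y = NA; the question asks for 0
merged$y[is.na(merged$y)] <- 0

# date depends only on id, so copy it from the first df row with that id
merged$date <- df$date[match(merged$id, df$id)]
```

Looking up date with match() avoids relying on row order or on factor levels.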

Ananta Feb 27 '14 at 5:02

The development version of tidyr has a new function, complete, that does this. Under the hood, complete uses expand.grid.

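# get the development version of tidyr
devtools::install_github("hadley/tidyr")
# load the package
library(tidyr)
# expand to every id/date pair crossed with spp; fill missing y with 0
# (later tidyr releases spell the grouping nesting(id, date) rather than c(id, date))
complete(df, c(id, date), spp, fill = list(y = 0))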
##    id       date spp y
## 1   1 1985-06-19   a 5
## 2   1 1985-06-19   b 3
## 3   1 1985-06-19   c 5
## 4   1 1985-06-19   d 0
## 5   2 1985-08-01   a 0
## 6   2 1985-08-01   b 0
## 7   2 1985-08-01   c 4
## 8   2 1985-08-01   d 9
## 9   3 1990-06-19   a 8
## 10  3 1990-06-19   b 3
## 11  3 1990-06-19   c 5
## 12  3 1990-06-19   d 6
## 13  4 2000-05-12   a 0
## 14  4 2000-05-12   b 3
## 15  4 2000-05-12   c 0
## 16  4 2000-05-12   d 0
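Whichever method is used, the result can be sanity-checked: it should contain each id/spp combination exactly once, keep the original y values, and hold 0 in the added rows. A small sketch in base R; check_expanded is a hypothetical helper, not part of tidyr, and the two tiny data frames are made up for illustration:

```r
# Hypothetical helper: compares an expanded data frame with the original
check_expanded <- function(expanded, original) {
  n_expected <- length(unique(original$id)) * length(unique(original$spp))
  # Build one key per id/spp pair so rows can be matched across the two frames
  key_exp <- paste(expanded$id, expanded$spp, sep = "\r")
  key_org <- paste(original$id, original$spp, sep = "\r")
  idx <- match(key_org, key_exp)
  stopifnot(
    nrow(expanded) == n_expected,        # full grid, no extras
    !anyDuplicated(key_exp),             # each combination appears once
    !anyNA(idx),                         # no original row was lost
    all(expanded$y[idx] == original$y),  # original counts are unchanged
    all(expanded$y[-idx] == 0)           # added rows are zero
  )
  invisible(TRUE)
}

# Made-up example: id 2 is missing species "b"
original <- data.frame(id = c(1, 1, 2), spp = c("a", "b", "a"),
                       y = c(4, 2, 9))
expanded <- data.frame(id = c(1, 1, 2, 2), spp = c("a", "b", "a", "b"),
                       y = c(4, 2, 9, 0))
check_expanded(expanded, original)
```

The helper stops with an error on the first failed condition and returns TRUE invisibly otherwise.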

shadow May 22, '15 at 15:51

jenesaisquoi · Accepted Answer · 2014-02-27T05:21:09+0000

You can use expand.grid and merge:

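# build every id/spp combination; all = TRUE keeps the ones missing from df
mergedData <- merge(
    expand.grid(id = unique(df$id), spp = unique(df$spp)),
    df, by = c("id", "spp"), all = TRUE)

# combinations absent from df come back with y = NA; set them to 0
mergedData$y[is.na(mergedData$y)] <- 0

# date is determined by id, so look it up from the first row with that id
mergedData$date <- df$date[match(mergedData$id, df$id)]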

(The closing sentence was lost in extraction; the surviving fragments mention plyr and data.table.)
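With 15 columns in the real data, one alternative to merging the full grid is to build only the missing combinations and append them, so the id-level columns are copied along without listing them. This is a swapped-in technique, not the accepted answer's method, sketched in base R under the question's assumption that every column except spp and y depends only on id. Names such as grid, missing_rows, template, and expanded are illustrative, and the toy data uses fixed y values:

```r
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01",
           "1985-08-01", "1990-06-19", "1990-06-19", "1990-06-19",
           "1990-06-19", "2000-05-12"),
  spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
  y = c(6, 3, 7, 7, 6, 5, 4, 4, 6, 6),
  stringsAsFactors = FALSE)

# All id/spp combinations
grid <- expand.grid(id = unique(df$id), spp = unique(df$spp),
                    stringsAsFactors = FALSE)

# Keep only the combinations that df does not already contain
have <- paste(df$id, df$spp, sep = "\r")
want <- paste(grid$id, grid$spp, sep = "\r")
missing_rows <- grid[!(want %in% have), , drop = FALSE]

# Copy every column from the first df row with the same id,
# then overwrite the two columns that are not id-level
template <- df[match(missing_rows$id, df$id), , drop = FALSE]
template$spp <- missing_rows$spp
template$y <- rep(0, nrow(template))

# Append the new rows and sort so each id's species sit together
expanded <- rbind(df, template)
expanded <- expanded[order(expanded$id, expanded$spp), ]
rownames(expanded) <- NULL
```

Whether this is actually faster or lighter than the merge on the real data would need to be measured.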
