I have a large spatio-temporal dataset. Each set of coordinates is associated with an id (a player identifier in a computer game). Unfortunately, the coordinates are not recorded for every identifier at every time stamp. If no reading is available for a given identifier at time stamp x, the row is omitted from the dataset entirely rather than being recorded as NA.
I would like every time unit to have exactly as many observations as there are unique identifiers (i.e., to turn the implicit missing values into explicit NAs). For each time unit in which an identifier is absent, a new row should be inserted for it with NA as its coordinates.
Here is a dummy dataset to illustrate:
time <- c(10,10,10,10,11,11,11,11,11,11,12,12,12,12,13,13,14,14,14,14,14,14,15,15,15)
id <- c(1,3,4,5,1,2,3,4,5,6,2,4,5,6,3,6,1,2,3,4,5,6,2,4,5)
x <- c(128,128,64,64,124,128,120,68,64,64,122,71,65,64,112,74,116,114,113,73,70,70,111,75,70)
y <- c(128,128,64,66,125,128,124,66,67,64,124,67,71,68,113,68,115,119,113,76,69,77,116,80,82)
spatiodf <- as.data.frame(cbind(time, id, x, y))
time id x y
1 10 1 128 128
2 10 3 128 128
3 10 4 64 64
4 10 5 64 66
5 11 1 124 125
6 11 2 128 128
7 11 3 120 124
8 11 4 68 66
9 11 5 64 67
10 11 6 64 64
11 12 2 122 124
12 12 4 71 67
13 12 5 65 71
14 12 6 64 68
15 13 3 112 113
16 13 6 74 68
17 14 1 116 115
18 14 2 114 119
19 14 3 113 113
20 14 4 73 76
21 14 5 70 69
22 14 6 70 77
23 15 2 111 116
24 15 4 75 80
25 15 5 70 82
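A quick way to see the implicit missingness is to count the rows per time stamp and compare that with the number of distinct ids. The base R sketch below rebuilds the dummy data from the question; the names `obs_per_time`, `n_ids` and `missing_per_time` are illustrative only.

```r
# Dummy data from the question
time <- c(10,10,10,10,11,11,11,11,11,11,12,12,12,12,13,13,14,14,14,14,14,14,15,15,15)
id   <- c(1,3,4,5,1,2,3,4,5,6,2,4,5,6,3,6,1,2,3,4,5,6,2,4,5)
x    <- c(128,128,64,64,124,128,120,68,64,64,122,71,65,64,112,74,116,114,113,73,70,70,111,75,70)
y    <- c(128,128,64,66,125,128,124,66,67,64,124,67,71,68,113,68,115,119,113,76,69,77,116,80,82)
spatiodf <- data.frame(time, id, x, y)

# Rows actually present at each time stamp
obs_per_time <- table(spatiodf$time)

# Rows that should be present at each time stamp: one per distinct id
n_ids <- length(unique(spatiodf$id))

# Rows silently dropped at each time stamp (the implicit missing values)
missing_per_time <- n_ids - obs_per_time
print(missing_per_time)
```

For the dummy data this gives 2, 0, 2, 4, 0 and 3 missing rows for time stamps 10 to 15, 11 in total, so the completed data should have 25 + 11 = 36 rows.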
Starting from this data, I would like to arrive at the result below, which I recreated by hand: a data frame in which every time unit has the same number of observations, with NA values manually inserted into the rows that had no readings.
time <- rep(10:15, each = 6)
id <- rep(1:6, times = 6)
x <- c(128,NA,128,64,64,NA,124,128,120,68,64,64,NA,122,NA,71,65,64,NA,NA,112,NA,NA,74,116,114,113,73,70,70,NA,111,NA,75,70,NA)
y <- c(128,NA,128,64,66,NA,125,128,124,66,67,64,NA,124,NA,67,71,68,NA,NA,113,NA,NA,68,115,119,113,76,69,77,NA,116,NA,80,82,NA)
spatiodf_equal_obs <- as.data.frame(cbind(time, id, x, y))
library(dplyr)
spatiodf_equal_obs %>%
  arrange(id)
time id x y
1 10 1 128 128
2 11 1 124 125
3 12 1 NA NA
4 13 1 NA NA
5 14 1 116 115
6 15 1 NA NA
7 10 2 NA NA
8 11 2 128 128
9 12 2 122 124
10 13 2 NA NA
11 14 2 114 119
12 15 2 111 116
13 10 3 128 128
14 11 3 120 124
15 12 3 NA NA
16 13 3 112 113
17 14 3 113 113
18 15 3 NA NA
19 10 4 64 64
20 11 4 68 66
21 12 4 71 67
22 13 4 NA NA
23 14 4 73 76
24 15 4 75 80
25 10 5 64 66
26 11 5 64 67
27 12 5 65 71
28 13 5 NA NA
29 14 5 70 69
30 15 5 70 82
31 10 6 NA NA
32 11 6 64 64
33 12 6 64 68
34 13 6 74 68
35 14 6 70 77
36 15 6 NA NA
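The hand-made spatiodf_equal_obs can probably be generated automatically. One option, assuming tidyr is installed, is `tidyr::complete()`, which turns implicitly missing combinations of the given columns into explicit rows with NA in the remaining columns. Using `full_seq(time, 1)` additionally assumes that time stamps are spaced by 1, and it covers time stamps at which no id was recorded at all. `spatiodf_complete` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Dummy data from the question
spatiodf <- data.frame(
  time = c(10,10,10,10,11,11,11,11,11,11,12,12,12,12,13,13,14,14,14,14,14,14,15,15,15),
  id   = c(1,3,4,5,1,2,3,4,5,6,2,4,5,6,3,6,1,2,3,4,5,6,2,4,5),
  x    = c(128,128,64,64,124,128,120,68,64,64,122,71,65,64,112,74,116,114,113,73,70,70,111,75,70),
  y    = c(128,128,64,66,125,128,124,66,67,64,124,67,71,68,113,68,115,119,113,76,69,77,116,80,82)
)

# complete() adds one row for every absent time x id combination;
# x and y are NA in the added rows. full_seq(time, 1) assumes a time step
# of 1 and also fills in time stamps where no id was observed at all.
spatiodf_complete <- spatiodf %>%
  complete(time = full_seq(time, 1), id) %>%
  arrange(id, time)
```

The result should hold the same values as the hand-made data arranged by id, returned as a tibble rather than a plain data frame.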
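Finally, I want to replace each NA with the nearest known coordinates of the same id: the last known values are carried forward, and where an id has no earlier observation, the next known values are carried backward. This can be done with fill() from tidyr: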
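library(tidyr)
res <- spatiodf_equal_obs %>%
  arrange(id, time) %>%                  # fill() works in row order, so sort by time within each id
  group_by(id) %>%                       # fill within each player only
  fill(x, y, .direction = "down") %>%    # carry the last known coordinates forward
  fill(x, y, .direction = "up") %>%      # fill leading NAs with the first known coordinates
  ungroup()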
Merging, as in merge(df1, df2, all = TRUE), might also be an option for inserting the missing rows.
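In base R, the merge idea can be sketched by first building the full grid of time stamps and ids with `expand.grid()` and then merging the observed rows into it. This assumes every relevant time stamp occurs at least once in the data; `grid` and `spatiodf_merged` are illustrative names.

```r
# Dummy data from the question
spatiodf <- data.frame(
  time = c(10,10,10,10,11,11,11,11,11,11,12,12,12,12,13,13,14,14,14,14,14,14,15,15,15),
  id   = c(1,3,4,5,1,2,3,4,5,6,2,4,5,6,3,6,1,2,3,4,5,6,2,4,5),
  x    = c(128,128,64,64,124,128,120,68,64,64,122,71,65,64,112,74,116,114,113,73,70,70,111,75,70),
  y    = c(128,128,64,66,125,128,124,66,67,64,124,67,71,68,113,68,115,119,113,76,69,77,116,80,82)
)

# Every combination of observed time stamp and id
grid <- expand.grid(time = sort(unique(spatiodf$time)),
                    id   = sort(unique(spatiodf$id)))

# Full merge: grid rows without a matching observation get NA for x and y
spatiodf_merged <- merge(grid, spatiodf, by = c("time", "id"), all = TRUE)

# Order by id, then time, and reset the row names
spatiodf_merged <- spatiodf_merged[order(spatiodf_merged$id, spatiodf_merged$time), ]
rownames(spatiodf_merged) <- NULL
```

Because every observed row already has a matching grid row, `all.x = TRUE` would give the same result here.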
The result of the fill() step, sorted by id and time:
time id x y
1 10 1 128 128
2 11 1 124 125
3 12 1 124 125
4 13 1 124 125
5 14 1 116 115
6 15 1 116 115
7 10 2 128 128
8 11 2 128 128
9 12 2 122 124
10 13 2 122 124
11 14 2 114 119
12 15 2 111 116
13 10 3 128 128
14 11 3 120 124
15 12 3 120 124
16 13 3 112 113
17 14 3 113 113
18 15 3 113 113
19 10 4 64 64
20 11 4 68 66
21 12 4 71 67
22 13 4 71 67
23 14 4 73 76
24 15 4 75 80
25 10 5 64 66
26 11 5 64 67
27 12 5 65 71
28 13 5 65 71
29 14 5 70 69
30 15 5 70 82
31 10 6 64 64
32 11 6 64 64
33 12 6 64 68
34 13 6 74 68
35 14 6 70 77
36 15 6 70 77
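With tidyr 1.0.0 or later, the two fill() calls can likely be collapsed into one with `.direction = "downup"`, which fills downward first and then upward within each group. The sketch below rebuilds spatiodf_equal_obs from the question; `res_downup` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Hand-made data from the question, with explicit NA rows
spatiodf_equal_obs <- data.frame(
  time = rep(10:15, each = 6),
  id   = rep(1:6, times = 6),
  x = c(128,NA,128,64,64,NA,124,128,120,68,64,64,NA,122,NA,71,65,64,NA,NA,112,NA,NA,74,116,114,113,73,70,70,NA,111,NA,75,70,NA),
  y = c(128,NA,128,64,66,NA,125,128,124,66,67,64,NA,124,NA,67,71,68,NA,NA,113,NA,NA,68,115,119,113,76,69,77,NA,116,NA,80,82,NA)
)

res_downup <- spatiodf_equal_obs %>%
  arrange(id, time) %>%                   # fill() follows row order, so sort by time within id
  group_by(id) %>%                        # never fill across different players
  fill(x, y, .direction = "downup") %>%   # forward fill first, then backward fill for leading NAs
  ungroup()
```

This should reproduce the filled values shown above, e.g. x = 128, 124, 124, 124, 116, 116 for id 1.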