Check whether values are missing from a counter variable

I have a data file with one line for each participant (numbered 1 to x within the study in which they participated). I want to check whether all participants are present in the data set. Here is a toy data set, where personid is the participant and study is the study in which they participated.

df <- read.table(text = "personid study measurement
1         x     23
2         x     32
1         y     21
3         y     23
4         y     23
6         y     23", header=TRUE)

which looks like this:

  personid study measurement
1        1    x          23
2        2    x          32
3        1    y          21
4        3    y          23
5        4    y          23
6        6    y          23

So for study y, participants 2 and 5 are missing. How can I check this automatically? I tried adding a counter variable and comparing it with the participant identifier, but as soon as one participant is missing, the comparison is pointless because the alignment is off for every row after the gap.

df %>% group_by(study) %>% mutate(id = 1:n(),check = id==personid)
Source: local data frame [6 x 5]
Groups: study [2]

  personid   study measurement    id check
     <int> <fctr>       <int> <int> <lgl>
1        1      x          23     1  TRUE
2        2      x          32     2  TRUE
3        1      y          21     1  TRUE
4        3      y          23     2 FALSE
5        4      y          23     3 FALSE
6        6      y          23     4 FALSE

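One option is to group by study, build the full sequence of personid values between the smallest and the largest, and use setdiff to find the ones that are missing.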

library(dplyr)

df %>% 
 group_by(study) %>% 
 mutate(new = toString(setdiff(max(personid):min(personid), personid)))

#Source: local data frame [6 x 4]
#Groups: study [2]

#  personid  study measurement   new
#     <int> <fctr>       <int> <chr>
#1        1      x          23      
#2        2      x          32      
#3        1      y          21  5, 2
#4        3      y          23  5, 2
#5        4      y          23  5, 2
#6        6      y          23  5, 2
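
The result above repeats the missing ids on every row. If one row per study is preferred, a minimal sketch is to swap mutate() for summarise(). This assumes the same df as in the question; the column name `missing` is only illustrative.

```r
library(dplyr)

df <- read.table(text = "personid study measurement
1 x 23
2 x 32
1 y 21
3 y 23
4 y 23
6 y 23", header = TRUE)

# One row per study; `missing` is an illustrative column name.
# The sequence runs from the smallest to the largest id in each study.
res <- df %>%
  group_by(study) %>%
  summarise(missing = toString(setdiff(min(personid):max(personid), personid)))

res
```

An empty string in `missing` means there are no gaps between the smallest and largest id of that study.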

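Another option is to use tidyr::expand() to build all combinations of study and personid, then anti_join() to drop the combinations that actually appear in the data.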

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df %>% 
  expand(study, personid) %>% 
  anti_join(df)
#> Joining, by = c("study", "personid")
#> # A tibble: 4 × 2
#>    study personid
#>   <fctr>    <int>
#> 1      y        2
#> 2      x        6
#> 3      x        4
#> 4      x        3
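
Note that the output above is built from the personid values observed anywhere in the data: it reports 3, 4 and 6 as missing from study x and never reports 5, which appears in no study. If the goal is to find gaps within each study's own range, one possible sketch is to expand per group. It assumes that expand() operates within each group of a grouped data frame and that full_seq() fills the range in steps of 1.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- read.table(text = "personid study measurement
1 x 23
2 x 32
1 y 21
3 y 23
4 y 23
6 y 23", header = TRUE)

# Build the full min..max id range separately for each study,
# then keep only the (study, personid) pairs absent from df.
gaps <- df %>%
  group_by(study) %>%
  expand(personid = full_seq(personid, 1)) %>%
  ungroup() %>%
  anti_join(df, by = c("study", "personid"))

gaps
```

Passing `by` explicitly avoids the "Joining, by" message and makes the matching columns obvious.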

A simple solution using base R:

tapply(df$personid, df$study, function(a) setdiff(min(a):max(a), a))

Output:

$x
integer(0)

$y
[1] 2 5
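
If the gaps are needed as a data frame rather than a list, a minimal base R sketch is to compute them with split() and lapply() and flatten the result with stack(). The variable names `gaps_list` and `gaps_df` are only illustrative.

```r
df <- read.table(text = "personid study measurement
1 x 23
2 x 32
1 y 21
3 y 23
4 y 23
6 y 23", header = TRUE)

# Named list with one vector of missing ids per study
gaps_list <- lapply(split(df$personid, df$study),
                    function(a) setdiff(seq(min(a), max(a)), a))

# stack() turns the named list into two columns:
# `values` (the missing ids) and `ind` (the study they are missing from)
gaps_df <- stack(gaps_list)

gaps_df
```

Studies without gaps contribute no rows to `gaps_df`.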

Source: https://habr.com/ru/post/1674043/

