How to identify sequences in R

Question

How to identify sequences in R

This question is an extension of R - sequential sequence identification

I have a data frame in which I need to save only those tests where in the ROI column I have a sequential sequence of _aCORRECT1 and _CORRECT1. No matter how many times _aCORRECT1 and _CORRECT1 occur, they can be repeated.

In the example below, I can save ntrial 78 and 201 because _aCORRECT1 follows _CORRECT1. However, I need to remove ntrial 10 and 400. In test 10, _aCORRECT1 does not follow _CORRECT1. In test 400, _CORRECT1 is not preceded by _aCORRECT1.

Many thanks!

subject ROI                 ntrial 
sbj05   ff                  78     
sbj05   as                  78     
sbj05   fgfsd               78     
sbj05   sgf                 78     
sbj05   jh                  78     
sbj05   sgsgsfg             78     
sbj05   fgsfg               78     
sbj05   sgf_aCORRECT1       78     
sbj05   dfs_CORRECT1        78     
sbj05   ffg                 78     
sbj05   sdfdsf              78     
sbj05   sl                  78     
sbj05   wgrt                78     
sbj05   qswefrd             201    
sbj05   ssdg                201    
sbj05   sdgfdsg             201    
sbj05   sgsgd               201    
sbj05   sgsdg               201    
sbj05   dd_aCORRECT1        201    
sbj05   dd_aCORRECT1        201    
sbj05   ffds_CORRECT1       201    
sbj05   ffds_CORRECT1       201    
sbj05   ffds_CORRECT1       201    
sbj05   hy                  201    
sbj05   gfg                 201    
sbj05   nbc                 201    
sbj05   cvbvn               10     
sbj05   kpj                 10     
sbj05   nbvnb               10     
sbj05   mnm                 10     
sbj05   dghsfh_aCORRECT1    10     
sbj05   gdh                 10   
sbj05   fgjj                10     
sbj05   gnjdg               10     
sbj05   gf                  10     
sbj05   qw                  400    
sbj05   vfs                 400    
sbj05   zx                  400    
sbj05   zvzv                400    
sbj05   zvzv_CORRECT1       400    
sbj05   zvzd_CORRECT1       400    
sbj05   zvv                 400    
sbj05   cv                  400    
sbj05   v                   400    
sbj05   mngy                400

+4

r sequence

dede Apr 20 '17 at 17:08

source share

2 answers

Andrew Gustar · Answer 1 · 2017-04-20T17:20:22+0000

dplyr, df1 - , , ntrial . aCORRECT _CORRECT ntrial. df2 - df, ntrials

df1 <- df %>% mutate(aCOR=grepl("aCORRECT",ROI),COR=grepl("_CORRECT",ROI)) %>%
              group_by(ntrial) %>% summarise(keep=any(aCOR & lead(COR)))

df2 <- df[df$ntrial %in% df1$ntrial[df1$keep],]


df1
# A tibble: 4 × 2
  ntrial  keep
   <int> <lgl>
1     10 FALSE
2     78  TRUE
3    201  TRUE
4    400 FALSE

df2
   subject           ROI ntrial
1    sbj05            ff     78
2    sbj05            as     78
3    sbj05         fgfsd     78
4    sbj05           sgf     78
5    sbj05            jh     78
6    sbj05       sgsgsfg     78
7    sbj05         fgsfg     78
8    sbj05 sgf_aCORRECT1     78
9    sbj05  dfs_CORRECT1     78
10   sbj05           ffg     78
11   sbj05        sdfdsf     78
12   sbj05            sl     78
13   sbj05          wgrt     78
14   sbj05       qswefrd    201
15   sbj05          ssdg    201
16   sbj05       sdgfdsg    201
17   sbj05         sgsgd    201
...

eipi10 · Answer 2 · 2017-04-20T18:07:09+0000

ROI, , ntrial, .

library(dplyr)
library(stringr)

df %>% group_by(subject, ntrial) %>%
  filter(grepl("_aCORR_CORR", paste(str_extract(ROI, "_a?CORR"), collapse="")))

   subject           ROI ntrial
1    sbj05            ff     78
2    sbj05            as     78
3    sbj05         fgfsd     78
4    sbj05           sgf     78
5    sbj05            jh     78
6    sbj05       sgsgsfg     78
7    sbj05         fgsfg     78
8    sbj05 sgf_aCORRECT1     78
9    sbj05  dfs_CORRECT1     78
10   sbj05           ffg     78
11   sbj05        sdfdsf     78
12   sbj05            sl     78
13   sbj05          wgrt     78
14   sbj05       qswefrd    201
15   sbj05          ssdg    201
16   sbj05       sdgfdsg    201
17   sbj05         sgsgd    201
18   sbj05         sgsdg    201
19   sbj05  dd_aCORRECT1    201
20   sbj05  dd_aCORRECT1    201
21   sbj05 ffds_CORRECT1    201
22   sbj05 ffds_CORRECT1    201
23   sbj05 ffds_CORRECT1    201
24   sbj05            hy    201
25   sbj05           gfg    201
26   sbj05           nbc    201

a data.table , R gsub str_extract:

library(data.table)

setDT(df)[, .SD[grepl("_aCORR_CORR", paste(gsub(".*(_a?CORR).*","\\1", ROI),collapse=""))], by=.(subject,ntrial)]

How to identify sequences in R

More articles: