I have a large file with 107635 rows and 3 columns: subject, area of interest (ROI), and trial number (ntrial). The ROI can be A, B, C, D, E, or F. I want to keep only those trials in which, the first time B appears in the ROI column, it is followed directly by C and then D. It does not matter how many times in a row B, C, or D is repeated.
In the example below, I can keep ntrial 78 and 201 because the first appearance of B is followed by C and then D. However, I need to remove ntrial 10 and 400. In trial 10, B, C, and D are not consecutive (an A comes between B and C). In trial 400, the first time B appears, it is not followed by C and D.
For the output, I need a column with a value of 1 in every row of the trials to keep and a value of 0 in the rows of the trials to remove.
Any suggestions on how to write code that automates this procedure, so that I do not have to visually inspect each trial?
Many thanks!
subject ROI ntrial output
sbj05 A 78 1
sbj05 A 78 1
sbj05 A 78 1
sbj05 A 78 1
sbj05 A 78 1
sbj05 A 78 1
sbj05 B 78 1
sbj05 B 78 1
sbj05 C 78 1
sbj05 D 78 1
sbj05 E 78 1
sbj05 E 78 1
sbj05 E 78 1
sbj05 A 201 1
sbj05 A 201 1
sbj05 A 201 1
sbj05 A 201 1
sbj05 A 201 1
sbj05 B 201 1
sbj05 C 201 1
sbj05 D 201 1
sbj05 E 201 1
sbj05 E 201 1
sbj05 E 201 1
sbj05 F 201 1
sbj05 F 201 1
sbj05 A 10 0
sbj05 A 10 0
sbj05 A 10 0
sbj05 A 10 0
sbj05 B 10 0
sbj05 A 10 0
sbj05 C 10 0
sbj05 D 10 0
sbj05 E 10 0
sbj05 E 10 0
sbj05 A 400 0
sbj05 A 400 0
sbj05 A 400 0
sbj05 B 400 0
sbj05 A 400 0
sbj05 B 400 0
sbj05 C 400 0
sbj05 C 400 0
sbj05 C 400 0
sbj05 D 400 0
sbj05 E 400 0
sbj05 E 400 0
sbj05 D 400 0
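
To make the rule above concrete, here is a minimal sketch in Python (standard library only). It rests on a few assumptions that are not stated in the question: rows arrive as (subject, ROI, ntrial) tuples in recorded order, a trial is identified by the subject and ntrial pair, consecutive repeats of the same ROI count as one visit, and a trial that never contains B gets 0. The function name `flag_trials` is made up for illustration.

```python
from itertools import groupby


def flag_trials(rows):
    """Return one 0/1 flag per row.

    rows: list of (subject, roi, ntrial) tuples in recorded order.
    A trial gets 1 when its first B is followed directly by C and then D,
    ignoring consecutive repeats of the same ROI.
    """
    # Collect the ROI sequence of each (subject, ntrial) pair, keeping row order.
    sequences = {}
    for subject, roi, ntrial in rows:
        sequences.setdefault((subject, ntrial), []).append(roi)

    keep = {}
    for key, rois in sequences.items():
        # Collapse consecutive repeats, e.g. A A B B C D -> A B C D.
        collapsed = [roi for roi, _ in groupby(rois)]
        if "B" in collapsed:
            first_b = collapsed.index("B")
            keep[key] = int(collapsed[first_b:first_b + 3] == ["B", "C", "D"])
        else:
            # Assumption: a trial with no B at all is flagged 0.
            keep[key] = 0

    # Spread each trial's flag back onto its rows.
    return [keep[(subject, ntrial)] for subject, _, ntrial in rows]


if __name__ == "__main__":
    # Trial 78 and trial 10 from the example, written as compact ROI strings.
    rows = [("sbj05", roi, 78) for roi in "AAAAAABBCDEEE"]
    rows += [("sbj05", roi, 10) for roi in "AAAABACDEE"]
    print(flag_trials(rows))
```

Only the first B in each trial is checked, so a later B, C, D run does not rescue a trial such as 400. If trials without any B should be kept instead, only the `else` branch needs to change.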