How to change a variable in the nth occurrence of a value in another variable?

There is a data table.

library(data.table) car <- data.table(no = 1:100, turn = sample(1:5,100,replace = TRUE), dis = sample(1:10,100,replace = TRUE)) 

I want to change "dis" to -1, in the nth occurrence of turn == 3 , say the third time that "turn" is 3 .

I can select the third line turn == 3 :

 car[turn == 3, .SD[3]] 

However, I cannot update the "dis" on this line:

 car[turn == 3, .SD[3]][, dis := -1] 

Associated Q & A: Conditionally replacing column values ​​with data.table .

+5
source share
2 answers

Some alternatives. Use rowid or cumsum to create a row counter within groups. Add the counter to your state in i .

I use a slightly smaller toy dataset to make tracking changes easier:

 d <- data.table(x = 1:3, y = 1:12) d[rowid(x) == 3 & x == 3, y := -1] # @mt1022 d[cumsum(x == 3) == 3 & (x == 3), y := -1] # @docendo discimus d[(ix <- x == 3) & cumsum(ix) == 3, y := -1] 

Although the OP did not mention speed as a problem, I was still curious to use different approaches to the larger vector. Not surprisingly, the @Frank method is the fastest, especially when the number of unique values ​​to search for increases:

frank << docendo < henrik < mt022

 microbenchmark(henrik = d[rowid(x) == 3 & x == 3, y := -1], mt1022 = d[cumsum(x == 3) == 3 & (x == 3), y := -1], docendo = d[(ix <- x == 3) & cumsum(ix) == 3, y := -1], frank = d[d[x == 3, which = TRUE][3], y := -1], unit = "relative") d <- data.table(x = sample(1:3, 1e6, replace = TRUE), y = 1:1e6) # Unit: relative # expr min lq mean median uq max neval cld # henrik 4.417303 4.369407 4.133514 4.319839 4.329658 1.260394 100 b # mt1022 5.461961 5.285562 5.174559 5.186404 5.239738 1.608712 100 c # docendo 3.572646 3.624369 3.788678 3.589705 3.576637 1.733272 100 b # frank 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100 ad <- data.table(x = sample(1:30, 1e6, replace = TRUE), y = 1:1e6) # Unit: relative # expr min lq mean median uq max neval cld # henrik 22.64881 19.54375 18.81963 18.91335 19.78559 5.507692 100 bc # mt1022 24.58258 21.17535 19.84417 20.96256 22.76020 3.625263 100 c # docendo 19.40044 16.75912 16.23321 16.47953 18.06264 4.234100 100 b # frank 1.00000 1.00000 1.00000 1.00000 1.00000 1.000000 100 a d <- data.table(x = sample(1:300, 1e6, replace = TRUE), y = 1:1e6) # Unit: relative # expr min lq mean median uq max neval cld # henrik 31.81237 32.51122 28.79490 30.35766 28.63560 8.236282 100 b # mt1022 34.71984 35.45341 33.20405 33.57394 31.50914 21.556367 100 c # docendo 27.99046 28.15855 26.56954 26.60644 25.20044 7.847163 100 b # frank 1.00000 1.00000 1.00000 1.00000 1.00000 1.000000 100 a # Unit: milliseconds # expr min lq mean median uq max neval cld # henrik 60.655582 76.455531 83.061266 77.632036 78.57818 203.224042 100 c # mt1022 66.701182 84.133034 87.967300 84.937201 85.72464 201.167914 100 c # docendo 52.938545 67.214360 71.558130 68.003891 68.51897 184.178346 100 b # frank 1.977821 2.494039 2.629852 2.663577 2.76089 3.613905 100 a 
+6
source

Here's another way:

 car[car[turn == 3, which = TRUE][3], dis := -1 ] 

A comment. For such an operation, you may need to enable verbose so that you can see if any changes have been made. For example, if we are looking for 111th 3 ....

 car[car[turn == 3, which = TRUE][111], dis := -1, verbose = TRUE ] # Detected that j uses these columns: dis # Assigning to 0 row subset of 100 rows 

It shows that 0 lines have been edited.


See also Using .I to return line numbers using the data.table package by using which = TRUE .

+3
source

Source: https://habr.com/ru/post/1271288/


All Articles