Count the number of values ​​in the data table window.

I am trying to add new columns to data.table , where the values ​​in the rows depend on the relative relationship of the values ​​in the row. To be more precise, if there is an X value in a row, I would like to know how many other values ​​are in the same column (and group) that are inside the X-30.

That is, given this:

 DT<-data.table( X = c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1), Y = c(100, 101, 133, 134, 150, 156, 190, 200, 201, 230, 233, 234), Z = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)) 

I would like to get a new column with values:

 N <- c(0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 2) 

I tried the following, but I am not getting the results that I could use:

 DT[,list(Y,num=cumsum(Y[-.I]>DT[.I,Y]-30),Z),by=.(X)] 

Any ideas how to do this?

+5
source share
2 answers

This can probably be achieved with a sliding connection (?), But now it is an alternative to foverlaps

 DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)] # Add row index and a -30 interval setkey(DT, X, Y2, Y) # Sort by X and the intervals (for fovelaps) res <- foverlaps(DT, DT)[Y2 > i.Y2, .N, keyby = indx] # Run foverlaps and check what can we catch setorder(DT, indx) # go back to the original order DT[res$indx, N := res$N][, c("indx", "Y2") := NULL] # update results and remove cols DT # XYZN # 1: 1 100 1 0 # 2: 2 101 2 0 # 3: 2 133 3 0 # 4: 1 134 4 0 # 5: 1 150 5 1 # 6: 2 156 6 1 # 7: 1 190 7 0 # 8: 2 200 8 0 # 9: 2 201 9 1 # 10: 1 230 10 0 # 11: 1 233 11 1 # 12: 1 234 12 2 

Alternatively, use which=TRUE foverlaps to reduce overlay merging:

 # as above DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)] setkey(DT, X, Y2, Y) # using which=TRUE: res <- foverlaps(DT, DT, which=TRUE)[xid > yid, .N, by=xid] DT[res$xid, N := res$N] setorder(DT, indx) DT[, c("Y2","indx") := NULL] 
+6
source

Here's another way:

 DT[order(Y), N := 0:(.N-1) - findInterval(Y - 30, Y), by = X] all.equal(DT$N,N) # TRUE 
+4
source

Source: https://habr.com/ru/post/1245031/


All Articles