Using a function in lapply in data.table in R

Suppose I have a sample dataset as shown below.

    > tmp <- data.table(x=c(1:10),y=(5:14))
    > tmp
         x  y
     1:  1  5
     2:  2  6
     3:  3  7
     4:  4  8
     5:  5  9
     6:  6 10
     7:  7 11
     8:  8 12
     9:  9 13
    10: 10 14

I want to keep the two lowest numbers in each column and change all the other values to 0.

The expected result is:

        x y
     1: 1 5
     2: 2 6
     3: 0 0
     4: 0 0
     5: 0 0
     6: 0 0
     7: 0 0
     8: 0 0
     9: 0 0
    10: 0 0

I tried the following code:

    tmp[, c("x","y"):=lapply(.SD, function(x) {x[which(!x %in% sort(x)[1:2])] = 0}), .SDcols=c("x","y")]

but it changes all values to 0.

How can I solve this problem?

2 answers

To expand my comment, I would do something like this:

    for (j in names(tmp)) {
      col = tmp[[j]]
      min_2 = sort.int(unique(col), partial=2L)[2L]  # 2nd lowest value
      set(tmp, i = which(col > min_2), j = j, value = 0L)
    }

This loops over all columns in tmp and gets the second-lowest distinct value of each column using sort.int with the partial argument, which is slightly more efficient than using sort (since we don't have to sort the entire column to find the second-lowest value).

Then we use set() to replace those rows where the column value is greater than the second minimum value for this column with a value of 0.
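As a small illustration of the sort.int(..., partial=2L) step on its own, here is a sketch on a plain vector. The vector v is made up for this example and is not from the question. Note that because of unique(), every value tied with one of the two lowest distinct values is kept, so more than two elements can survive.

```r
# Hypothetical input vector with a duplicated value (not from the question)
v <- c(7L, 3L, 9L, 3L, 1L, 8L)

# partial = 2L only guarantees that position 2 holds the 2nd smallest element;
# the rest of the vector is not necessarily sorted
min_2 <- sort.int(unique(v), partial = 2L)[2L]

# Zero out everything above the 2nd lowest distinct value
result <- replace(v, v > min_2, 0L)

print(min_2)
print(result)
```

Here both copies of 3 are kept along with the 1. If exactly two elements per column should survive regardless of ties, a rank-based test is one alternative.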


Maybe you can try

    tmp[, lapply(.SD, function(x) replace(x, !rank(x, ties.method='first') %in% 1:2, 0))]
    #     x y
    #  1: 1 5
    #  2: 2 6
    #  3: 0 0
    #  4: 0 0
    #  5: 0 0
    #  6: 0 0
    #  7: 0 0
    #  8: 0 0
    #  9: 0 0
    # 10: 0 0
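The expression above returns a new data.table and leaves tmp unchanged. If the goal is to update tmp by reference, as the := in the question suggests, a minimal sketch could look like the following. The helper name keep_two_lowest is hypothetical and not part of data.table. It also shows the likely reason the attempt in the question produced only zeros: a function whose last expression is an assignment returns the assigned value (0), not the modified vector, so the vector has to be returned explicitly.

```r
library(data.table)

# Same sample data as in the question
tmp <- data.table(x = 1:10, y = 5:14)

# keep_two_lowest is a hypothetical helper name used only for this sketch
keep_two_lowest <- function(x) {
  # Zero out every element whose rank is not 1 or 2
  x[!rank(x, ties.method = "first") %in% 1:2] <- 0L
  x  # return the modified vector, not the value of the assignment
}

cols <- c("x", "y")
# (cols) on the left-hand side of := updates the named columns in place
tmp[, (cols) := lapply(.SD, keep_two_lowest), .SDcols = cols]

print(tmp)
```

With ties.method = "first", exactly two elements per column are kept even when values are duplicated.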

Source: https://habr.com/ru/post/1208864/

