Using a function in lapply in data.table in r

Question

Suppose I have a sample dataset like the one shown below.

    > tmp <- data.table(x=c(1:10),y=(5:14))
    > tmp
         x  y
     1:  1  5
     2:  2  6
     3:  3  7
     4:  4  8
     5:  5  9
     6:  6 10
     7:  7 11
     8:  8 12
     9:  9 13
    10: 10 14

I want to keep the two lowest values in each column and change all the other values to 0.

The desired result is:

         x  y
     1:  1  5
     2:  2  6
     3:  0  0
     4:  0  0
     5:  0  0
     6:  0  0
     7:  0  0
     8:  0  0
     9:  0  0
    10:  0  0

I tried the following code:

    tmp[, c("x","y"):=lapply(.SD, function(x) {x[which(!x %in% sort(x)[1:2])] = 0}), .SDcols=c("x","y")]

but it changes every value to 0.

How can I solve this problem?

r data.table lapply

Rokmc1050 Dec 12 '14 at 7:11

2 answers

Arun · Answer 1 · 2014-12-12T10:35:11+0000

To expand my comment, I would do something like this:

    for (j in names(tmp)) {
        col = tmp[[j]]
        min_2 = sort.int(unique(col), partial=2L)[2L] # 2nd lowest value
        set(tmp, i = which(col > min_2), j = j, value = 0L)
    }

This loops over all columns in tmp and gets the second-lowest distinct value of each column using sort.int with the partial argument, which is slightly more efficient than using sort (since we don't have to sort the entire column just to find its second-lowest value).

Then we use set() to replace those rows where the column value is greater than the second minimum value for this column with a value of 0.
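To make the two steps above concrete, here is a minimal base R sketch on a single vector. The vector `col` and the names `to_zero` and `result` are made up for illustration and are not from the question; the example only shows what the partial sort returns and which positions would be overwritten.

```r
# Hypothetical example vector with a tie on the lowest value
col <- c(7L, 3L, 9L, 3L, 5L, 8L)

# A partial sort only guarantees that position 2 holds the value it would
# have after a full sort; unique() makes that the second-lowest distinct value
min_2 <- sort.int(unique(col), partial = 2L)[2L]

# Positions the loop would pass to set() as i
to_zero <- which(col > min_2)

# Same replacement on a plain vector instead of a data.table column
result <- replace(col, to_zero, 0L)
print(result)
```

Note that because the comparison is made against distinct values, tied values are all kept: in this sketch both 3s and the 5 survive, so more than two elements can remain non-zero.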

akrun · Answer 2 · 2014-12-12T07:24:14+0000

Maybe you can try

    tmp[, lapply(.SD, function(x) replace(x, !rank(x, ties.method='first') %in% 1:2, 0))]
    #     x y
    #  1: 1 5
    #  2: 2 6
    #  3: 0 0
    #  4: 0 0
    #  5: 0 0
    #  6: 0 0
    #  7: 0 0
    #  8: 0 0
    #  9: 0 0
    # 10: 0 0
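The call above returns a new data.table rather than modifying tmp. As a rough sketch, the same function can be combined with `:=` to update the columns by reference. It also shows one likely reason the attempt in the question produced all zeros, assuming that attempt wrapped the assignment in `function(x) { ... }`. The names `returns_zero`, `keep_two_lowest` and `cols` are hypothetical and introduced only for this example.

```r
library(data.table)

tmp <- data.table(x = 1:10, y = 5:14)

# A function returns the value of its last expression. A replacement
# assignment such as x[i] = 0 evaluates to the right-hand side, so this
# function returns 0 instead of the modified vector.
returns_zero <- function(x) { x[!x %in% sort(x)[1:2]] = 0 }
out <- returns_zero(1:10)

# replace() returns the whole modified vector, which is what := needs.
# 0L keeps the columns as integers.
keep_two_lowest <- function(x) {
  replace(x, !rank(x, ties.method = "first") %in% 1:2, 0L)
}

# Update both columns by reference
cols <- c("x", "y")
tmp[, (cols) := lapply(.SD, keep_two_lowest), .SDcols = cols]
print(tmp)
```

With `ties.method = "first"`, exactly two elements per column are kept even when values are tied, which differs from comparing against the second-lowest distinct value.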
