How can I group by variable and list in random order in data.table?

Question

I have a variable that I want to group by. That part is simple. However, I want the resulting table to display its rows in random order. What I actually want to do is a little harder, but here is a simplified version.

 mydf = data.table(
   x = rep(1:4, each = 5),
   y = rep(c('A', 'B','c','D', 'E'), times = 2),
   v = rpois(20, 30)
 )
 mydf[,list(sum(x),sum(v)), by=y]
 mydf[,list(sum(x),sum(v)), by=list(y=sample(y))]
 #to list all the raw data in order of y
 mydf[,list(x,v), by=y]
 mydf[,list(x,v), by=list(y=sample(y))]

If you look at the output, you will notice that y is indeed in random order, but it is no longer matched with the data that was on the same rows.

What can I do?

+4

r sample permutation data.table

Farrel Jun 19 '13 at 16:51


2 answers

I think this is what you are looking for ...?

 mydf[,.SD[sample(.N)],by=y]

Inspired by @BlueMagister's second solution, here's a way that randomizes the rows first and then groups:

 mydf[sample(nrow(mydf)),.SD,by=y]

Here, use keyby instead of by if you want the groups to be displayed in alphabetical order.
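As a rough check of the first one-liner above, the sketch below shuffles rows within each group and confirms that every row keeps its own x and v. The seed and the helper names shuffled, before and after are assumptions added for illustration, and times = 4 is written out so the example does not rely on recycling.

```r
library(data.table)

set.seed(42)  # assumption: seed added only so the illustration is reproducible
mydf <- data.table(
  x = rep(1:4, each = 5),
  y = rep(c("A", "B", "c", "D", "E"), times = 4),
  v = rpois(20, 30)
)

# Shuffle the rows inside each group of y; whole rows move together
shuffled <- mydf[, .SD[sample(.N)], by = y]

# Compare the (y, x, v) triples before and after, ignoring row order
before <- sort(mydf[, paste(y, x, v)])
after  <- sort(shuffled[, paste(y, x, v)])
rows_intact <- identical(before, after)

# With by=, groups are listed in order of first appearance
group_order <- unique(shuffled$y)
```

If rows_intact is TRUE, only the order of rows inside each group has changed, which is the behaviour the question asks for.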

+2

Frank Jun 19 '13 at 17:29


Blue magister · Accepted Answer · 2013-06-19T17:00:48+0000

I would do the operation and then sort randomly:

 mydf[,list(x,v),by=y][sample(seq_len(nrow(mydf)),replace=FALSE)]

EDIT: random reordering, after grouping:

 mydf[,list(sum(x),sum(v)), by=y][sample(seq_len(length(y)),replace=FALSE)]
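A minimal sketch of the same idea, aggregating first and shuffling the summary rows afterwards. The column names sum_x and sum_v, the seed, and the use of .N in place of seq_len(length(y)) are assumptions made here for readability.

```r
library(data.table)

set.seed(42)  # assumption: seed added only so the illustration is reproducible
mydf <- data.table(
  x = rep(1:4, each = 5),
  y = rep(c("A", "B", "c", "D", "E"), times = 4),
  v = rpois(20, 30)
)

# One summary row per value of y
agg <- mydf[, list(sum_x = sum(x), sum_v = sum(v)), by = y]

# Permute the summary rows; each sum stays on the same row as its y
agg_shuffled <- agg[sample(.N)]

# The shuffled table should still agree with the unshuffled one, group by group
sums_match <- all(agg_shuffled$sum_v == agg[match(agg_shuffled$y, y), sum_v])
```

Because the shuffle happens after aggregation, it moves whole result rows and cannot separate a group label from its sums.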

You can do something similar to put the groups in random order before grouping, and the grouped result appears to keep that order:

 mydf[order(setNames(sample(unique(y)),unique(y))[y])]
 mydf[order(setNames(sample(unique(y)),unique(y))[y]),list(sum(x),sum(v)),by=y]
 #perhaps more readable:
 mydf[{z <- unique(y); order(setNames(sample(z),z)[y])}]
 mydf[{z <- unique(y); order(setNames(sample(z),z)[y])},list(sum(x),sum(v)),by=y]

This is more transparent if you add a column first and then order by it:

 mydf[,new.y := setNames(sample(unique(y)),unique(y))[y]][order(new.y)]
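The column-based version can be checked with a short sketch like the one below, which confirms that each original label maps to exactly one shuffled label and that rows of the same group stay together after ordering. The names z, lookup and res are assumptions introduced for the illustration, and unname() is used only to keep the new column free of a names attribute.

```r
library(data.table)

set.seed(1)  # assumption: seed added only so the illustration is reproducible
mydf <- data.table(
  x = rep(1:4, each = 5),
  y = rep(c("A", "B", "c", "D", "E"), times = 4),
  v = rpois(20, 30)
)

z <- unique(mydf$y)
lookup <- setNames(sample(z), z)  # names: original labels, values: shuffled labels

# Add the mapped label as a column (by reference), then sort by it
mydf[, new.y := unname(lookup[y])]
res <- mydf[order(new.y)]

# Every original y corresponds to a single new.y, so the map is one-to-one
one_to_one <- all(mydf[, uniqueN(new.y), by = y]$V1 == 1)

# After ordering, each y forms exactly one contiguous run of rows
n_runs <- res[, uniqueN(rleid(y))]
```

Sorting by new.y is an ordinary sort on the shuffled labels, so the original groups end up in a random order while their rows remain contiguous.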

Breakdown:

 ##a random ordering of the elements of y
 ##(set.seed is used here to get consistent results)
 set.seed(1); mydf[,{z <- unique(y);sample(z)}]
 # [1] "B" "E" "D" "c" "A"
 ##assigning names to the elements of y
 ##creating a 1-1 bijective function between the elements of y
 set.seed(1); mydf[,{z <- unique(y);setNames(sample(z),z)}]
 #   A   B   c   D   E
 # "B" "E" "D" "c" "A"
 ##subsetting by y puts y through the map
 ##in effect every element of y is posing as an element of y, picked at random
 ##notice that the names (top row) are the original y
 ##the values (bottom row) are the mapped-to values
 set.seed(1); mydf[,{z <- unique(y);setNames(sample(z),z)[y]}]
 #   A   B   c   D   E   A   B   c   D   E   A   B   c   D   E   A   B   c   D   E
 # "B" "E" "D" "c" "A" "B" "E" "D" "c" "A" "B" "E" "D" "c" "A" "B" "E" "D" "c" "A"
 ##ordering by this now orders by the mapped-to values
 set.seed(1); mydf[{z <- unique(y);order(setNames(sample(z),z)[y])}]

EDIT: Incorporating Arun's suggestion from the comments to use setattr to set the names:

 mydf[{z <- unique(y); order(setattr(sample(z),'names',z)[y])}]
 mydf[{z <- unique(y); order(setattr(sample(z),'names',z)[y])},list(sum(x),sum(v)),by=y]
