How to randomly sample whole groups of data with dplyr?

I have a problem with the dplyr sample_n function. I'm trying to randomly extract whole subsets (groups) from a data.frame, but it doesn't work, because sample_n only samples random rows.

Here are some examples showing how to extract random rows from each subset.

sample-rows-of-subgroups-from-dataframe-with-dplyr

selecting-n-random-rows-across-all-levels-of-a-factor-within-a-dataframe

This is not what I want. I want to randomly extract groups from a data frame and not random rows from each subset.

For instance,

    xx <- rep(rep(seq(0, 800, 200), each = 10), times = 2)
    yy <- c(replicate(2, sort(10^runif(10, -1, 0), decreasing = TRUE)),
            replicate(2, sort(10^runif(10, -1, 0), decreasing = TRUE)),
            replicate(2, sort(10^runif(10, -2, 0), decreasing = TRUE)),
            replicate(2, sort(10^runif(10, -3, 0), decreasing = TRUE)),
            replicate(2, sort(10^runif(10, -4, 0), decreasing = TRUE)))
    V <- rep(seq(100, 2500, length.out = 10), times = 2)
    No <- rep(1:10, each = 10)
    df <- data.frame(V, xx, yy, No)

    library(dplyr)

    random <- df %>%
      group_by(No) %>%
      sample_n(5, replace = TRUE)  ## This part is the problem.

For example, how can I randomly retrieve 3 of the groups below, each with all of its rows?

                V  xx           yy No
    1    100.0000   0 0.9877468589  1
    2    366.6667   0 0.6658268649  1
    3    633.3333   0 0.4408336374  1
    4    900.0000   0 0.4136939054  1
    5   1166.6667   0 0.4104986026  1
    6   1433.3333   0 0.3899468530  1
    7   1700.0000   0 0.3042157845  1
    8   1966.6667   0 0.1585948347  1
    9   2233.3333   0 0.1307305044  1
    10  2500.0000   0 0.1079459480  1
    11   100.0000 200 0.7437972385  2
    12   366.6667 200 0.7130753133  2
    13   633.3333 200 0.6000577122  2
    14   900.0000 200 0.5038569759  2
    15  1166.6667 200 0.3740146819  2
    16  1433.3333 200 0.3605675251  2
    17  1700.0000 200 0.1821736571  2
    18  1966.6667 200 0.1542015388  2
    19  2233.3333 200 0.1453810015  2
    20  2500.0000 200 0.1142553452  2
    21   100.0000 400 0.9712414163  3
    22   366.6667 400 0.5420861908  3
    23   633.3333 400 0.4622129942  3
    24   900.0000 400 0.3634606046  3
    25  1166.6667 400 0.3541710297  3
    26  1433.3333 400 0.3451167353  3
    27  1700.0000 400 0.2413016960  3
    28  1966.6667 400 0.2356020402  3
    29  2233.3333 400 0.2054358298  3
    30  2500.0000 400 0.1132074106  3
    31   100.0000 600 0.9220690387  4
    32   366.6667 600 0.8772938566  4
    33   633.3333 600 0.7560569362  4
    34   900.0000 600 0.5395093190  4
    35  1166.6667 600 0.3696490756  4
    36  1433.3333 600 0.1585255169  4
    37  1700.0000 600 0.1425756544  4
    38  1966.6667 600 0.1135199782  4
    39  2233.3333 600 0.1061660399  4
    40  2500.0000 600 0.1052644706  4
    41   100.0000 800 0.6175240054  5
    42   366.6667 800 0.5527556076  5
    43   633.3333 800 0.4339775258  5
    44   900.0000 800 0.2462104866  5
    45  1166.6667 800 0.1955550477  5
    46  1433.3333 800 0.1701907232  5
    47  1700.0000 800 0.0824833313  5
    48  1966.6667 800 0.0483463760  5
    49  2233.3333 800 0.0246629341  5
    50  2500.0000 800 0.0186177562  5
    51   100.0000   0 0.8977179587  6
    52   366.6667   0 0.8087930175  6
    53   633.3333   0 0.5547978713  6
    54   900.0000   0 0.4395436341  6
    55  1166.6667   0 0.2972449261  6
    56  1433.3333   0 0.0925262903  6
    57  1700.0000   0 0.0665688788  6
    58  1966.6667   0 0.0309263319  6
    59  2233.3333   0 0.0238500731  6
    60  2500.0000   0 0.0213679919  6
    61   100.0000 200 0.7777420232  7
    62   366.6667 200 0.2299083233  7
    63   633.3333 200 0.0611370244  7
    64   900.0000 200 0.0228982941  7
    65  1166.6667 200 0.0150085546  7
    66  1433.3333 200 0.0076922035  7
    67  1700.0000 200 0.0066120335  7
    68  1966.6667 200 0.0062052827  7
    69  2233.3333 200 0.0037895910  7
    70  2500.0000 200 0.0011051211  7
    71   100.0000 400 0.3829786486  8
    72   366.6667 400 0.1901274442  8
    73   633.3333 400 0.1775864007  8
    74   900.0000 400 0.0567928196  8
    75  1166.6667 400 0.0414294193  8
    76  1433.3333 400 0.0127875497  8
    77  1700.0000 400 0.0105576089  8
    78  1966.6667 400 0.0051503839  8
    79  2233.3333 400 0.0035216836  8
    80  2500.0000 400 0.0012326419  8
    81   100.0000 600 0.0370072219  9
    82   366.6667 600 0.0297765049  9
    83   633.3333 600 0.0219866835  9
    84   900.0000 600 0.0140510807  9
    85  1166.6667 600 0.0021593963  9
    86  1433.3333 600 0.0018936887  9
    87  1700.0000 600 0.0017860546  9
    88  1966.6667 600 0.0001551491  9
    89  2233.3333 600 0.0001345905  9
    90  2500.0000 600 0.0001048041  9
    91   100.0000 800 0.7343220323 10
    92   366.6667 800 0.1653557177 10
    93   633.3333 800 0.1006331452 10
    94   900.0000 800 0.0083407709 10
    95  1166.6667 800 0.0043037301 10
    96  1433.3333 800 0.0032461136 10
    97  1700.0000 800 0.0015843809 10
    98  1966.6667 800 0.0004819055 10
    99  2233.3333 800 0.0002991639 10
    100 2500.0000 800 0.0001447263 10
1 answer

Perhaps this is what you need:

    # sample 5 of the distinct values of No (use sample_n(3) for 3 groups)
    my_groups <- df %>%
      select(No) %>%
      distinct %>%
      sample_n(5)

    # merge the two datasets, keeping every row of the sampled groups
    my_df <- left_join(my_groups, df, by = "No")
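
A shorter variant of the same idea, as a rough sketch only: draw the group ids with base R's sample() and keep the matching rows with filter(). The small df built here is a hypothetical stand-in for the one in the question, the names picked and subset3 are made up for illustration, and 3 is the number of groups asked for in the question.

```r
library(dplyr)

# Hypothetical stand-in for the question's df: 10 groups of 10 rows each
set.seed(1)
df <- data.frame(No = rep(1:10, each = 10), yy = runif(100))

# Draw 3 distinct group ids at random (sampling without replacement)
picked <- sample(unique(df$No), 3)

# Keep every row that belongs to one of the drawn groups
subset3 <- df %>% filter(No %in% picked)
```

Note that filter() can only keep or drop a group, so this variant fits sampling without replacement; if the same group may be drawn more than once, joining the drawn ids back onto df, as above, is the safer route.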

Source: https://habr.com/ru/post/1234763/

