Create N random integers with no gaps

Question


For the clustering algorithm that I am implementing, I would like to initialize the cluster assignments at random. However, I need the labels to have no gaps. That is, this is not OK:

    set.seed(2)
    K <- 10  # initial number of clusters
    N <- 20  # number of data points
    z_init <- sample(K, N, replace=TRUE)  # initial assignments
    z_init
    # [1] 2 8 6 2 10 10 2 9 5 6 6 3 8 2 5 9 10 3 5 1
    sort(unique(z_init))
    # [1] 1 2 3 5 6 8 9 10

where labels 4 and 7 were not used.

Instead, I would like this vector to be:

 # [1] 2 6 5 2 8 8 2 7 4 5 5 3 6 2 4 7 8 3 4 1

where label 5 has become 4, and so on, so that the lower unused labels are filled.

Other examples:

Vector 1 2 3 5 6 8 should become 1 2 3 4 5 6
Vector 15,5,7,7,10 should become 1 2 3 3 4

Is it possible to avoid for loops? I don't need it to be fast; I prefer it to be elegant and short, as I only do it once in the code (to initialize the labels).

My solution using a for loop:

    z_init <- c(3,2,1,3,3,7,9)
    idx <- order(z_init)
    for (i in 2:length(z_init)){
      if(z_init[idx[i]] > z_init[idx[i-1]]){
        # new distinct value: take the next consecutive label
        z_init[idx[i]] <- z_init[idx[i-1]]+1
      }
      else{
        # same value as the previous one: reuse its label
        z_init[idx[i]] <- z_init[idx[i-1]]
      }
    }
    z_init
    # 3 2 1 3 3 4 5

+5

r

alberto Feb 01 '16 at 21:57


4 answers

It seems to me that you are trying to randomly assign the elements of a set (the numbers from 1 to 20) to clusters, subject to the requirement that at least one element is assigned to each cluster.

One approach I could think of is to draw a random reward r_ij for assigning element i to cluster j. Then I would define binary decision variables x_ij that indicate whether element i is assigned to cluster j. Finally, I would use mixed integer optimization to select the assignment of elements to clusters that maximizes the collected reward, subject to the following constraints:

Each element is assigned to exactly one cluster.
Each cluster has at least one element assigned to it.

This is equivalent to randomly choosing an assignment, keeping it if every cluster has at least one member, and otherwise discarding it and trying again until you get a valid random assignment.

From an implementation point of view, this is pretty easy to do in R using the lpSolve package:

    library(lpSolve)
    N <- 20
    K <- 10
    set.seed(144)
    r <- matrix(rnorm(N*K), N, K)  # random reward for each (element, cluster) pair
    mod <- lp(direction = "max",
              objective.in = as.vector(r),
              # first K rows: one per cluster; next N rows: one per element
              const.mat = rbind(t(sapply(1:K, function(j) rep((1:K == j) * 1, each=N))),
                                t(sapply(1:N, function(i) rep((1:N == i) * 1, K)))),
              const.dir = c(rep(">=", K), rep("=", N)),
              const.rhs = rep(1, N+K),
              all.bin = TRUE)
    (assignments <- apply(matrix(mod$solution, nrow=N), 1, function(x) which(x > 0.999)))
    # [1] 6 5 3 3 5 6 6 9 2 1 3 4 7 6 10 2 10 6 6 8
    sort(unique(assignments))
    # [1] 1 2 3 4 5 6 7 8 9 10
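The "draw, check, retry" procedure that this answer describes as equivalent can also be written directly in base R. The following is only a minimal sketch of that idea, not part of the original answer; the function name `sample_no_gaps` is made up for illustration, and it assumes N >= K so that a valid assignment exists.

```r
# Hypothetical helper (name chosen for illustration): keep drawing random
# assignments until every cluster label 1..K is used at least once.
sample_no_gaps <- function(N, K) {
  stopifnot(N >= K)  # with fewer points than clusters no valid draw exists
  repeat {
    z <- sample(K, N, replace = TRUE)        # one random assignment
    if (length(unique(z)) == K) return(z)    # accept only if no label is missing
  }
}

set.seed(144)
z <- sample_no_gaps(N = 20, K = 10)
sort(unique(z))  # every label from 1 to 10 appears
```

Note that this keeps K fixed, whereas the question relabels so that fewer than K clusters may remain; which behaviour is wanted depends on the clustering algorithm.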

+3

josliber Feb 01 '16 at 22:12


You can do the following:

    un <- sort(unique(z_init))
    (z <- unname(setNames(1:length(un), un)[as.character(z_init)]))
    # [1] 2 6 5 2 8 8 2 7 4 5 5 3 6 2 4 7 8 3 4 1
    sort(unique(z))
    # [1] 1 2 3 4 5 6 7 8

Here I build a lookup vector that maps each element of un to the corresponding element of 1:length(un), and then index it by the values of z_init.
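The same lookup can probably be expressed with base R's `match()`, which returns the position of each value in a table of sorted unique values. This is a sketch of an alternative, not the answerer's code; `z_init` is typed in from the question's output so the block is self-contained.

```r
# z_init copied from the question's printed output
z_init <- c(2, 8, 6, 2, 10, 10, 2, 9, 5, 6, 6, 3, 8, 2, 5, 9, 10, 3, 5, 1)

# position of each value among the sorted distinct values = gap-free label
z <- match(z_init, sort(unique(z_init)))
z
# [1] 2 6 5 2 8 8 2 7 4 5 5 3 6 2 4 7 8 3 4 1
```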

+3

Julius Feb 01 '16 at 22:18


A simple (but possibly inefficient) approach is to convert to a factor, and then back to numeric. Creating a factor encodes the data as integers from 1 to the number of unique values, and then attaches labels holding the original values. Converting to numeric then drops the labels and leaves the integer codes:

    > x <- c(1,2,3,5,6,8)
    > (x2 <- as.numeric(factor(x)))
    [1] 1 2 3 4 5 6
    >
    > xx <- c(15,5,7,7,10)
    > (xx2 <- as.numeric(factor(xx)))
    [1] 4 1 2 2 3
    > (xx3 <- as.numeric(factor(xx, levels=unique(xx))))
    [1] 1 2 3 3 4

The levels= argument in the last example numbers the values in the order in which they first appear in the original vector, rather than in sorted order.
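To make the two numbering conventions easy to switch between, the factor trick could be wrapped in a small function. This is only an illustrative sketch; the name `relabel` and its `by_appearance` argument are invented here and do not come from the answer.

```r
# Hypothetical wrapper around the factor trick.
# by_appearance = FALSE: labels follow the sorted order of the values
# by_appearance = TRUE:  labels follow the order of first appearance
relabel <- function(x, by_appearance = FALSE) {
  lev <- if (by_appearance) unique(x) else sort(unique(x))
  as.integer(factor(x, levels = lev))
}

relabel(c(1, 2, 3, 5, 6, 8))
# [1] 1 2 3 4 5 6
relabel(c(15, 5, 7, 7, 10))
# [1] 4 1 2 2 3
relabel(c(15, 5, 7, 7, 10), by_appearance = TRUE)
# [1] 1 2 3 3 4
```

For the clustering use case in the question, the default (sorted) numbering reproduces the desired vector; the by-appearance variant matches the question's second example.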

+3

Greg Snow Feb 01 '16 at 10:48


Laterow · Accepted Answer · 2016-02-01T22:56:16+0000

Edit: @GregSnow came up with the shortest answer. I am 100% convinced that this is the shortest way.

For fun, I decided to golf the code, i.e. write it as short as possible:

    z <- c(3, 8, 4, 4, 8, 2, 3, 9, 5, 1, 4)
    # solution by hand: 1 2 3 3 4 4 4 5 6 6 7
    sort(c(factor(z)))  # 18 bytes, as proposed by @GregSnow in the comments
    # [1] 1 2 3 3 4 4 4 5 6 6 7

Some other (working) attempts:

    y=table(z);rep(seq(y),y)        # 24 bytes
    sort(unclass(factor(z)))        # 24 bytes, based on @GregSnow's answer
    diffinv(diff(sort(z))>0)+1      # 26 bytes
    sort(as.numeric(factor(z)))     # 27 bytes, @GregSnow's original answer
    rep(seq(unique(z)),table(z))    # 28 bytes
    cumsum(c(1,diff(sort(z))>0))    # 28 bytes
    y=rle(sort(z))$l;rep(seq(y),y)  # 30 bytes
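A quick way to convince yourself that several of these variants give the same result is to compare them against the hand-computed solution. This is a sketch added for illustration, covering only some of the variants. The 18-byte `c(factor(z))` version is left out on purpose: as far as I know, from R 4.1.0 onward `c()` applied to a factor returns a factor rather than integer codes, so its printed output may differ on newer R versions.

```r
z <- c(3, 8, 4, 4, 8, 2, 3, 9, 5, 1, 4)
expected <- c(1, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7)  # solution by hand

y <- table(z)
candidates <- list(
  table_rep = rep(seq(y), y),
  unclass   = sort(unclass(factor(z))),
  as_num    = sort(as.numeric(factor(z))),
  cumsum    = cumsum(c(1, diff(sort(z)) > 0))
)

# TRUE for every variant whose values match the hand-computed labels
agree <- vapply(candidates, function(v) all(as.numeric(v) == expected), logical(1))
agree
```

All of these return the labels in sorted order; to keep the original order of the data, as the question asks, drop the `sort()` and use `as.numeric(factor(z))` directly.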

Edit2: just to show that byte count is not everything:

    z <- sample(1:10,10000,replace=T)

    Unit: microseconds
                                       expr      min        lq      mean    median        uq      max neval
                         sort(c(factor(z))) 2550.128 2572.2340 2681.4950 2646.6460 2729.7425 3140.288   100
           { y = table(z); rep(seq(y), y) } 2436.438 2485.3885 2580.9861 2556.4440 2618.4215 3070.812   100
                   sort(unclass(factor(z))) 2535.127 2578.9450 2654.7463 2623.9470 2708.6230 3167.922   100
             diffinv(diff(sort(z)) > 0) + 1  551.871  572.2000  628.6268  626.0845  666.3495  940.311   100
                sort(as.numeric(factor(z))) 2603.814 2672.3050 2762.2030 2717.5050 2790.7320 3558.336   100
              rep(seq(unique(z)), table(z)) 2541.049 2586.0505 2733.5200 2674.0815 2760.7305 5765.815   100
            cumsum(c(1, diff(sort(z)) > 0))  530.159  545.5545  602.1348  592.3325  632.0060  844.385   100
     { y = rle(sort(z))$l; rep(seq(y), y) }  661.218  684.3115  727.4502  724.1820  758.3280  857.412   100

    z <- sample(1:100000,replace=T)

    Unit: milliseconds
                                       expr       min        lq     mean    median       uq       max neval
                         sort(c(factor(z))) 84.501189 87.227377 92.13182 89.733291 94.16700 150.08327   100
           { y = table(z); rep(seq(y), y) } 78.951701 82.102845 85.54975 83.935108 87.70365 106.05766   100
                   sort(unclass(factor(z))) 84.958711 87.273366 90.84612 89.317415 91.85155 121.99082   100
             diffinv(diff(sort(z)) > 0) + 1  9.784041  9.963853 10.37807 10.090965 10.34381  17.26034   100
                sort(as.numeric(factor(z))) 85.917969 88.660145 93.42664 91.542263 95.53720 118.44512   100
              rep(seq(unique(z)), table(z)) 86.568528 88.300325 93.01369 90.577281 94.74137 118.03852   100
            cumsum(c(1, diff(sort(z)) > 0))  9.680615  9.834175 10.11518  9.963261 10.16735  14.40427   100
     { y = rle(sort(z))$l; rep(seq(y), y) } 12.842614 13.033085 14.73063 13.294019 13.66371 133.16243   100
