Search for least used permutation

I need to distribute the data set evenly over time based on historical data so that each digit becomes equal (or close to one) the number of times in each position over time. The problem is that this list of orders used in the past looks like this (but can have any number of elements):

1,2,5,3,4 4,1,5,2,3 1,3,5,2,4 4,1,2,3,5 2,4,1,3,5 5,1,4,3,2 1,5,3,2,4 5,1,3,2,4 3,2,5,4,1 4,3,1,5,2 

how can I find the order of the smallest values ​​and lead to a “more balanced" set of orders. The obvious answer is that I could group and count them and select the least used, but the problem in the least used permutation may never have been used, for example here, the order of "1,2,3,4,5" is a candidate for least used because it does not appear at all.

The simple answer seems to be to determine which position "1" appears in the least frequent and sets this position to "1", etc. for each digit. I suspect this works, but I feel that there is a more elegant solution that I have not considered potentially with cross-connections to include all possible combinations.

any ideas?

+6
source share
3 answers

What you have here is the problem of aligning the histogram.

Consider the problem from this point of view: you have a set of N histograms that represent the frequency of occurrence of N values ​​in the discrete range {1..N}. What you want to do is add a new set of values ​​to your data set that move all the histograms closer to the level. Given the nature of your problem, we know that each value will generally appear as many times as any other value.

One way to do this is to find which N values ​​have the lowest occurrence frequency at any position and assign that position to it. Then, on the remaining histograms, find the next value with the lowest frequency of occurrence in any position and assign this value to this position. Continue repeating this process until all values ​​are assigned to a unique position. This gives you the following set of values. Now you can repeat this operation iteratively to continue to generate new sets of values ​​that will try to rebalance the distribution of values ​​with each iteration.

If you save histograms when spreading values, this becomes a relatively efficient operation (you do not need to constantly rescan the data set).

Keep in mind, however, that for any fairly small set of values, you will always be “out of balance” to some extent. There is no way around this.

+1
source

I assume that you have a way to generate a random permutation (for example, The most efficient way to randomly “shuffle” a list of integers in C # ), Given that one proposal to create one new ordering looks like this:

1) Create two random substitutions

2) Keep depending on which one even withstands the imbalance.

One of the measures of balance would be to think of a list of all samples of discharge frequencies in each position as a vector, which, in the case of an ideal balance, would have the same element. The imbalance will then be the length of the vector that you get by subtracting this ideal vector. Choosing between two random permutations, you choose a permutation from the distribution, the midpoints of which are in the opposite direction to the current imbalance, so you should strive to correct it, while maintaining an arbitrary sequence of permutations.

0
source

If the total number of combinations is small enough, then the approach that I used on a similar problem for a long time:

Maintain a pool of options that are periodically replenished.

In your example, you have 120 possible permutations. Create an array of 120 elements, assign each an initial value equal to 5. If you need a random value that you select from this pool, the number in the hopper will be the weight indicated in this hopper. (At the beginning, the number of bins is added up to 600. Select a random value from 1 to 600, subtract the bins from it until <= 0. The bin you selected is your result.) When the record is selected, decrease this bit by one. After you have made 120 draws from the heap, add 1 to each drawer.

Obviously, this becomes impractical if the total number of possibilities is too large.

0
source

Source: https://habr.com/ru/post/897835/


All Articles