Why is this random shuffle algorithm not biased?

My colleague and I are arguing about why the shuffling algorithm in this list of JS tips and tricks does not give biased results like the naive shuffle Jeff Atwood describes.

Here is the shuffle code in question:

list.sort(function() { return Math.random() - 0.5; });

Jeff's naive shuffle code:

for (int i = 0; i < cards.Length; i++)
{
    int n = rand.Next(cards.Length);
    Swap(ref cards[i], ref cards[n]);
}

I wrote this JS to check the shuffle:

var list = [1, 2, 3];
var result = {123: 0, 132: 0, 321: 0, 213: 0, 231: 0, 312: 0};

function shuffle() {
    return Math.random() - 0.5;
}

for (var i = 0; i < 60000000; i++) {
    result[ list.sort(shuffle).join('') ]++;
}

Why do I get these results (from Firefox 5):

Order       Count    Diff from true avg
123       9997461    -0.0002539
132      10003451     0.0003451
213      10001507     0.0001507
231       9997563    -0.0002437
312       9995658    -0.0004342
321      10004360     0.0004360

Presumably Array.sort walks through the array and swaps (adjacent) elements, much like Jeff's example. So why don't the results look biased?

+6
3 answers

I found the reason it appears unbiased.

Array.sort() does not just return the sorted array, it sorts the array itself, in place. If we reinitialize the array on every loop iteration, we get results like these:

123 14941
132  7530
321  7377
213 15189
231  7455
312  7508

This shows a very significant bias. The original test looked fair only because each iteration re-sorted the already shuffled array from the previous iteration, so every shuffle started from an effectively random order, which averages the per-shuffle bias away.
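A quick way to see the in-place behavior for yourself:

var a = [3, 1, 2];
var b = a.sort();
console.log(a);       // [1, 2, 3]: the original array has been sorted in place
console.log(a === b); // true: sort returns a reference to the same array, not a copy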

For those interested, here is the changed code:

var result = {123: 0, 132: 0, 321: 0, 213: 0, 231: 0, 312: 0};
var iterations = 60000;
var comparisons = 0;

function shuffle() {
    comparisons++;
    return Math.random() - 0.5;
}

for (var i = 0; i < iterations; i++) {
    var list = [1, 2, 3]; // reinitialize the array on every iteration
    result[ list.sort(shuffle).join('') ]++;
}
console.log(result);
+2

The problem with the naive shuffle is that a value may already have been swapped into place, and you can swap it away again later. Say you have three cards and you pick one at random for the first position. If a later step can accidentally swap that card somewhere else, you throw away the randomness of that first choice.
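You can make the bias concrete by counting. The naive shuffle makes n independent picks from n positions, giving n^n equally likely execution paths, but there are only n! distinct permutations, and n^n is not divisible by n! for n > 2, so some permutations must occur more often than others. Here is a small sketch of mine (in JS rather than the C# above) that enumerates all 3^3 = 27 paths for three cards:

// Enumerate every possible sequence of random picks (3^3 = 27 paths)
// the naive shuffle can take on three cards, and count the outcomes.
function naiveShuffleCounts() {
    var counts = {};
    for (var a = 0; a < 3; a++) {
        for (var b = 0; b < 3; b++) {
            for (var c = 0; c < 3; c++) {
                var cards = [1, 2, 3];
                var picks = [a, b, c];  // stands in for rand.Next(cards.Length)
                for (var i = 0; i < 3; i++) {
                    var n = picks[i];
                    var tmp = cards[i];
                    cards[i] = cards[n];
                    cards[n] = tmp;
                }
                var key = cards.join('');
                counts[key] = (counts[key] || 0) + 1;
            }
        }
    }
    return counts;
}
console.log(naiveShuffleCounts());
// {123: 4, 132: 5, 213: 5, 231: 5, 312: 4, 321: 4}
// 27 paths cannot be spread evenly over 6 permutations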

If the sort is a quicksort, it repeatedly splits the list roughly in half. With a random comparator, each iteration splits each of those groups into two random groups, and this continues until you are down to single cards, which are then combined back together. The difference is that you never take a card from the second randomly chosen group and move it back into the first group.

A Knuth-Fisher-Yates shuffle differs from a naive shuffle in that each card is only picked once. If you were drawing random cards from a deck, would you put a card back and draw again? No, you take random cards one at a time. This is the first I have heard of it, but I did something similar back in high school, starting at index 0. KFY is probably slightly faster, since my version has an extra addition in the random expression.

for (int i = 0; i < cards.Length - 1; i++)
{
    int n = rand.Next(cards.Length - i) + i; // random index in [i, cards.Length - 1]
    Swap(ref cards[i], ref cards[n]);
}

Do not think of it as swapping; think of it as picking random cards from a deck. For each element of the array (except the last, because only one card remains by then), you pick a random card from all the remaining cards and lay it down, forming a new stack of cards that is randomly shuffled. It does not matter that the remaining cards are no longer in their original order because of earlier swaps; you are still picking one random card from all of the remaining cards.
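Since the question is about JS, here is the same loop as a JS sketch (my translation, not code from the original answer):

// Fisher-Yates: for each position, pick a random card from the remaining ones
function fisherYatesShuffle(cards) {
    for (var i = 0; i < cards.length - 1; i++) {
        // random index in [i, cards.length - 1]
        var n = i + Math.floor(Math.random() * (cards.length - i));
        var tmp = cards[i];
        cards[i] = cards[n];
        cards[n] = tmp;
    }
    return cards;
}

console.log(fisherYatesShuffle([1, 2, 3, 4, 5]));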

A randomized quicksort is like taking a stack of cards and randomly splitting it into two groups, then taking each group and randomly splitting it into two smaller groups, and so on, until you have individual cards, and then stacking them back together.
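Here is that splitting idea as a rough JS sketch of mine (an illustration of the description above, not how a real sort is implemented):

// Recursively split the deck into two random groups, then concatenate.
function randomSplitShuffle(cards) {
    if (cards.length <= 1) return cards;
    var left = [], right = [];
    for (var i = 0; i < cards.length; i++) {
        // each card independently goes to a random group
        (Math.random() < 0.5 ? left : right).push(cards[i]);
    }
    return randomSplitShuffle(left).concat(randomSplitShuffle(right));
}

console.log(randomSplitShuffle([1, 2, 3, 4, 5]));

Each card's sequence of left/right choices acts like an independent uniform random key, so the final order is unbiased. The recursion can occasionally re-split a group of the same size, but it terminates with probability 1.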

+1

Actually, this does not implement Jeff's naive random sort. His algorithm transposes the array keys manually, while sort actively sorts the list.

sort uses either quicksort or insertion sort (thanks to cwolves for pointing this out; see the comments) to do this, depending on the implementation:

  • Is A bigger or smaller than B? Smaller? Decrement.
  • Is A bigger or smaller than C? Smaller? Decrement.
  • Is A bigger or smaller than D? Smaller? Insert A after D.
  • Is B bigger or smaller than C? Smaller? Decrement.
  • Is B bigger or smaller than D? Smaller? Insert B after D and before A ...

This means that your average case is O(n log n) and your worst case is O(n^2), for each iteration of the loop.

Meanwhile, Atwood's naive random sort simply does:

  • Start with A. Find a random value. Swap
  • Go to B. Find a random value. Swap
  • Go to C. Find a random value. Swap

(Knuth-Fisher-Yates is almost the same, just done in reverse)

So it has a worst case of O(n) and an average case of O(n).
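To see the difference in work, you can count how many times the random comparator actually runs for one sort-based shuffle (a quick sketch; the exact count depends on the engine's sort implementation):

var calls = 0;
var list = [1, 2, 3, 4, 5, 6, 7, 8];
list.sort(function () {
    calls++; // count every comparison the engine makes
    return Math.random() - 0.5;
});
console.log(calls); // roughly n log n to n^2 comparisons, versus n - 1 swaps for KFY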

0

Source: https://habr.com/ru/post/892273/

