I am trying to find a way to get a random selection from a large dataset.
We expect the collection to grow to ~500K records, so it's important to find a way that will keep working well as the collection grows.
I tried the technique from: http://forums.mysql.com/read.php?24,163940,262235#msg-262235 But it is not entirely random, and it does not work very well with the LIMIT clause: you do not always get the number of records you want.
So I thought that, since the PK is auto_increment, I could just generate a list of random IDs and use an IN clause to select the rows I want. The problem with this approach is that sometimes I need a random data set of records that have a specific status, a status that no more than 5% of the total set has. To make this work, I would first need to find out which IDs have that particular status, so this approach does not work either.
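To make the random-ID idea concrete, here is a rough Python sketch of what I mean. The table name `records`, the column name `id`, and the helper `random_id_query` are made up for illustration, and it assumes the IDs run from 1 to the current maximum with no gaps, which may not hold for the real table.

```python
import random


def random_id_query(max_id, sample_size, table="records", seed=None):
    """Build a SELECT ... WHERE id IN (...) query for randomly chosen IDs.

    Assumes IDs run from 1 to max_id. Gaps left by deleted rows, or an
    extra status condition, would make the query return fewer rows than
    requested. The table and column names are hypothetical.
    """
    rng = random.Random(seed)
    # Pick distinct candidate IDs without replacement.
    ids = rng.sample(range(1, max_id + 1), sample_size)
    # One %s placeholder per ID, the parameter style of common MySQL drivers.
    placeholders = ", ".join(["%s"] * len(ids))
    sql = f"SELECT * FROM {table} WHERE id IN ({placeholders})"
    return sql, ids


if __name__ == "__main__":
    sql, params = random_id_query(max_id=500_000, sample_size=10)
    print(sql)
    print(params)
```

If I add something like `AND status = %s` to that query, only about 5% of the randomly chosen IDs would be expected to match, so I would get far fewer rows than I asked for.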
I am using MySQL 5.1.46 with the MyISAM storage engine.
It may be important to know that the query to select random rows will be executed very often, and that the table it selects from has rows added to it frequently.
Any help would be greatly appreciated!