Getting random results from large tables

I am trying to get 4 random rows from a table containing about 7 million records. I also want to get 4 random rows from the same table filtered by category.

As you might expect, sorting the table randomly makes these queries take a few seconds, which is not ideal.

Another method I considered for the unfiltered result set is to have PHP pick a few random numbers between 1 and roughly 7,000,000 and then query only those rows with IN(...). And yes, I know the caveat: you can get fewer than 4 rows if a record with one of those IDs no longer exists.
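A minimal sketch of the random-ID + IN(...) idea, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a hypothetical table `items(id, category)`; the helper names are illustrative only. It picks distinct IDs, binds them as parameters, and simply returns whatever rows still exist.

```python
import random
import sqlite3


def make_demo_db(n=1000):
    """Hypothetical stand-in for the real table: ids 1..n, alternating categories."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT)")
    conn.executemany(
        "INSERT INTO items (id, category) VALUES (?, ?)",
        [(i, "a" if i % 2 else "b") for i in range(1, n + 1)],
    )
    return conn


def random_rows_by_id(conn, max_id, k=4):
    """Pick k distinct random IDs in [1, max_id] and fetch them with IN (...).

    IDs that no longer exist are simply absent from the result, so fewer
    than k rows can come back.
    """
    ids = random.sample(range(1, max_id + 1), k)
    placeholders = ", ".join("?" for _ in ids)  # one bound parameter per ID
    sql = f"SELECT id, category FROM items WHERE id IN ({placeholders})"
    return conn.execute(sql, ids).fetchall()


conn = make_demo_db()
max_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
print(random_rows_by_id(conn, max_id))
```

The lookup is cheap because it only touches the primary key; the price is that deleted IDs silently shrink the result.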

However, this method obviously won't work with category filtering, since PHP doesn't know which IDs belong to which category and therefore can't choose which IDs to select.

Are there better ways to do this? The only other idea I have is to store the record IDs for each category in a separate table, select random IDs from that table, and then fetch only those IDs from the main table in a second query; but surely there is a better way?
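The two-query idea with a side table can be sketched like this, again assuming sqlite3 as a stand-in for MySQL (SQLite's RANDOM() plays the role of MySQL's RAND()). The tables `items` and `category_ids` and the function names are hypothetical. Note that the first query still sorts every side-table row in the category; it only sorts narrower rows.

```python
import sqlite3


def make_demo_db(n=1000):
    """Hypothetical main table `items` plus a narrow side table `category_ids`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(i, "a" if i % 2 else "b", f"item {i}") for i in range(1, n + 1)],
    )
    # Side table holding only (category, id), indexed by category.
    conn.execute("CREATE TABLE category_ids (category TEXT, id INTEGER)")
    conn.execute("CREATE INDEX idx_category_ids ON category_ids (category)")
    conn.execute("INSERT INTO category_ids SELECT category, id FROM items")
    return conn


def random_rows_via_side_table(conn, category, k=4):
    # Query 1: pick k random IDs for this category from the side table.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM category_ids WHERE category = ? ORDER BY RANDOM() LIMIT ?",
        (category, k),
    )]
    if not ids:
        return []
    # Query 2: fetch only those rows from the main table.
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        f"SELECT id, category, title FROM items WHERE id IN ({placeholders})", ids
    ).fetchall()


conn = make_demo_db()
print(random_rows_via_side_table(conn, "a"))
```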

4 answers

Of course, you can use ORDER BY RAND() together with LIMIT and a WHERE clause (for the category). As you noted, this requires scanning and sorting every matching row, which takes time, especially with this much data.
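For reference, a minimal sketch of the plain random-sort query, using sqlite3 as a stand-in for MySQL (an assumption: SQLite spells the function RANDOM() where MySQL uses RAND()). The table `items(id, category)` and the function names are hypothetical.

```python
import sqlite3


def make_demo_db(n=1000):
    """Hypothetical stand-in for the real table: ids 1..n, alternating categories."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT)")
    conn.executemany(
        "INSERT INTO items (id, category) VALUES (?, ?)",
        [(i, "a" if i % 2 else "b") for i in range(1, n + 1)],
    )
    return conn


def random_rows_order_by(conn, category, k=4):
    # SQLite spelling of: SELECT ... WHERE category = ? ORDER BY RAND() LIMIT 4
    # Every matching row gets a random sort key, so the cost grows with the
    # number of rows in the category, not with k.
    return conn.execute(
        "SELECT id, category FROM items WHERE category = ? ORDER BY RANDOM() LIMIT ?",
        (category, k),
    ).fetchall()


conn = make_demo_db()
print(random_rows_order_by(conn, "a"))
```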

Your other alternative, as you indicated, is storing id / category_id pairs in a separate table. That may be a little faster, but the query on that table still needs LIMIT and WHERE, and the table will contain as many rows as the main table.

Another approach, if applicable, is to have a separate table for each category that stores the IDs belonging to it. If your categories are fixed or rarely change, this approach should work. You effectively remove the WHERE clause from the query, and ORDER BY RAND() with LIMIT on each category table will be faster, since each category table contains only a subset of the rows in your main table.

Another alternative is to use a key/value database for this operation only. MongoDB or Google App Engine can help with this and can be very fast.

You could also switch to a master/slave setup for MySQL. The slave replicates the content in near real time, and when you need to run an expensive query, you send it to the slave instead of the master, moving the load to another machine.

Finally, you could use Sphinx, which is relatively easy to install and maintain. You can then run each of these per-category queries as a document search and let Sphinx randomize the results. This moves the expensive operation to another layer and leaves MySQL free for other work.

Just some options to consider.

Use the random number method:

  • Get the maximum ID in the table.
  • Create a temporary table to store the matches.
  • Repeat the following n times:
    • Generate a random number between 1 and maxId.
    • Fetch the first record (in the desired category, if filtering) whose ID is greater than or equal to that number, and insert it into the temporary table.
  • Your temporary table now contains your random results.
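The steps above can be sketched as follows, assuming sqlite3 as a stand-in for MySQL and a hypothetical table `items(id, category)`; `picks` and the function names are illustrative. Each lookup walks the primary-key index from the random ID, so it stays cheap even on a large table.

```python
import random
import sqlite3


def make_demo_db(n=1000):
    """Hypothetical stand-in for the real table: ids 1..n, alternating categories."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT)")
    conn.executemany(
        "INSERT INTO items (id, category) VALUES (?, ?)",
        [(i, "a" if i % 2 else "b") for i in range(1, n + 1)],
    )
    return conn


def random_rows_temp_table(conn, category, n=4):
    # Step 1: maximum ID in the table.
    max_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
    if max_id is None:  # empty table
        return []
    # Step 2: temporary table for the matches (visible to this connection only).
    conn.execute("DROP TABLE IF EXISTS temp.picks")
    conn.execute("CREATE TEMP TABLE picks (id INTEGER, category TEXT)")
    # Step 3: n lookups, each starting at a random ID.
    for _ in range(n):
        r = random.randint(1, max_id)  # inclusive on both ends
        conn.execute(
            "INSERT INTO picks SELECT id, category FROM items "
            "WHERE id >= ? AND category = ? ORDER BY id LIMIT 1",
            (r, category),
        )
    # Step 4: the temporary table now holds the random results.
    return conn.execute("SELECT id, category FROM picks").fetchall()


conn = make_demo_db()
print(random_rows_temp_table(conn, "a"))
```

Two trade-offs are worth knowing: a lookup can find nothing (if no matching row lies at or after the random ID) or return a row already picked, and rows that follow a gap in the IDs, or a run of other-category rows, are chosen more often than others.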

Or you can dynamically generate the SQL and combine the lookups with UNION, so everything runs as a single query.

  -- each ? is a random number between 1 and MAX(ID), generated by the application
  (SELECT * FROM myTable WHERE ID >= ? AND Category = 'zzz' ORDER BY ID LIMIT 1) UNION
  (SELECT * FROM myTable WHERE ID >= ? AND Category = 'zzz' ORDER BY ID LIMIT 1) UNION
  (SELECT * FROM myTable WHERE ID >= ? AND Category = 'zzz' ORDER BY ID LIMIT 1) UNION
  (SELECT * FROM myTable WHERE ID >= ? AND Category = 'zzz' ORDER BY ID LIMIT 1)

Note: my SQL may not be valid, as I'm not a MySQL expert, but the theory should be sound.
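A minimal sketch of generating that UNION dynamically, assuming sqlite3 as a stand-in for MySQL and a hypothetical table `items(id, category)`. Each branch is wrapped in a derived table (`SELECT * FROM (...) AS rN`) so that its own ORDER BY and LIMIT apply per branch; that form is accepted by both SQLite and MySQL. The random IDs are generated in Python and bound as parameters.

```python
import random
import sqlite3


def make_demo_db(n=1000):
    """Hypothetical stand-in for the real table: ids 1..n, alternating categories."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT)")
    conn.executemany(
        "INSERT INTO items (id, category) VALUES (?, ?)",
        [(i, "a" if i % 2 else "b") for i in range(1, n + 1)],
    )
    return conn


def build_union_query(category, random_ids):
    """Return (sql, params): one 'first row at or after ID' branch per random ID."""
    branch = (
        "SELECT * FROM (SELECT id, category FROM items "
        "WHERE id >= ? AND category = ? ORDER BY id LIMIT 1) AS r{}"
    )
    sql = " UNION ".join(branch.format(i) for i in range(len(random_ids)))
    params = []
    for rid in random_ids:
        params.extend([rid, category])  # same order as the ? marks in a branch
    return sql, params


def random_rows_union(conn, category, k=4):
    max_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
    if max_id is None:
        return []
    ids = [random.randint(1, max_id) for _ in range(k)]
    sql, params = build_union_query(category, ids)
    return conn.execute(sql, params).fetchall()


conn = make_demo_db()
print(random_rows_union(conn, "a"))
```

Because UNION removes duplicates, two random numbers that land on the same row yield fewer than k rows; UNION ALL would keep the duplicate instead.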

First you need to get the number of rows in the category, something like this:

  SELECT COUNT(1) FROM tbl WHERE category = ?

Then pick a random offset. LIMIT offsets start at 0, so the valid range is 0 to $rowsNum - 1:

  $offset = rand(0, $rowsNum - 1);

and select the row at that offset:

  SELECT * FROM tbl WHERE category = ? LIMIT $offset, 1

This way gaps in the IDs don't matter. The only problem is that you need to run the second query several times, once per row. A UNION can help in this case.
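A minimal sketch of the count-then-offset approach, assuming sqlite3 as a stand-in for MySQL and a hypothetical table `items(id, category)`. `LIMIT 1 OFFSET ?` is equivalent to MySQL's `LIMIT ?, 1`. Drawing the offsets with `random.sample` keeps them distinct, so the same row is never returned twice.

```python
import random
import sqlite3


def make_demo_db(n=1000):
    """Hypothetical stand-in for the real table: ids 1..n, alternating categories."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT)")
    conn.executemany(
        "INSERT INTO items (id, category) VALUES (?, ?)",
        [(i, "a" if i % 2 else "b") for i in range(1, n + 1)],
    )
    return conn


def random_rows_by_offset(conn, category, k=4):
    # 1) Count the rows in this category.
    (count,) = conn.execute(
        "SELECT COUNT(1) FROM items WHERE category = ?", (category,)
    ).fetchone()
    # 2) Distinct zero-based offsets in [0, count - 1].
    offsets = random.sample(range(count), min(k, count))
    rows = []
    for off in offsets:
        # 3) One query per offset; ORDER BY id makes "the row at offset N"
        #    well defined instead of depending on the query plan.
        rows.append(conn.execute(
            "SELECT id, category FROM items WHERE category = ? "
            "ORDER BY id LIMIT 1 OFFSET ?",
            (category, off),
        ).fetchone())
    return rows


conn = make_demo_db()
print(random_rows_by_offset(conn, "a"))
```

Every row is equally likely here, unlike the "first ID after a random number" method; the trade-off is that a large OFFSET still makes the database step over that many rows, so individual lookups get slower deep into a big category.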

For MySQL you can use RAND():

 SELECT column_name FROM table_name ORDER BY RAND() LIMIT 4

Source: https://habr.com/ru/post/1433426/