Is there a better query than my current one for selecting random records from a category?

I am trying to display exactly 6 random "entertainment" records, but my current query returns a random number of records, anywhere from 1 to 6. How can I change the query so that it always returns exactly 6 random entertainment entries from my Articles table? I also don't want to use ORDER BY RAND(), because my table will grow over time. Here is my current query:

 SELECT r1.*
 FROM Articles AS r1
 INNER JOIN (
     SELECT (RAND() * (SELECT MAX(id) FROM Articles)) AS id
 ) AS r2
 WHERE r1.id >= r2.id
   AND r1.category = 'entertainment'
 LIMIT 6;

Table structure:

 table Articles
 - id (int)
 - category (varchar)
 - title (varchar)
 - image (varchar)
 - link (varchar)
 - Counter (int)
 - dateStamp (datetime)
2 answers

With

 select floor(rand() * m.maxId + 1) as randomId
 from Articles a
 join (SELECT MAX(id) maxId FROM Articles) m
 limit 100

you generate 100 random identifiers. I use 100 because your id column has gaps, so with only 6 random identifiers the probability that enough of them exist would be (very) small. You can then use this result to select just 6 rows with those identifiers:

 select distinct a.*
 from (
     select id, floor(rand() * m.maxId + 1) as randomId
     from Articles a
     join (SELECT MAX(id) maxId FROM Articles) m
     limit 100
 ) r
 join Articles a on a.id = r.randomId
 order by r.id -- only needed for small tables; slows the query down on big tables
 limit 6

The best value for the LIMIT in the subquery depends on the percentage of gaps in your identifiers. 100 should be fast enough.
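The oversample-then-join idea can be tried out without a MySQL server. The following is only a sketch under stated assumptions: it uses Python with an in-memory SQLite database instead of MySQL, draws the random identifiers in Python instead of with RAND(), and samples them without replacement instead of relying on DISTINCT. The function name pick_random_rows, the test data, and the 10% deletion rate are all made up for illustration.

```python
import random
import sqlite3

def pick_random_rows(conn, wanted=6, oversample=100):
    """Draw `oversample` random ids in [1, MAX(id)] and return up to `wanted` existing rows."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM Articles").fetchone()
    # Distinct candidate ids; some of them will fall into gaps left by deleted rows.
    candidates = random.sample(range(1, max_id + 1), min(oversample, max_id))
    placeholders = ",".join("?" * len(candidates))
    rows = conn.execute(
        f"SELECT id, title FROM Articles WHERE id IN ({placeholders})",
        candidates,
    ).fetchall()
    # Rows typically come back in id order, so shuffle before cutting to `wanted`.
    random.shuffle(rows)
    return rows[:wanted]

# Hypothetical test data: 1000 articles, then every tenth one deleted to create gaps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO Articles VALUES (?, ?)",
    [(i, f"article {i}") for i in range(1, 1001)],
)
conn.execute("DELETE FROM Articles WHERE id % 10 = 0")

print(pick_random_rows(conn))
```

With about 10% gaps, roughly 90 of the 100 candidates exist, so cutting the result down to 6 is almost always possible. A larger share of gaps calls for a larger oversample value.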

Update

If you need to filter by category, you can add WHERE a.category = 'entertainment' to the outer query, before ORDER BY and LIMIT. But in that case you will need to adjust the number of random identifiers generated.

For example: if you have inserted 1M articles but 10% of them have been deleted, then on average 90 of the 100 randomly generated identifiers exist. If 10% of the articles have category = 'entertainment', then on average 9 of the random rows will match the condition. That is only an average: it could be 3, and it could also be 16. So you need to generate more random identifiers to be sure of getting at least 6 articles. With LIMIT 1000 in the subquery you get 90 random entertainment articles on average, so you are unlikely to get fewer than 6. You therefore need to know the statistics of your table in order to choose a good LIMIT.
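The risk of coming up short can be estimated with a binomial model. This is a rough sketch, assuming each random identifier independently hits a matching row with probability 0.9 × 0.1 = 0.09 (the figures from the example) and ignoring duplicates removed by DISTINCT, so it is slightly optimistic. The function name prob_fewer_than is made up for illustration.

```python
from math import comb

def prob_fewer_than(k: int, n: int, p: float) -> float:
    """P(X < k) for X ~ Binomial(n, p): chance that fewer than k of n random ids match."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

# Assumed figures from the example: 90% of ids exist, 10% of rows are 'entertainment'.
hit_rate = 0.9 * 0.1

for limit in (100, 1000):
    expected = limit * hit_rate
    shortfall = prob_fewer_than(6, limit, hit_rate)
    print(f"LIMIT {limit}: expected matches = {expected:.0f}, "
          f"P(fewer than 6) = {shortfall:.3g}")
```

Under these assumptions, LIMIT 100 falls short roughly one time in ten, while with LIMIT 1000 the chance is negligible. Plugging in your own table's deletion rate and category share gives a feel for how large the LIMIT has to be.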

Another issue with the WHERE clause is that MySQL may reorder the join so that it can use an index for the filter. This may be faster when only a few random identifiers are generated, but slower if the LIMIT in the subquery is huge. You can force the join order by using STRAIGHT_JOIN instead of JOIN. In my test with LIMIT 10000, however, it made no measurable difference.

If your condition is too selective (for example, only 1% of articles have category = 'entertainment'), a simple ORDER BY RAND() may be faster, because otherwise you would need to generate too many random identifiers. With up to 10K rows matching your condition, ORDER BY RAND() will be fast enough.


Your "entertainment" entries must have a unique identifier, which must be an integer.

If so, you can generate 6 random int values between 1 and the number of records you have with the PHP rand() function. Here is a function I wrote that might be useful.

 function selectSixRandomEntries() {
     $ids = [];
     // Collect 6 distinct random ids; 200 stands in for the highest id in the table.
     while (count($ids) < 6) {
         $randomNumber = rand(1, 200);
         if (in_array($randomNumber, $ids)) continue; // skip duplicates
         $ids[] = $randomNumber;
     }
     // Build a single condition such as "r1.id IN (3, 17, 42, 88, 120, 199)".
     return "r1.id IN (" . implode(", ", $ids) . ")";
 }

And to use it, you can try

 $query = "SELECT r1.* FROM Articles AS r1"
     . " INNER JOIN (SELECT (RAND() * (SELECT MAX(id) FROM Articles)) AS id) AS r2"
     . " WHERE (" . selectSixRandomEntries() . ")"
     . " AND r1.category = 'entertainment' LIMIT 6";

Source: https://habr.com/ru/post/1246279/

