Which solution is better for getting a random set of records from a database?

I want to get some random entries from db. There are two solutions for this:

1- Using TABLESAMPLE to directly retrieve data from db.

2- Write a method in my application for this. In this method, I generate several random numbers and fetch the matching rows with:

 select * from db where ID = @RandomNumber 

If that ID does not exist, I try a new number.

Which of the two performs better?

+4
4 answers

According to the documentation for TABLESAMPLE, you should not use it if you really want a sample of individual rows:

If you really need to randomly select individual rows, change your query to randomly select rows rather than using TABLESAMPLE. For example, the following query uses the NEWID function to return approximately one percent of the rows in the Sales.SalesOrderDetail table:

 SELECT *
 FROM Sales.SalesOrderDetail
 WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
               / CAST(0x7fffffff AS int)

The SalesOrderID column is included in the CHECKSUM expression so that NEWID() is evaluated once per row, which makes the sampling happen on a per-row basis. The expression CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float) / CAST(0x7fffffff AS int) evaluates to a random floating-point value between 0 and 1.
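To see why that expression lands between 0 and 1, here is a small Python sketch of the same arithmetic. It assumes CHECKSUM yields a signed 32-bit integer and uses Python's random module as a stand-in for it, so it illustrates only the masking, division, and comparison, not SQL Server's actual hashing. The names row_fraction and sample_rows are made up for this illustration.

```python
import random

MASK = 0x7FFFFFFF  # clears the sign bit, leaving a value in 0..2**31-1


def row_fraction(checksum):
    """Mirror CAST(x & 0x7fffffff AS float) / CAST(0x7fffffff AS int)."""
    return (checksum & MASK) / MASK


def sample_rows(row_count, fraction, seed=0):
    """Keep each row index independently with probability of about `fraction`."""
    rng = random.Random(seed)
    kept = []
    for row in range(row_count):
        # Assumption: a uniformly random signed 32-bit integer stands in
        # for CHECKSUM(NEWID(), SalesOrderID).
        checksum = rng.randint(-2**31, 2**31 - 1)
        if fraction >= row_fraction(checksum):
            kept.append(row)
    return kept
```

Calling sample_rows(100000, 0.01) keeps roughly one percent of the row indexes, and each row is decided independently of its neighbours.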

In any case, the @RandomNumber approach may need an unbounded number of queries (in theory, the first 1000 queries you make might not return anything), so the better approach is to limit the result set on the server.
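The retry cost can be sketched in Python. This is only an illustration: it uses an in-memory SQLite table as a stand-in for the real database, and the helper names (build_sparse_table, fetch_random_row) and the max_attempts cap are assumptions, not part of the question.

```python
import random
import sqlite3


def build_sparse_table(max_id, step):
    """Hypothetical table 'db' in which only every step-th ID exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE db (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO db (id) VALUES (?)",
        [(i,) for i in range(step, max_id + 1, step)],
    )
    return conn


def fetch_random_row(conn, max_id, rng, max_attempts=1000):
    """Guess IDs until one exists; return (row, attempts) or (None, attempts)."""
    for attempt in range(1, max_attempts + 1):
        candidate = rng.randint(1, max_id)
        # One query per guess, as in: select * from db where ID = @RandomNumber
        row = conn.execute(
            "SELECT id FROM db WHERE id = ?", (candidate,)
        ).fetchone()
        if row is not None:
            return row, attempt
    return None, max_attempts
```

With build_sparse_table(10000, 100) only 1% of the IDs exist, so on average about 100 guesses (and 100 queries) are needed per returned row, and nothing bounds the worst case except the cap.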

+3

Try the following:

 SELECT TOP 1 * FROM db ORDER BY NEWID() 

NEWID() generates a new uniqueidentifier value for each row, so sorting by it puts the rows in random order. Source: SQL to select a random row from a database table
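As a rough Python analogy of what this query does (not SQL Server's actual execution plan), each row gets a fresh random key and the first n rows in key order are kept. The function name top_n_by_random_key is made up for this sketch, and uuid.uuid4() stands in for NEWID().

```python
import uuid


def top_n_by_random_key(rows, n=1):
    """Tag every row with a fresh UUID, sort by it, keep the first n."""
    # Every row is read and given its own key before the first n are
    # taken, so the work grows with the size of the input.
    keyed = [(uuid.uuid4(), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed[:n]]
```

The result is an unbiased pick, but the sketch also makes the cost visible: the whole input is touched even when only one row is wanted.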

+2

I would use TABLESAMPLE, as it generates sample data very easily. I expect it to be more efficient, since you issue only one SQL statement.

For example:

 USE AdventureWorks;
 GO
 SELECT FirstName, LastName
 FROM Person.Contact
 TABLESAMPLE (10 PERCENT);

With the other approach, you would have to run select * from db where ID = @RandomNumber many times.

If you are after individual rows, I would use a different method, such as a randomized TOP 1.
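A small Python simulation may help show why individual rows call for a different method. It assumes, as the SQL Server documentation describes, that TABLESAMPLE decides per data page rather than per row. The function page_sample and the rows_per_page parameter are hypothetical and exist only for this sketch.

```python
import random


def page_sample(row_count, rows_per_page, percent, seed=0):
    """Keep whole pages with probability percent/100; return kept row indexes."""
    rng = random.Random(seed)
    kept = []
    for start in range(0, row_count, rows_per_page):
        # One random draw per page, not per row.
        if rng.random() * 100 < percent:
            kept.extend(range(start, min(start + rows_per_page, row_count)))
    return kept
```

Under that assumption the sampled rows arrive in contiguous runs of rows_per_page, and the number of rows returned varies from run to run, which is fine for a quick approximate sample but not for picking independent rows.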

+1

I recommend reading the post on various methods of getting a random row from a table. It is based on PostgreSQL, but I'm sure 90% of it applies to SQL Server.

Of course, the most flexible and efficient solution can be achieved by writing a stored procedure.

The cost (and hence the performance) of obtaining a truly random sample depends on the data: data types, statistics, and distribution, including sparseness.

0

Source: https://habr.com/ru/post/1302845/

