SQL DISTINCT keyword bogs down performance?

Question

I received an SQL query that uses the DISTINCT keyword. When I ran it, it took at least a minute to join two tables with hundreds of thousands of records and actually return something.

Then I took out the DISTINCT, and the query returned in 0.2 seconds. Is the DISTINCT keyword really that bad?

EDIT: here is the query

 SELECT Distinct
 c.username, o.orderno, o.totalcredits, o.totalrefunds,
 o.recstatus, o.reason 

 from management.contacts c 
 join management.orders o
 on (c.custID = o.custID)
 where o.recDate > to_date('2010-01-01', 'YYYY-MM-DD')

+4

sql plsql

Mxyl Mar 17 '11 at 15:29

4 answers

To me, DISTINCT always sets off alarm bells. It usually signals a poor table design or a developer who is unsure of the query. It is used to remove duplicate rows, but if the joins are correct it should rarely be needed. And yes, using it carries a significant cost.

What is the primary key of the orders table? Assuming it is orderno, that should be enough to guarantee no duplicates. If it is something else, you may need to do a little more with the query, but you should make it a goal to remove those DISTINCTs! ;-)
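
If you are not sure what the primary key of the orders table is, Oracle's data dictionary can tell you. A minimal sketch, assuming the schema and table were created with unquoted names (so they are stored in upper case as MANAGEMENT and ORDERS) and that your user can read the ALL_ views:

```sql
-- List the primary key columns of management.orders, in key order.
SELECT cc.column_name, cc.position
FROM   all_constraints  c
JOIN   all_cons_columns cc
  ON   cc.owner = c.owner
 AND   cc.constraint_name = c.constraint_name
WHERE  c.owner = 'MANAGEMENT'      -- assumption: unquoted schema name
AND    c.table_name = 'ORDERS'     -- assumption: unquoted table name
AND    c.constraint_type = 'P'     -- 'P' marks a primary key constraint
ORDER  BY cc.position;
```

If this returns no rows, the table has no declared primary key, and nothing in the schema guarantees that order rows are unique.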

You also mentioned that the query took a while to run when you were checking the number of rows. It is often quicker to wrap the entire query in "select count(*) from ( )", especially if a large number of rows is returned. Just while you are testing. ;-)
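
As a sketch of that wrapper, here is the query from the question with the DISTINCT removed, placed inside a count (the alias q is only there for illustration):

```sql
-- Count the rows on the server instead of fetching them all to the client.
SELECT COUNT(*)
FROM  (
  SELECT c.username, o.orderno, o.totalcredits, o.totalrefunds,
         o.recstatus, o.reason
  FROM   management.contacts c
  JOIN   management.orders o ON (c.custID = o.custID)
  WHERE  o.recDate > TO_DATE('2010-01-01', 'YYYY-MM-DD')
) q;
```

Running it once with and once without DISTINCT in the inner SELECT and comparing the two counts also shows whether the DISTINCT was removing any rows at all.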

Finally, make sure custID is indexed on the orders table (and possibly recDate too).
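
If those indexes turn out to be missing, they could be created along these lines. The index names are hypothetical, and whether each index helps depends on the data, so treat this as a starting point and check the execution plan afterwards:

```sql
-- Hypothetical index names; skip any column that is already indexed.
CREATE INDEX orders_custid_ix  ON management.orders (custID);   -- join column
CREATE INDEX orders_recdate_ix ON management.orders (recDate);  -- filter column
```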

+3

Trojanname Mar 17 '11 at 20:06

The purpose of DISTINCT is to remove duplicate records from the result set, comparing all the selected columns.

If any one of the selected columns is unique after the join, you can drop DISTINCT.
If you do not know that, but you do know that the combination of values of the selected columns is unique, you can also drop DISTINCT.
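
One way to test that assumption against the current data is to group by the selected columns and list only the combinations that occur more than once. This is a sketch built from the query in the question; the column alias copies is made up for illustration:

```sql
-- Any row returned here is a combination that DISTINCT would collapse.
SELECT c.username, o.orderno, o.totalcredits, o.totalrefunds,
       o.recstatus, o.reason, COUNT(*) AS copies
FROM   management.contacts c
JOIN   management.orders o ON (c.custID = o.custID)
WHERE  o.recDate > TO_DATE('2010-01-01', 'YYYY-MM-DD')
GROUP  BY c.username, o.orderno, o.totalcredits, o.totalrefunds,
          o.recstatus, o.reason
HAVING COUNT(*) > 1;
```

An empty result only shows that there are no duplicates today. A real guarantee has to come from the keys and constraints on the tables.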

In fact, with properly designed databases you rarely need DISTINCT, and in the cases where you do, it is (?) obvious that you need it. The RDBMS, however, cannot leave it to chance and must actually build an indexing structure to establish uniqueness.

Usually you find DISTINCT everywhere when people are not sure about their JOINs and the relationships between tables.

Also, in classes that deal with pure relational databases, where the result should be a proper set (no repeating elements, i.e. records), it is quite common for people to add DISTINCT to guarantee this property for the sake of theoretical correctness. Sometimes this habit creeps into production systems.

+2

Unreason Mar 17 '11 at 16:09

You can try a GROUP BY like this:

  SELECT c.username, o.orderno, o.totalcredits, o.totalrefunds,
         o.recstatus, o.reason
  FROM management.contacts c, management.orders o
  WHERE c.custID = o.custID
    AND o.recDate > to_date('2010-01-01', 'YYYY-MM-DD')
  GROUP BY c.username, o.orderno, o.totalcredits, o.totalrefunds,
           o.recstatus, o.reason

Also check whether there is an index on o.recDate.

0

Alex peta Mar 17 '11 at 17:19

Benoit · Accepted Answer · 2011-03-17T15:32:29+0000

Yes, because using DISTINCT will (sometimes, according to a comment) cause the results to be sorted. Sorting hundreds of thousands of records takes time.

Try a GROUP BY on all your columns. It can sometimes lead the query optimizer to choose a more efficient algorithm (at least with Oracle, I have noticed a significant performance gain).
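
To see which algorithm the optimizer actually picks, you can compare execution plans in Oracle. A minimal sketch, assuming a plan table is available to your user (it is by default in recent Oracle versions):

```sql
-- Generate the plan for the DISTINCT version without running the query.
EXPLAIN PLAN FOR
SELECT DISTINCT c.username, o.orderno, o.totalcredits, o.totalrefunds,
       o.recstatus, o.reason
FROM   management.contacts c
JOIN   management.orders o ON (c.custID = o.custID)
WHERE  o.recDate > TO_DATE('2010-01-01', 'YYYY-MM-DD');

-- Show the plan that was just generated.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Repeat the same two statements with the GROUP BY version and compare the plans. The duplicate-elimination step typically shows up as an operation such as SORT UNIQUE or HASH UNIQUE for DISTINCT, and SORT GROUP BY or HASH GROUP BY for GROUP BY, though the exact operations depend on the Oracle version and statistics.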
