SQL: speed improvement - left Join cond1 or cond2

Question

SELECT DISTINCT a.*, b.* FROM current_tbl a LEFT JOIN import_tbl b ON ( a.user_id = b.user_id OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name) )

Two tables that are basically the same
I do not have access to the table structure or to data entry (so cleaning up the primary keys is not an option)
Sometimes user_id is populated in one table but not the other
Sometimes the names are equal, sometimes they are not

I found that I can get the most out of the data by matching on user_id or on the first/last names. I put ' ' between the names to avoid cases where one user has the same first name as another user's last name and both are missing the other field (unlikely, but plausible).

This query runs in 33000 ms, whereas with each condition on its own it takes about 200 ms.

It's late and I can't think straight right now.
I think I could do a UNION and only query by name where user_id does not match (the default join is on user_id; if the user_id does not exist, then I want to join by name).
Here are some free points for anyone who wants to help.

Please do not ask for the execution plan.

+4

performance sql join left-join

vol7ron Feb 16 '11 at 15:52

8 answers

It looks like you can easily avoid string concatenation:

 OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)

Change it to:

 OR ( a.f_name = b.f_name AND a.l_name = b.l_name)

+4

Andomar Feb 16 '11 at 15:56

Instead of concatenating the first and last names and comparing the result, try comparing them separately. Assuming you have indexes on the first and last name columns (and you should create them if you don't), this should improve the chances that those indexes get used.

 SELECT DISTINCT a.*, b.* FROM current_tbl a LEFT JOIN import_tbl b ON ( a.user_id = b.user_id OR (a.f_name = b.f_name and a.l_name = b.l_name) )
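For the index suggestion above, a minimal sketch might look like the following. The column types and index names are made up for illustration, and the asker said they cannot change the table structure, so read this as "what would need to exist" rather than something they can necessarily run.

```sql
-- Assumed minimal shape of the joined table; the real columns and types are unknown.
CREATE TABLE import_tbl (
    user_id INTEGER,
    f_name  VARCHAR(50),
    l_name  VARCHAR(50)
);

-- Hypothetical index names: one index per join condition.
CREATE INDEX import_tbl_user_id_idx ON import_tbl (user_id);
CREATE INDEX import_tbl_name_idx    ON import_tbl (l_name, f_name);
```

A plain index on the name columns generally cannot help a comparison on the concatenated expression f_name||' '||l_name, which is one reason the column-by-column comparison is worth trying first.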

+4

Joe stefanelli Feb 16 '11 at 15:57

I don't understand why you are concatenating those strings. That looks like where the slowdown would be. Does this work?

 SELECT DISTINCT a.*, b.* FROM current_tbl a LEFT JOIN import_tbl b ON ( a.user_id = b.user_id OR ( a.f_name = b.f_name AND a.l_name = b.l_name) )

+1

Nathan dewitt Feb 16 '11 at 16:01

Here is another ugly way to do this.

 SELECT a.*
   , CASE WHEN b.user_id IS NULL THEN c.field1 ELSE b.field1 END as b_field1
   , CASE WHEN b.user_id IS NULL THEN c.field2 ELSE b.field2 END as b_field2
   ...
 FROM current_tbl a
   LEFT JOIN import_tbl b
     ON a.user_id = b.user_id
   LEFT JOIN import_tbl c
     ON a.f_name = c.f_name AND a.l_name = c.l_name;

This avoids any GROUP BY, and also handles conflicting matches in a somewhat reasonable way.
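To see how the double join behaves, here is a small self-contained sketch for a scratch database. The field1 column, the column types, and the sample rows are invented for illustration; only the table names and the user_id, f_name, and l_name columns come from the question.

```sql
-- Assumed minimal schemas; the real tables have more columns.
CREATE TABLE current_tbl (user_id INTEGER, f_name VARCHAR(50), l_name VARCHAR(50));
CREATE TABLE import_tbl  (user_id INTEGER, f_name VARCHAR(50), l_name VARCHAR(50),
                          field1 VARCHAR(50));

INSERT INTO current_tbl VALUES (1,    'Ann', 'Lee');  -- has an id match in import_tbl
INSERT INTO current_tbl VALUES (NULL, 'Bob', 'Ray');  -- no id, can only match by name
INSERT INTO current_tbl VALUES (3,    'Cy',  'Fox');  -- no match at all

INSERT INTO import_tbl VALUES (1,    'Ann', 'Lee', 'matched by id');
INSERT INTO import_tbl VALUES (NULL, 'Bob', 'Ray', 'matched by name');

-- b carries the user_id match, c carries the name match.
-- The CASE prefers b and falls back to c only when there was no user_id match.
SELECT a.user_id, a.f_name, a.l_name,
       CASE WHEN b.user_id IS NULL THEN c.field1 ELSE b.field1 END AS b_field1
FROM current_tbl a
LEFT JOIN import_tbl b ON a.user_id = b.user_id
LEFT JOIN import_tbl c ON a.f_name = c.f_name AND a.l_name = c.l_name
ORDER BY a.f_name;
```

With this data Ann gets her value through the user_id join, Bob falls back to the name join, and Cy comes back with a NULL b_field1. Each join has a single simple equality condition, which is the point of splitting the OR.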

0

btilly Feb 16 '11 at 16:51

Try using JOIN hints:

http://msdn.microsoft.com/en-us/library/ms173815.aspx

We ran into the same kind of behavior with one of our queries. As a last resort, we added a LOOP hint, and the query ran much faster.

It is important to note what Microsoft says about JOIN hints:

Because the SQL Server query optimizer typically selects the best execution plan for a query, we recommend that hints be used only as a last resort by experienced developers and database administrators.

0

Dashtechnical Feb 16 '11 at 17:29

my boss at my last job .. i swear .. he thought using UNIONs was ALWAYS BETTER than OR.

For example, instead of writing

Select * from employees where Employee_id = 12 or employee_id = 47

he would write (and have me write)

Select * from employees Where employee_id = 12 UNION Select * from employees Where employee_id = 47

The SQL Server optimizer said it was the right thing to do in some situations. I have a friend who works on the SQL Server team at Microsoft; I emailed him about this, and he told me that my statistics were out of date or something like that.

I never got a good answer about why unions would be faster; it seems REALLY counterintuitive.

I do not recommend you do this, but in some situations this may help.

0

Aaron kempf Mar 01 '11 at 9:18

Also two more things - GET RID OF THE DISTINCT if you don't need it.

and, more importantly, you can easily get rid of the concatenation in your join, like this for example (I apologize for my lack of knowledge of MySQL)

SELECT DISTINCT a.*, b.* FROM current_tbl a LEFT JOIN import_tbl b ON (a.user_id = b.user_id OR (a.f_name = b.f_name and a.l_name = b.l_name))

I ran some tests at work in a similar situation that showed a 10x performance improvement from getting rid of a simple concatenation in the join

0

Aaron kempf Mar 01 '11 at 9:22

btilly · Accepted Answer · 2011-02-16T16:21:16+0000

If the other suggestions do not provide a significant speedup, there is a possibility that your real problem is that the best query plan for each of your two possible join conditions is different. In that situation you would want to run two queries and merge the results in some way. This is likely to make your query much, much uglier.

One obscure trick I have used for this kind of situation is to do a GROUP BY over a UNION ALL query. The idea looks like this:

 SELECT a_field1, a_field2, ...
        MAX(b_field1) as b_field1, MAX(b_field2) as b_field2, ...
 FROM (
   SELECT a.field_1 as a_field1, ..., b.field1 as b_field1, ...
   FROM current_tbl a
   LEFT JOIN import_tbl b ON a.user_id = b.user_id
   UNION ALL
   SELECT a.field_1 as a_field1, ..., b.field1 as b_field1, ...
   FROM current_tbl a
   LEFT JOIN import_tbl b ON a.f_name = b.f_name AND a.l_name = b.l_name
 ) t  -- many databases require an alias on a derived table
 GROUP BY a_field1, a_field2, ...

And now the database can perform each of the two joins using the most efficient plan.

(A warning about the drawback of this approach: if a row in current_tbl joins to multiple rows in import_tbl, you can end up merging the data in very odd ways.)

Random performance tip: if you have no reason to believe that potential duplicate rows exist, avoid DISTINCT. It forces an implicit GROUP BY, which can be expensive.
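Here is a concrete, self-contained version of the GROUP BY over UNION ALL idea for a scratch database. The field1 column, the column types, the alias t, and the sample rows are assumptions made up for the example; they are not from the original tables.

```sql
-- Assumed minimal schemas; the real tables have more columns.
CREATE TABLE current_tbl (user_id INTEGER, f_name VARCHAR(50), l_name VARCHAR(50));
CREATE TABLE import_tbl  (user_id INTEGER, f_name VARCHAR(50), l_name VARCHAR(50),
                          field1 VARCHAR(50));

INSERT INTO current_tbl VALUES (1,    'Ann', 'Lee');  -- matches by id and by name
INSERT INTO current_tbl VALUES (NULL, 'Bob', 'Ray');  -- matches by name only
INSERT INTO current_tbl VALUES (3,    'Cy',  'Fox');  -- matches nothing

INSERT INTO import_tbl VALUES (1,    'Ann', 'Lee', 'ann data');
INSERT INTO import_tbl VALUES (NULL, 'Bob', 'Ray', 'bob data');

-- Each branch of the UNION ALL has one simple join condition.
-- MAX() skips NULLs, so a value found by either branch survives the GROUP BY.
SELECT a_user_id, a_f_name, a_l_name, MAX(b_field1) AS b_field1
FROM (
    SELECT a.user_id AS a_user_id, a.f_name AS a_f_name, a.l_name AS a_l_name,
           b.field1  AS b_field1
    FROM current_tbl a
    LEFT JOIN import_tbl b ON a.user_id = b.user_id
    UNION ALL
    SELECT a.user_id, a.f_name, a.l_name, b.field1
    FROM current_tbl a
    LEFT JOIN import_tbl b ON a.f_name = b.f_name AND a.l_name = b.l_name
) t
GROUP BY a_user_id, a_f_name, a_l_name
ORDER BY a_f_name;
```

With this data the result has one row per current_tbl row: Ann and Bob each pick up their import_tbl value and Cy gets NULL. The drawback mentioned above shows up once a current_tbl row matches two different import_tbl rows, because MAX() is taken per column and the output row can then mix values from both.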
