JOIN or correlated EXISTS subquery: which is better?

select * from ContactInformation c where exists (select * from Department d where d.Id = c.DepartmentId)

select * from ContactInformation c inner join Department d on c.DepartmentId = d.Id

Both queries produce the same output. Which is better in terms of performance: a join, or a correlated subquery with an EXISTS condition?

Edit: is there an alternative to joins that would improve performance? In the above two queries, I want information from both the Department and ContactInformation tables.

+6
sql
Jul 22 '10 at 5:02
4 answers

Typically EXISTS, because a JOIN may need a DISTINCT to give the expected result, for example when multiple Department rows match a single ContactInformation row.

In the above example, SELECT *:

  • produces different output, so the two queries are not actually equivalent
  • makes index use less likely, because you pull out all the columns

That said, even with a limited column list they will give the same plan, until you need DISTINCT. That is why I say EXISTS.
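The duplicate-row effect can be reproduced with a small sketch. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for the asker's engine, and the sample rows (two Department rows sharing Id 1) are hypothetical, chosen only to force more than one match per contact.

```python
import sqlite3

# Hypothetical sample data: two Department rows share Id 1, which is only
# possible because this sketch declares no primary key on Department.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Id INTEGER, Name TEXT);
    CREATE TABLE ContactInformation (Id INTEGER PRIMARY KEY, DepartmentId INTEGER);
    INSERT INTO Department VALUES (1, 'Sales'), (1, 'Sales (duplicate)');
    INSERT INTO ContactInformation VALUES (10, 1), (11, 2);
""")

exists_rows = conn.execute("""
    SELECT c.Id FROM ContactInformation c
    WHERE EXISTS (SELECT * FROM Department d WHERE d.Id = c.DepartmentId)
""").fetchall()

join_rows = conn.execute("""
    SELECT c.Id FROM ContactInformation c
    INNER JOIN Department d ON c.DepartmentId = d.Id
""").fetchall()

distinct_join_rows = conn.execute("""
    SELECT DISTINCT c.Id FROM ContactInformation c
    INNER JOIN Department d ON c.DepartmentId = d.Id
""").fetchall()

print(exists_rows)         # one row per contact that has a matching department
print(join_rows)           # one row per matching (contact, department) pair
print(distinct_join_rows)  # duplicates collapsed again by DISTINCT
```

With a unique Department.Id the three results would coincide, which is the case where the plans tend to match as well.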

+5
Jul 22 '10 at 5:08

You need to measure and compare. There is no golden rule for which will be better; it depends on too many variables in your system.

In SQL Server Management Studio, you can put both queries in a window, select Include actual execution plan from the Query menu, and then run them together.

Screenshot: http://i31.tinypic.com/2rw48s2.png

You should get a comparison of both execution plans, with a percentage showing each query's cost relative to the whole batch. Most likely, both will be close to 50% in this case. If not, you know which of the two queries performs better.
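Outside Management Studio, a similar side-by-side look at the plans can be sketched with SQLite, used here purely as a stand-in. EXPLAIN QUERY PLAN is SQLite's own statement, its output differs from a SQL Server plan, and it reports no cost percentage; the table definitions are assumptions based on the column names in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: only the columns the question mentions, plus a Name column.
conn.executescript("""
    CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ContactInformation (Id INTEGER PRIMARY KEY, DepartmentId INTEGER);
""")

queries = {
    "EXISTS": """SELECT c.* FROM ContactInformation c
                 WHERE EXISTS (SELECT * FROM Department d WHERE d.Id = c.DepartmentId)""",
    "JOIN": """SELECT c.* FROM ContactInformation c
               INNER JOIN Department d ON c.DepartmentId = d.Id""",
}

plans = {}
for label, sql in queries.items():
    # The last column of each plan row is a readable description of one step.
    plans[label] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(label, plans[label])
```

The exact plan text depends on the SQLite version, so it is best read as a rough indication of how each query is executed rather than as a benchmark.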

You can learn more about SQL Server execution plans (and even download a free e-book) from Simple-Talk - highly recommended.

+4
Jul 22 '10 at 5:07

I assume that you either meant to add the DISTINCT keyword to the SELECT in the second query or, less likely, that a Department has only one Contact.

First, always start with "logical" considerations. The EXISTS construct is arguably more intuitive, so, all things "physical" being equal, I would go with that.

Second, one day you will need to port this code, not necessarily to another SQL product but, say, to the same product with a different optimizer. A decent optimizer should recognize that both are equivalent and come up with the same ideal plan. That said, the EXISTS construct theoretically has slightly more potential for short-circuiting.

Third, test it using a sufficiently large dataset. If performance is unacceptable, start looking at "physical" considerations (but I suggest you always keep your "logically clean" code in comments for the day when the perfect optimizer arrives :)
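A test of that kind might look like the sketch below. It again assumes SQLite through Python's sqlite3 module rather than the asker's actual engine, and the row counts (1,000 departments, 50,000 contacts, half of them pointing at a missing department) are arbitrary; timings from it say nothing about other products.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ContactInformation (Id INTEGER PRIMARY KEY, DepartmentId INTEGER);
""")
conn.executemany("INSERT INTO Department VALUES (?, ?)",
                 [(i, f"Dept {i}") for i in range(1000)])
# DepartmentId runs over 0..1999, so half of the contacts have no department.
conn.executemany("INSERT INTO ContactInformation VALUES (?, ?)",
                 [(i, i % 2000) for i in range(50_000)])

def timed_count(sql):
    """Run the query, return (number of rows, elapsed seconds)."""
    start = time.perf_counter()
    count = len(conn.execute(sql).fetchall())
    return count, time.perf_counter() - start

exists_count, exists_secs = timed_count("""
    SELECT c.Id FROM ContactInformation c
    WHERE EXISTS (SELECT * FROM Department d WHERE d.Id = c.DepartmentId)
""")
join_count, join_secs = timed_count("""
    SELECT c.Id FROM ContactInformation c
    INNER JOIN Department d ON c.DepartmentId = d.Id
""")

print(f"EXISTS: {exists_count} rows in {exists_secs:.4f}s")
print(f"JOIN:   {join_count} rows in {join_secs:.4f}s")
```

Checking that both queries return the same number of rows before comparing times guards against benchmarking two queries that are not actually equivalent on the data.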

+2
Jul 22 '10 at 13:33

The second query (the JOIN) also outputs the Department columns, while the first (EXISTS) does not.

If you are interested only in the ContactInformation columns, these queries are equivalent. You can run both of them and study the query execution plans to find out which one is faster. For example, on MySQL, WHERE EXISTS is more efficient with nullable columns, and INNER JOIN works better if none of the columns can be NULL.
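Which columns each SELECT * returns can be checked directly. The sketch below assumes SQLite via Python's sqlite3 module, and the Name and Phone columns are hypothetical additions so that the two tables are distinguishable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: Name and Phone are not in the question.
conn.executescript("""
    CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ContactInformation (
        Id INTEGER PRIMARY KEY, DepartmentId INTEGER, Phone TEXT);
""")

def column_names(sql):
    """Return the names of the result columns a query produces."""
    return [col[0] for col in conn.execute(sql).description]

exists_cols = column_names("""
    SELECT * FROM ContactInformation c
    WHERE EXISTS (SELECT * FROM Department d WHERE d.Id = c.DepartmentId)
""")
join_cols = column_names("""
    SELECT * FROM ContactInformation c
    INNER JOIN Department d ON c.DepartmentId = d.Id
""")

print(exists_cols)  # columns of ContactInformation only
print(join_cols)    # columns of ContactInformation followed by Department
```

Selecting c.* in the JOIN version would make the two column lists match.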

+1
Jul 22