Which is better - SELECT TOP (1) or INNER JOIN?

Question

Let's say I have the following query:

 SELECT Id, Name, ForeignKeyId,
        (SELECT TOP (1) FtName FROM ForeignTable WHERE FtId = ForeignKeyId)
 FROM Table

Will this query run faster if it is written using a JOIN?

 SELECT Id, Name, ForeignKeyId, FtName
 FROM Table t
 LEFT OUTER JOIN ForeignTable ft ON ft.FtId = t.ForeignKeyId

Just curious ... also, if JOINs are faster, will they be faster in all cases (tables with lots of columns, lots of rows)?

EDIT: The queries I wrote are intended only to illustrate the concept of TOP (1) versus JOIN. Yes, I know about query execution plans in SQL Server, but I am not trying to optimize a single query. I am trying to understand whether there is any general theory about SELECT TOP (1) versus JOIN, and whether one approach is preferable because of speed (not because of personal preference or readability).

EDIT2: I would like to thank Aaron for his detailed answer and encourage people to check out SQL Sentry Plan Explorer, the free tool from his company that he mentioned in his answer.

+4

sql sql-server

kape123 Aug 29 '11 at 19:52

3 answers

Your queries do different things. The first is more like a LEFT OUTER JOIN.

Performance depends on how your indexes are set up. But JOINs are easier to understand.

0

Daniel A. White Aug 29 '11 at 19:54

I agree with the statements above (Rick). Run both queries and look at the execution plans ... you'll get a clear answer. No speculation needed.

I also agree with Daniel and David that these are two different SQL statements. If ForeignTable contains several rows with the same FtId value, the JOIN version will return duplicated rows. Assuming the first SQL statement is the correct one, you will have to rewrite the second with some kind of GROUP BY clause.
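
To make the difference concrete, here is a minimal sketch using two throwaway table variables. The names @Parent and @ForeignTable and the sample rows are made up for illustration and are not from the question. It assumes SQL Server 2008 or later for the multi-row VALUES syntax.

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @Parent TABLE (Id int, Name varchar(20), ForeignKeyId int);
DECLARE @ForeignTable TABLE (FtId int, FtName varchar(20));

INSERT INTO @Parent VALUES (1, 'a', 10), (2, 'b', 20);
-- FtId 10 appears twice; FtId 20 does not appear at all.
INSERT INTO @ForeignTable VALUES (10, 'x'), (10, 'y');

-- Scalar subquery: always one row per parent row (2 rows here).
SELECT p.Id, p.Name, p.ForeignKeyId,
       (SELECT TOP (1) ft.FtName
        FROM @ForeignTable AS ft
        WHERE ft.FtId = p.ForeignKeyId) AS FtName
FROM @Parent AS p;

-- LEFT OUTER JOIN: one row per match, so parent 1 is duplicated (3 rows here).
SELECT p.Id, p.Name, p.ForeignKeyId, ft.FtName
FROM @Parent AS p
LEFT OUTER JOIN @ForeignTable AS ft
  ON ft.FtId = p.ForeignKeyId;

-- One possible GROUP BY rewrite that collapses back to one row per parent.
SELECT p.Id, p.Name, p.ForeignKeyId, MIN(ft.FtName) AS FtName
FROM @Parent AS p
LEFT OUTER JOIN @ForeignTable AS ft
  ON ft.FtId = p.ForeignKeyId
GROUP BY p.Id, p.Name, p.ForeignKeyId;
```

Note that TOP (1) without an ORDER BY may return any of the matching rows, while MIN picks a specific one, so the GROUP BY version only matches the subquery version in row count, not necessarily in which FtName is returned. If FtId is unique in ForeignTable, none of this matters and the plain JOIN already returns one row per parent.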

0

joshgo Aug 29 '11 at 20:39

Aaron Bertrand · Accepted Answer · 2011-08-29T20:02:46+0000

I originally wrote:

The first version of the query is MUCH less readable to me. Especially since you don't bother aliasing the matching column inside the correlated subquery. JOINs are much clearer.

I still believe and stand by those statements, but I would like to add to my original answer based on the new information added to the question. You asked whether there are general rules or theories about which performs better, TOP (1) or a JOIN, leaving aside readability and preference. I will restate, as I said in my comment, that no, there are no general rules or theories. When you have a specific example, it is very easy to prove which one works better. Let's take these two queries, similar to yours, but which run against system objects that we can all check:

 -- query 1:
 SELECT name,
        (SELECT TOP (1) [object_id]
         FROM sys.all_sql_modules
         WHERE [object_id] = o.[object_id])
 FROM sys.all_objects AS o;

 -- query 2:
 SELECT o.name, m.[object_id]
 FROM sys.all_objects AS o
 LEFT OUTER JOIN sys.all_sql_modules AS m
   ON o.[object_id] = m.[object_id];

These return the same results (3,179 rows on my system), by which I mean the same data and the same number of rows. One clue that they are not really the same query (or at least do not follow the same execution plan) is that the results come back in a different order. While I would not expect any particular order to be maintained or honored, because I did not include ORDER BY anywhere, I would expect SQL Server to choose the same ordering if the two queries were in fact using the same plan.

But they are not. We can see this by inspecting the plans and comparing them. In this case I will use SQL Sentry Plan Explorer, a free execution plan analysis tool from my company. You can get some of this information from Management Studio, but other parts are much easier to access in Plan Explorer (for example, actual duration and CPU). The top plan is the subquery version, the bottom one is the join:

[Image: execution plans for the two queries, subquery version on top, join version on the bottom]

Actual execution plans: 85% of the total cost of running the two queries is in the subquery version. This means that it is more than 5 times as expensive as the join. Both CPU and I/O are much higher with the subquery version. Look at all those reads! 6,600+ pages to return ~3,000 rows, while the join version returns the same data with much less I/O: only 110 pages.
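
If you only have Management Studio at hand, one way to get comparable read and CPU figures is to turn on the session statistics options before running both queries. This is a rough sketch rather than the exact method used for the numbers above, and the counts you see will differ from system to system.

```sql
-- Report logical/physical reads per table and CPU/elapsed time per statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- query 1: correlated subquery with TOP (1)
SELECT name,
       (SELECT TOP (1) [object_id]
        FROM sys.all_sql_modules
        WHERE [object_id] = o.[object_id])
FROM sys.all_objects AS o;

-- query 2: LEFT OUTER JOIN
SELECT o.name, m.[object_id]
FROM sys.all_objects AS o
LEFT OUTER JOIN sys.all_sql_modules AS m
  ON o.[object_id] = m.[object_id];

-- Switch the reporting off again for the rest of the session.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures are printed on the Messages tab, one block per statement, so you can compare the logical reads and CPU time of the two versions side by side.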

But why? Because the subquery version works essentially like a scalar function: it goes and grabs the TOP matching row from the other table, but it does so for every row of the outer query. We can see that the operation is executed 3,179 times by looking at the Top Operations tab, which shows the number of executions for each operation. In that view, the more expensive subquery version is again on top, with the join version below it.

I will spare you a more thorough analysis, but by and large, the optimizer knows what it is doing. State your intent (a join of this type between these tables) and 99% of the time it will work out on its own the best way to do it (that is, the execution plan). If you try to outsmart the optimizer, keep in mind that you are venturing into fairly advanced territory.

There are exceptions to every rule, but in this particular case, the subquery is definitely a bad idea. Does this mean that the syntax proposed in the first query is always a bad idea? Absolutely not. There may be obscure cases where the subquery version works just as well as the join. I cannot think of many where the subquery would work better. Therefore, I would err on the side of the one that is more likely to perform as well or better, and the one that is more readable. I don't see any benefit to the subquery version, even if you find it more readable, because it is likely to lead to worse performance.

In general, I highly recommend that you stick to the more readable, self-documenting syntax unless you find a case where the optimizer is not getting it right (and I would bet that in 99% of those cases the problem is bad statistics or parameter sniffing, not the query syntax). I suspect that, outside of those cases, repros where convoluted queries work better than their more direct and logical equivalents will be quite rare. Your motivation for hunting for such cases should be about as strong as your preference for a non-intuitive syntax over the generally accepted "best practice" syntax.
