Since my previous answer was mentioned, I'll take another stab at explaining this, because this stuff is genuinely tricky. And yes, I think you're hitting the same problem as that other question: namely, the problem with row goals.
To explain what causes this, I'll start with the three join types the engine has at its disposal (broadly speaking): Nested Loop Joins, Merge Joins, and Hash Joins. Loop joins are exactly what they sound like: nested loops over both data sets. Merge joins take two sorted lists and walk through them in lock step. Hash joins throw everything from the smaller data set into a filing cabinet (a hash table), and then look up each item from the larger data set once the filing cabinet is full.
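To make the three strategies concrete, here are toy sketches of each in Python. These are illustrations only; a real engine operates on pages and uses a cost model, but the shape of the work is the same:

```python
# Toy sketches of the three physical join types, joining lists of
# (key, value) tuples on the key. Illustrative only.

def nested_loop_join(outer, inner):
    """No setup cost: for each outer row, scan the inner set. O(n*m)."""
    return [(a, b) for a in outer for b in inner if a[0] == b[0]]

def merge_join(outer, inner):
    """Requires both inputs sorted on the key; walk them in lock step. O(n+m)."""
    out, i, j = [], 0, 0
    while i < len(outer) and j < len(inner):
        if outer[i][0] < inner[j][0]:
            i += 1
        elif outer[i][0] > inner[j][0]:
            j += 1
        else:
            # Emit a pair for every inner row sharing this key.
            k = j
            while k < len(inner) and inner[k][0] == outer[i][0]:
                out.append((outer[i], inner[k]))
                k += 1
            i += 1
    return out

def hash_join(build, probe):
    """Setup cost: hash the smaller input (the 'filing cabinet'), then probe it."""
    table = {}
    for row in build:
        table.setdefault(row[0], []).append(row)
    return [(b, p) for p in probe for b in table.get(p[0], [])]

left = [(1, 'a'), (2, 'b'), (3, 'c')]
right = [(2, 'x'), (3, 'y'), (4, 'z')]
# All three produce the same matches: keys 2 and 3.
print(nested_loop_join(left, right))
print(merge_join(left, right))
print(hash_join(left, right))
```

Note where the work goes: the loop join starts producing rows immediately, the merge join depends on pre-sorted input, and the hash join pays its setup cost up front before the first probe.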
Performance-wise: loop joins need virtually no setup, and if you only need a small amount of data they're optimal. Merge joins are the best performers of the bunch for any data size, but they require that the data already be sorted (which is rare). Hash joins require a fair amount of setup, but let you join large data sets quickly.
Now to your query and the difference between COUNT(*) and EXISTS/TOP 1. The behavior you're seeing is the optimizer deciding that rows from this query are very likely (you can confirm this by getting a plan for the query without the grouping and seeing how many rows it estimates at the end). In particular, it probably believes that for some table in the query, every row in that table will make it to the output.
"Eureka!" he says: "If every row in this table ends with an output to find if it exists, I can make a really cheap start-up loop connection, because although it is slow for large datasets, I only need one row." But then he does not find this line. And does not find him again. And now it iterates through a huge dataset using the least efficient means for weeding large amounts of data.
By comparison, if you ask for a full count of the data, it must by definition find every row. The optimizer sees a huge data set and picks the plan that is best for iterating over that entire data set, not just a tiny sliver of it.
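This is why the COUNT(*) plan is predictable: with no row goal, the planner costs the join for the whole data set, and a hash join's cost is roughly "build + probe" regardless of where (or whether) matches occur. A minimal sketch, again with made-up data:

```python
# Counter-sketch: counting matches with a hash join touches every probe row
# exactly once, no matter how the matches are distributed.

def count_matches_hash(build, probe):
    keys = set(build)  # setup: hash the smaller input once
    return sum(1 for p in probe if p in keys)  # probe each row exactly once

# 100 build keys against a million probe rows: cost is build + probe,
# independent of match placement.
print(count_matches_hash(range(100), range(1_000_000)))  # 100
```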
If, on the other hand, the estimate had been correct and the rows really were that well correlated, it would have found your row with the minimum possible expenditure of server resources, maximizing overall throughput.