Like many developers, I perform many data merge operations, particularly with SQL Server.
Historically, I used an old trick: -
1) Perform a left join against the existing data and insert the rows for which I don't yet have a corresponding record.
2) After 1), update the matched rows in my target table.
I have to take a performance hit on 1); it's unavoidable. However, at 2) I was being pretty wasteful: instead of only updating the rows that needed updating, I updated everything that matched (whether the underlying data had changed or not).
It turns out that SQL Server is not very clever about this kind of update. It does not check up front whether the value you are about to write is the same as the value already in the row. Consequently, updating those rows still results in a physical write, and any indexes that reference the updated columns get churned as well.
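To make that concrete, here is a minimal sketch of the two-step routine, assuming a hypothetical Dest table keyed on ExternalKey and a @source table variable (the same names used in the example further down):

-- Step 1: insert source rows that have no match in the target yet.
INSERT INTO Dest (ExternalKey, [SpecificField])
SELECT p1.ExternalKey, p1.[SpecificField]
FROM @source p1
LEFT JOIN Dest p2 ON p2.ExternalKey = p1.ExternalKey
WHERE p2.ExternalKey IS NULL

-- Step 2: the wasteful part - update every matched row,
-- whether or not the source value actually differs.
UPDATE p2
SET [SpecificField] = p1.[SpecificField]
FROM @source p1
INNER JOIN Dest p2 ON p2.ExternalKey = p1.ExternalKey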
So, from my POV, my options are as follows: -
1) Carry on as usual, wallowing in the wastefulness of my current routine (and needlessly churning indexes on large databases every day).
- Pros: it's easy.
- Cons: this is shit.
2) Write more granular UPDATE statements that only update a specific field if that field has actually changed.
eg.
UPDATE p2
SET [SpecificField] = p1.[SpecificField]
FROM @source p1
INNER JOIN Dest p2 ON p2.ExternalKey = p1.ExternalKey
WHERE COALESCE(p1.[SpecificField], '') <> COALESCE(p2.[SpecificField], '')
- Pros: it's very targeted; rows only get updated when they actually need it.
- Cons: lots of separate UPDATE statements for tables with many columns.
3) Something infinitely better that the community can suggest.
I would love to go with 3). Are my options really limited to 1 and 2? N.B. I have looked at MERGE INTO; it has the same problems, really.
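For reference, this is roughly the MERGE shape I mean (same hypothetical table and column names as above); even here, avoiding the no-op writes means spelling out a per-column comparison in the WHEN MATCHED clause:

MERGE Dest AS p2
USING @source AS p1
    ON p2.ExternalKey = p1.ExternalKey
-- Still have to compare every column explicitly to skip unchanged rows.
WHEN MATCHED AND COALESCE(p1.[SpecificField], '') <> COALESCE(p2.[SpecificField], '')
    THEN UPDATE SET [SpecificField] = p1.[SpecificField]
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ExternalKey, [SpecificField])
         VALUES (p1.ExternalKey, p1.[SpecificField]);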