I am currently working on an SSIS package that copies a table from one database to another. The tables in both databases use the same column as the primary key, and the query that retrieves the data is a simple SELECT statement. When I ran the package, it failed with an error saying that primary key values were duplicated.
I reviewed the SELECT statement and confirmed that it does not return duplicate rows. To investigate, I dropped the primary key from the destination table and ran the SSIS package again. Afterwards, I checked the table to see which rows were duplicated. The duplicates turned out to be rows that had been modified while the package was running: for each one, there was a copy of the record from before the edit and a copy from after the edit. I could tell this easily because the table has a last-modified field, which is updated every time the record changes.
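For reference, the duplicate check described above can be done with a GROUP BY ... HAVING query on the destination table. Below is a minimal, self-contained sketch that runs that query against an in-memory SQLite table standing in for the destination. The table name `patient_encounter_copy` and the sample rows are hypothetical; the query itself is plain SQL and should carry over to SQL Server unchanged.

```python
import sqlite3

# Hypothetical stand-in for the destination table, created WITHOUT a primary key
# so that duplicate keys can be loaded and inspected.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_encounter_copy (enc_id INTEGER, modify_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO patient_encounter_copy VALUES (?, ?)",
    [
        (1, "2024-01-01 08:00"),
        (2, "2024-01-01 09:00"),  # made-up copy of the row before the edit
        (2, "2024-01-01 09:05"),  # made-up copy of the same row after the edit
        (3, "2024-01-01 10:00"),
    ],
)

# Every key that occurs more than once, with the earliest and latest
# modify_timestamp among its copies (before-edit vs after-edit version).
DUPLICATE_KEYS_SQL = """
SELECT enc_id,
       COUNT(*)              AS copies,
       MIN(modify_timestamp) AS first_version,
       MAX(modify_timestamp) AS last_version
FROM patient_encounter_copy
GROUP BY enc_id
HAVING COUNT(*) > 1
ORDER BY enc_id
"""

duplicates = conn.execute(DUPLICATE_KEYS_SQL).fetchall()
for row in duplicates:
    print(row)
```

With the sample rows, only `enc_id` 2 is reported, with both of its timestamps.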
I added a NOLOCK hint to the SELECT statement, and it stopped returning duplicate rows.
So here is my question: I would expect a SELECT statement with a NOLOCK table hint to be more likely to return duplicate rows, because it does not take locks, and a SELECT statement without the NOLOCK hint to take locks that guarantee it does not return a duplicate row. Why do I see the opposite?
Here is the SELECT statement I use to retrieve the data. I have double-checked that the joins do not duplicate rows:
SELECT pe.enc_id,
       pe.enc_nbr,
       pe.billable_ind,
       pe.clinical_ind AS clinical_ind,
       pe.budget_ind,
       pe.print_stmt_ind,
       pe.send_coll_letter_ind,
       pe.outsource_exempt_ind,
       cb.First_name + ' ' + cb.last_name AS CreatedBy,
       pe.create_timestamp AS create_timestamp,
       mb.first_name + ' ' + mb.last_name AS ModifiedBy,
       pe.modify_timestamp AS modify_timestamp
FROM patient_encounter pe WITH(NOLOCK)
LEFT OUTER JOIN user_mstr cb WITH(NOLOCK)
    ON pe.created_by = cb.user_id
LEFT OUTER JOIN user_mstr mb WITH(NOLOCK)
    ON pe.modified_by = mb.user_id
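One way to confirm that the two LEFT OUTER JOINs cannot multiply rows is to check that `user_mstr.user_id` is unique and that the joined result has exactly one row per `enc_id`. The sketch below runs both checks in an in-memory SQLite database; the trimmed-down schema and sample rows are hypothetical, and the `WITH(NOLOCK)` hints are dropped because SQLite does not support SQL Server table hints.

```python
import sqlite3

# Hypothetical, trimmed-down versions of the two tables used in the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_mstr (user_id INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE patient_encounter (enc_id INTEGER PRIMARY KEY,
                                created_by INTEGER, modified_by INTEGER);
INSERT INTO user_mstr VALUES (10, 'Ann', 'Lee'), (20, 'Bob', 'Ray');
INSERT INTO patient_encounter VALUES (1, 10, 20), (2, 20, NULL), (3, 99, 10);
""")

# Check 1: the join key must be unique in user_mstr; any row returned here
# would make the joins produce more than one row per encounter.
dup_users = conn.execute(
    "SELECT user_id, COUNT(*) FROM user_mstr GROUP BY user_id HAVING COUNT(*) > 1"
).fetchall()

# Check 2: the joined result must contain exactly one row per enc_id,
# including encounters whose creator/modifier has no match (LEFT JOIN keeps them).
total_rows, distinct_ids = conn.execute("""
SELECT COUNT(*), COUNT(DISTINCT pe.enc_id)
FROM patient_encounter pe
LEFT OUTER JOIN user_mstr cb ON pe.created_by = cb.user_id
LEFT OUTER JOIN user_mstr mb ON pe.modified_by = mb.user_id
""").fetchone()

print(dup_users, total_rows, distinct_ids)
```

If the first check returns no rows and the two counts match, the joins themselves are not the source of the duplicates.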