Avoiding duplicate key exceptions

I ran into the following problem:

I am trying to keep a table in SQL Server synchronized with several external databases. These external databases do not share a common unique primary key, so the local table has a simple integer PK.

Currently, updating the local table goes like this (see the sketch after this list):

  • The external databases are queried.
  • The data is converted into rows valid for the local table.
  • An INSERT is attempted to write each row to the local table.
  • If the INSERT throws a duplicate key exception, the PK is looked up with a SELECT query and the data is written to the table with an UPDATE query.
  • Another table is then modified using the PK of the inserted or updated row.
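For concreteness, here is a minimal sketch of that flow, assuming a hypothetical LocalTable with an identity PK column Id, a unique ExternalKey column, and a Value column (these names are placeholders, not from the original post):

 using System.Data.SqlClient;

 public static int InsertOrUpdate(SqlConnection conn, string externalKey, string value)
 {
     try
     {
         // Try the INSERT first; OUTPUT returns the new identity PK.
         using (var insert = new SqlCommand(
             @"INSERT INTO LocalTable (ExternalKey, Value)
               OUTPUT INSERTED.Id
               VALUES (@key, @value);", conn))
         {
             insert.Parameters.AddWithValue("@key", externalKey);
             insert.Parameters.AddWithValue("@value", value);
             return (int)insert.ExecuteScalar();
         }
     }
     catch (SqlException ex) when (ex.Number == 2601 || ex.Number == 2627) // duplicate key
     {
         // The row already exists: SELECT its PK, then UPDATE it.
         int id;
         using (var select = new SqlCommand(
             "SELECT Id FROM LocalTable WHERE ExternalKey = @key;", conn))
         {
             select.Parameters.AddWithValue("@key", externalKey);
             id = (int)select.ExecuteScalar();
         }
         using (var update = new SqlCommand(
             "UPDATE LocalTable SET Value = @value WHERE Id = @id;", conn))
         {
             update.Parameters.AddWithValue("@value", value);
             update.Parameters.AddWithValue("@id", id);
             update.ExecuteNonQuery();
         }
         return id; // the last step then uses this PK to modify the other table
     }
 }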

Now, this works fine, but it seems very inefficient to me. In most cases the data is already in the local table, so the INSERT raises a duplicate key exception. That means a lot of exceptions to handle, which is expensive. In addition, because the PK is controlled by the database, a SELECT query has to be used to find the row to update.

How can I avoid this? I don't want to use a stored procedure, because I like to keep the queries code-driven and under version control.

I looked at MERGE, but I have seen too many people reporting problems with it.

I think I need some form of upsert, but I'm not sure how to do that, since the PK is managed by the database.

tl;dr: I need a query that will either insert or update a row (depending on whether the key is a duplicate or not) and that always returns the row's PK.
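For reference, one common shape for such a query (a sketch under my own assumptions, reusing the hypothetical LocalTable schema from above; this is not from the original post) updates first and only inserts on a miss, capturing the database-generated PK either way:

 using System.Data.SqlClient;

 // UPDATE first, INSERT only when nothing was updated. The OUTPUT ... INTO
 // clauses capture the PK in a table variable so a single SELECT returns it
 // in both cases. Note: under concurrent writers this still needs a
 // transaction (and locking hints) to be fully race-free.
 const string UpsertSql = @"
     DECLARE @id TABLE (Id int);

     UPDATE LocalTable SET Value = @value
     OUTPUT INSERTED.Id INTO @id
     WHERE ExternalKey = @key;

     IF @@ROWCOUNT = 0
         INSERT INTO LocalTable (ExternalKey, Value)
         OUTPUT INSERTED.Id INTO @id
         VALUES (@key, @value);

     SELECT Id FROM @id;";

 public static int Upsert(SqlConnection conn, string externalKey, string value)
 {
     using (var cmd = new SqlCommand(UpsertSql, conn))
     {
         cmd.Parameters.AddWithValue("@key", externalKey);
         cmd.Parameters.AddWithValue("@value", value);
         return (int)cmd.ExecuteScalar();
     }
 }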

1 answer

I have an implementation from a past project that I like. You may or may not find it helpful.

Here's how it works: I load both the external and the local data into memory using a model object that works for both. For instance:

 public class Person
 {
     public string FirstName { get; set; }
     public string LastName { get; set; }
     public string PhoneNumber { get; set; }
     public string Address { get; set; }

     // This comparer will be used to find records that exist or don't exist.
     public class KeyFieldComparer : IEqualityComparer<Person>
     {
         public bool Equals(Person p1, Person p2)
         {
             return p1.FirstName == p2.FirstName && p1.LastName == p2.LastName;
         }

         public int GetHashCode(Person p)
         {
             return p.FirstName.GetHashCode() ^ p.LastName.GetHashCode();
         }
     }

     // This comparer will be used to find records that are outdated and need to be updated.
     public class OutdatedComparer : IEqualityComparer<Person>
     {
         public bool Equals(Person p1, Person p2)
         {
             return p1.FirstName == p2.FirstName
                 && p1.LastName == p2.LastName
                 && (p1.PhoneNumber != p2.PhoneNumber || p1.Address != p2.Address);
         }

         public int GetHashCode(Person p)
         {
             return p.FirstName.GetHashCode() ^ p.LastName.GetHashCode();
         }
     }
 }

The records need to be uniquely identifiable in some way, which I believe you have. In this example it is FirstName and LastName (I know that's not very unique, but for simplicity let's pretend it works well). The IEqualityComparer<> implementations are what find the new and outdated records once the lists are loaded into memory.

Now we just separate out the new records and the existing but outdated ones, like this:

 List<Person> local = loadLocalRecords();
 List<Person> external = loadExternalRecords();

 // External records with no local counterpart (matched by key fields).
 var newRecordsToInsert = external.Except(local, new Person.KeyFieldComparer());

 // External records whose local counterpart has stale data. Note: external
 // goes first so that the resulting list holds the fresh values to write back.
 var outdatedRecordsToUpdate = external.Intersect(local, new Person.OutdatedComparer());

Hope this makes sense; I can answer questions if you have them. The good thing about this method is that it touches the database as little as possible (I think). The bad thing is that it has to load everything into memory, which may be impractical for you. But your table would have to be quite large for that to be a problem, somewhere above several million records depending on the number of columns.
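To round it out, here is a sketch of the write-back step the answer leaves implicit, using a hypothetical People table matching the model above. Since the two lists are disjoint by construction, no duplicate key exceptions can occur:

 using System;
 using System.Collections.Generic;
 using System.Data.SqlClient;

 // Hypothetical write-back (not part of the original answer): one INSERT per
 // new record and one UPDATE per outdated record, keyed on FirstName + LastName.
 public static void WriteBack(SqlConnection conn,
                              IEnumerable<Person> newRecords,
                              IEnumerable<Person> outdatedRecords)
 {
     foreach (var p in newRecords)
     {
         using (var cmd = new SqlCommand(
             @"INSERT INTO People (FirstName, LastName, PhoneNumber, Address)
               VALUES (@fn, @ln, @ph, @ad);", conn))
         {
             cmd.Parameters.AddWithValue("@fn", p.FirstName);
             cmd.Parameters.AddWithValue("@ln", p.LastName);
             cmd.Parameters.AddWithValue("@ph", (object)p.PhoneNumber ?? DBNull.Value);
             cmd.Parameters.AddWithValue("@ad", (object)p.Address ?? DBNull.Value);
             cmd.ExecuteNonQuery();
         }
     }

     foreach (var p in outdatedRecords)
     {
         using (var cmd = new SqlCommand(
             @"UPDATE People SET PhoneNumber = @ph, Address = @ad
               WHERE FirstName = @fn AND LastName = @ln;", conn))
         {
             cmd.Parameters.AddWithValue("@fn", p.FirstName);
             cmd.Parameters.AddWithValue("@ln", p.LastName);
             cmd.Parameters.AddWithValue("@ph", (object)p.PhoneNumber ?? DBNull.Value);
             cmd.Parameters.AddWithValue("@ad", (object)p.Address ?? DBNull.Value);
             cmd.ExecuteNonQuery();
         }
     }
 }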

