Recommended programming pattern for multiple searches

I have been tasked with building a process that synchronizes data between a CSV file produced by another vendor and more than 300 separate but structurally identical CRM databases. All of the CRM databases live on a single SQL Server instance. Here are the specifics:

The source data is a CSV file containing a list of all the email addresses for which customers have opted in to marketing communications. The full file will be sent every night, but it includes date/time stamps at the record level, which will let me select only the records that have changed since the last processing cycle. The CSV file may contain many hundreds of thousands of rows, although the number of daily changes is expected to be much lower.

I will select the data from the CSV and convert each row into a custom object stored in a List<T>.
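As a rough illustration of the "select only changed rows and convert them to a List<T>" step, here is a minimal sketch. The `OptInRecord` type, the `CsvDelta` helper, and the three-column `email,optedIn,modifiedUtc` layout are assumptions made for the example; the real CSV layout is not described in the question. It also assumes there is no header row and no quoted commas, so a proper CSV parser would be a safer choice for real data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical row type; the actual columns are not given in the question.
public sealed class OptInRecord
{
    public string EmailAddress { get; set; } = "";
    public bool OptedIn { get; set; }
    public DateTime ModifiedUtc { get; set; }
}

public static class CsvDelta
{
    // Returns only the rows whose timestamp is later than the previous run.
    // Assumes "email,optedIn,modifiedUtc" lines with no header and no quoted commas.
    public static List<OptInRecord> ChangedSince(IEnumerable<string> lines, DateTime lastRunUtc)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(','))
            .Where(parts => parts.Length == 3) // skip rows that do not match the assumed layout
            .Select(parts => new OptInRecord
            {
                EmailAddress = parts[0].Trim(),
                OptedIn = bool.Parse(parts[1].Trim()),
                ModifiedUtc = DateTime.Parse(
                    parts[2].Trim(),
                    CultureInfo.InvariantCulture,
                    DateTimeStyles.AdjustToUniversal | DateTimeStyles.AssumeUniversal)
            })
            .Where(record => record.ModifiedUtc > lastRunUtc)
            .ToList();
    }
}
```

Calling `CsvDelta.ChangedSince(File.ReadLines(path), lastRunUtc)` would stream the file rather than load it whole, which matters when the nightly file has hundreds of thousands of rows but only a small delta.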

Once the CSV has been queried and the data converted, I will need to compare the contents of this List<T> against the CRM databases. This is because any given email address in the CSV file may:

  • Not exist in any of the 300 databases.
  • Exist in exactly one of the 300 databases.
  • Exist in several databases.

In every case where an email address in the CSV master list matches a record in a CRM database, that CRM record will be updated with the values contained in the CSV file.

At a very high level, I was thinking I would need to do something like this:

foreach(string dbName in masterDatabaseList)
{
    //open db connection

    foreach(string emailAddress in masterEmailList)
    {
        //some helper method that would execute a SQL statement like
        //"IF EXISTS ... WHERE EMAIL_ADDRESS = <emailAddress>" return true;

        bool matchFound = EmailExistsInDb(emailAddress);

        if (matchFound)
        {
            //the current email from the master list does exist in this database
            //do necessary updates and stuff
        }
    }
}

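Is this a reasonable approach, given 300 databases and a potentially large CSV? Or would it be better to build one SQL statement per database, like this: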

"SELECT * FROM EMAIL_TABLE WHERE EMAIL_ADDRESS IN(email1,email2, email3,...)"

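That would return all the matches for a database at once, instead of running a separate SQL lookup for every address.

Is there a better pattern? With 300 databases, I would like to keep this as efficient as possible.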


I would read the addresses from the CSV into a list first. You can then express a "where in" filter in LINQ, like this:

var addresses = GetEmailAddresses();
var entries = ctx.Entries.Where(e => addresses.Contains(e.EmailAddress));

How well this works depends on the size of the list. If it is small (200 or so?), this should be fine.
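For larger lists, one option is to split the addresses into batches and issue one parameterized IN query per batch. The sketch below only builds the SQL text and the values for each batch; it does not open a connection. `InClauseBatcher` and the default batch size are assumptions for illustration, the table and column names are taken from the question, and the case-insensitive de-duplication assumes a case-insensitive column collation. Batching is worth considering because SQL Server limits a single request to 2100 parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InClauseBatcher
{
    // Builds one parameterized "IN (...)" statement per batch of addresses.
    // Each result pairs the SQL text with the values for @p0, @p1, ... in order.
    public static List<(string Sql, string[] Values)> Build(IEnumerable<string> emails, int batchSize = 1000)
    {
        // Stay below SQL Server's limit of 2100 parameters per request.
        if (batchSize < 1 || batchSize > 2000)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        // Assumption: the email column uses a case-insensitive collation.
        var distinct = emails.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();
        var result = new List<(string Sql, string[] Values)>();

        for (int offset = 0; offset < distinct.Length; offset += batchSize)
        {
            var values = distinct.Skip(offset).Take(batchSize).ToArray();
            var names = values.Select((_, i) => "@p" + i);
            var sql = "SELECT EMAIL_ADDRESS FROM EMAIL_TABLE WHERE EMAIL_ADDRESS IN ("
                      + string.Join(", ", names) + ")";
            result.Add((sql, values));
        }

        return result;
    }
}
```

Each batch would then be run once per database, binding `Values[i]` to `@p{i}`, so the number of round trips is roughly (databases × batches) rather than (databases × addresses).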

Other things to consider:

  • Parallelism.

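Load the CSV into a table and pass it to a stored procedure as a TVP (table-valued parameter). The procedure can then loop over the 300 databases (using ad-hoc SQL), so the 300 updates all happen on the server. Something like this: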

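CREATE PROCEDURE yourNewProcedure
(
    @TableValueParameter dbo.udtTVP READONLY
)
AS

DECLARE @dbName sysname
DECLARE @SQL nvarchar(3000)

-- Walk every database whose name matches the CRM naming pattern.
DECLARE DB_Cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE name LIKE '%yourdbs%'
OPEN DB_Cursor
FETCH NEXT FROM DB_Cursor INTO @dbName
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Update the CRM table (t2), not the read-only TVP (t).
    -- Field and TableYouCareAbout are placeholders: join on the email
    -- column and set whichever columns need to be synchronized.
    SET @SQL = N'UPDATE t2
                SET t2.Field = t.Field
                FROM @TableValueParameter t
                JOIN ' + QUOTENAME(@dbName) + N'..TableYouCareAbout t2 ON t.Field = t2.Field'

    -- A TVP passed through sp_executesql must be declared READONLY.
    EXEC sp_executesql @SQL, N'@TableValueParameter dbo.udtTVP READONLY', @TableValueParameter

    FETCH NEXT FROM DB_Cursor INTO @dbName
END
CLOSE DB_Cursor
DEALLOCATE DB_Cursor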

Source: https://habr.com/ru/post/1544982/

