Speeding up an UPDATE on a temporary table

I have a SQL Server 2012 stored procedure. It populates the temp table below, which is pretty simple. However, after that I run some UPDATEs on it.

Here is my T-SQL for declaring the temp table #SourceTable, filling it and then running some updates on it. After all this, I take this temp table and insert it into a new table, which we populate with a MERGE statement that joins on DOI. DOI is the main column here, and as you will see below, my UPDATE statements get the MAX/MIN of several columns per DOI, since the table can have multiple rows with the same DOI.

My question is: how can I speed up filling #SourceTable, or the updates on it? Are there any indexes I can create? I am decent at SQL, but not the best when it comes to performance. I am dealing with roughly 60,000,000 rows in the temp table, and the script has been running for almost 4 hours. This is a one-time conversion script; I only run it once.

CREATE TABLE #SourceTable
(
    DOI VARCHAR(72), 
    FullName NVARCHAR(128), LastName NVARCHAR(64), 
    FirstName NVARCHAR(64), FirstInitial NVARCHAR(10), 
    JournalId INT, JournalVolume VARCHAR(16), 
    JournalIssue VARCHAR(16), JournalFirstPage VARCHAR(16), 
    JournalLastPage VARCHAR(16), ArticleTitle NVARCHAR(1024), 
    PubYear SMALLINT, CreatedDate SMALLDATETIME, 
    UpdatedDate SMALLDATETIME, 
    ISSN_e VARCHAR(16), ISSN_p VARCHAR(16), 
    Citations INT, LastCitationRefresh SMALLDATETIME, 
    LastCitationRefreshValue SMALLINT, IsInSearch BIT, 
    BatchUpdatedDate SMALLDATETIME, LastIndexUpdate SMALLDATETIME, 
    ArticleClassificationId INT, ArticleClassificationUpdatedBy INT, 
    ArticleClassificationUpdatedDate SMALLDATETIME, 
    Affiliations VARCHAR(8000),
    --Calculated columns for use in importing...
    RowNum SMALLINT, MinCreatedDatePerDOI SMALLDATETIME, 
    MaxUpdatedDatePerDOI SMALLDATETIME, 
    MaxBatchUpdatedDatePerDOI SMALLDATETIME, 
    MaxArticleClassificationUpdatedByPerDOI INT, 
    MaxArticleClassificationUpdatedDatePerDOI SMALLDATETIME, 
    AffiliationsSameForAllDOI BIT, NewArticleId INT
)

--***************************************
--CROSSREF_ARTICLES
--***************************************
--GET RAW DATA INTO SOURCE TABLE TEMP TABLE..
INSERT INTO #SourceTable 
    SELECT 
        DOI, FullName, LastName, FirstName, FirstInitial, 
        JournalId, LEFT(JournalVolume,16) AS JournalVolume, 
        LEFT(JournalIssue,16) AS JournalIssue, 
        LEFT(JournalFirstPage,16) AS JournalFirstPage, 
        LEFT(JournalLastPage,16) AS JournalLastPage, 
        ArticleTitle, PubYear, CreatedDate, UpdatedDate, 
        ISSN_e, ISSN_p, 
        ISNULL(Citations,0) AS Citations, LastCitationRefresh, 
        LastCitationRefreshValue, IsInSearch, BatchUpdatedDate, 
        LastIndexUpdate, ArticleClassificationId, 
        ArticleClassificationUpdatedBy, 
        ArticleClassificationUpdatedDate, Affiliations,
        ROW_NUMBER() OVER(PARTITION BY DOI ORDER BY UpdatedDate DESC, CreatedDate ASC) AS RowNum, 
        NULL AS MinCreatedDatePerDOI, NULL AS MaxUpdatedDatePerDOI, 
        NULL AS MaxBatchUpdatedDatePerDOI, 
        NULL AS MaxArticleClassificationUpdatedByPerDOI, 
        NULL AS MaxArticleClassificationUpdatedDatePerDOI, 
        0 AS AffiliationsSameForAllDOI, NULL AS NewArticleId
    FROM 
        CrossRef_Articles WITH (NOLOCK)

--UPDATE SOURCETABLE WITH MAX/MIN/CALCULATED VALUES PER DOI...
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
    MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI,
    MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
    MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI,
    MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN (
    SELECT
        MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
        MIN(CreatedDate) AS MinCreatedDatePerDOI,
        MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI,
        MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI,
        MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI,
        DOI
    FROM #SourceTable
    GROUP BY DOI
) AS T ON S.DOI = T.DOI

UPDATE S
SET AffiliationsSameForAllDOI = 1
FROM #SourceTable S
WHERE NOT EXISTS (SELECT 1 FROM #SourceTable S2 WHERE S2.DOI = S.DOI AND S2.Affiliations <> S.Affiliations)
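One possible alternative to the two UPDATEs, offered as a sketch under assumptions: compute the per-DOI values with windowed aggregates (MIN/MAX ... OVER (PARTITION BY DOI), available in SQL Server 2012) in the same SELECT that already computes ROW_NUMBER, so the temp table is filled in one pass and never grouped and self-joined afterwards. The names #ArticlesDemo and #SourceDemo, the reduced column list and the sample rows are hypothetical, chosen only so the snippet runs on its own. Whether this actually beats GROUP BY plus a join on 60 million rows depends on the execution plan. The affiliations check is close to, but not identical to, the NOT EXISTS/<> version: a row whose Affiliations is NULL, in a DOI group with differing non-NULL values, gets 0 here but 1 in the original.

```sql
-- Hypothetical stand-ins, reduced to the columns used below.
IF OBJECT_ID('tempdb..#ArticlesDemo') IS NOT NULL DROP TABLE #ArticlesDemo;
IF OBJECT_ID('tempdb..#SourceDemo') IS NOT NULL DROP TABLE #SourceDemo;

CREATE TABLE #ArticlesDemo
(
    DOI VARCHAR(72),
    CreatedDate SMALLDATETIME,
    UpdatedDate SMALLDATETIME,
    Affiliations VARCHAR(8000)
);

-- Made-up sample rows; 'YYYYMMDD' literals are unambiguous under any DATEFORMAT.
INSERT INTO #ArticlesDemo (DOI, CreatedDate, UpdatedDate, Affiliations)
VALUES ('10.1/a', '20150101', '20150301', 'X'),
       ('10.1/a', '20140601', '20150201', 'X'),
       ('10.1/b', '20130101', '20130501', 'Y'),
       ('10.1/b', '20130201', '20130401', 'Z');

-- Single pass: ROW_NUMBER and the per-DOI MIN/MAX all use PARTITION BY DOI,
-- so no GROUP BY + self-join of the temp table is needed afterwards.
SELECT
    DOI, CreatedDate, UpdatedDate, Affiliations,
    ROW_NUMBER() OVER (PARTITION BY DOI ORDER BY UpdatedDate DESC, CreatedDate ASC) AS RowNum,
    MIN(CreatedDate) OVER (PARTITION BY DOI) AS MinCreatedDatePerDOI,
    MAX(UpdatedDate) OVER (PARTITION BY DOI) AS MaxUpdatedDatePerDOI,
    -- 1 when all non-NULL Affiliations in the DOI group are equal, or all are NULL.
    CASE
        WHEN MIN(Affiliations) OVER (PARTITION BY DOI) = MAX(Affiliations) OVER (PARTITION BY DOI)
          OR MAX(Affiliations) OVER (PARTITION BY DOI) IS NULL
        THEN CAST(1 AS BIT)
        ELSE CAST(0 AS BIT)
    END AS AffiliationsSameForAllDOI
INTO #SourceDemo
FROM #ArticlesDemo;
```

The remaining MAX columns (BatchUpdatedDate, ArticleClassificationUpdatedBy, ArticleClassificationUpdatedDate) would follow the same MAX(...) OVER (PARTITION BY DOI) pattern.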

3 answers

This will probably be a faster way to do the update. It's hard to say without seeing the execution plan, but your version may be running the GROUP BY once for every row.

with doigrouped AS
(
  SELECT
    MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
    MIN(CreatedDate) AS MinCreatedDatePerDOI,
    MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI, 
    MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI, 
    MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI, 
    DOI 
  FROM #SourceTable 
  GROUP BY DOI
)
UPDATE S
SET MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI,
    MaxBatchUpdatedDatePerDOI = T.MaxBatchUpdatedDatePerDOI, 
    MinCreatedDatePerDOI = T.MinCreatedDatePerDOI, 
    MaxArticleClassificationUpdatedByPerDOI = T.MaxArticleClassificationUpdatedByPerDOI, 
    MaxArticleClassificationUpdatedDatePerDOI = T.MaxArticleClassificationUpdatedDatePerDOI
FROM #SourceTable S
INNER JOIN doigrouped T ON S.DOI = T.DOI
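On the "are there any indexes" part of the question, a common approach, sketched here under assumptions, is to load #SourceTable as a heap and then create a clustered index on DOI before the per-DOI work, so the GROUP BY and the join can read rows in DOI order. Building that index on 60 million rows is itself a large sort, so it only pays off if the GROUP BY and join gain more than it costs; comparing execution plans would tell. The table #SourceDemo, the index name IX_SourceDemo_DOI and the sample rows are hypothetical.

```sql
-- Hypothetical cut-down version of #SourceTable.
IF OBJECT_ID('tempdb..#SourceDemo') IS NOT NULL DROP TABLE #SourceDemo;

CREATE TABLE #SourceDemo
(
    DOI VARCHAR(72),
    CreatedDate SMALLDATETIME,
    UpdatedDate SMALLDATETIME,
    MinCreatedDatePerDOI SMALLDATETIME,
    MaxUpdatedDatePerDOI SMALLDATETIME
);

-- Made-up sample rows.
INSERT INTO #SourceDemo (DOI, CreatedDate, UpdatedDate)
VALUES ('10.1/a', '20150101', '20150301'),
       ('10.1/a', '20140601', '20150201'),
       ('10.1/b', '20130101', '20130501');

-- Index after the load, before the per-DOI GROUP BY / join.
CREATE CLUSTERED INDEX IX_SourceDemo_DOI ON #SourceDemo (DOI);

-- CTE form of the update; the statement before WITH must end with a semicolon.
WITH doigrouped AS
(
    SELECT
        DOI,
        MIN(CreatedDate) AS MinCreatedDatePerDOI,
        MAX(UpdatedDate) AS MaxUpdatedDatePerDOI
    FROM #SourceDemo
    GROUP BY DOI
)
UPDATE S
SET MinCreatedDatePerDOI = T.MinCreatedDatePerDOI,
    MaxUpdatedDatePerDOI = T.MaxUpdatedDatePerDOI
FROM #SourceDemo S
INNER JOIN doigrouped T ON S.DOI = T.DOI;
```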

With 60 million rows, it may also help to run the update in chunks of around 100k rather than all at once.
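If the update is split into chunks, one way to do it, sketched under assumptions, is to compute the per-DOI aggregates once into a numbered table and then update one range of DOI numbers per iteration, so no single statement touches every row. Ranging over numbered DOIs (rather than, say, "UPDATE TOP (n) ... WHERE column IS NULL") keeps the loop from spinning forever on DOIs whose aggregate is legitimately NULL. The tables #SourceDemo and #DoiGrouped, the index name and the sample rows are hypothetical, and the batch size counts DOIs, not rows.

```sql
-- Hypothetical cut-down #SourceTable and per-DOI aggregate table.
IF OBJECT_ID('tempdb..#SourceDemo') IS NOT NULL DROP TABLE #SourceDemo;
IF OBJECT_ID('tempdb..#DoiGrouped') IS NOT NULL DROP TABLE #DoiGrouped;

CREATE TABLE #SourceDemo
(
    DOI VARCHAR(72),
    CreatedDate SMALLDATETIME,
    UpdatedDate SMALLDATETIME,
    MinCreatedDatePerDOI SMALLDATETIME,
    MaxUpdatedDatePerDOI SMALLDATETIME
);

INSERT INTO #SourceDemo (DOI, CreatedDate, UpdatedDate)
VALUES ('10.1/a', '20150101', '20150301'),
       ('10.1/a', '20140601', '20150201'),
       ('10.1/b', '20130101', '20130501');

-- Aggregate once, numbering DOIs so they can be processed in ranges.
SELECT
    ROW_NUMBER() OVER (ORDER BY DOI) AS GroupId,
    DOI,
    MIN(CreatedDate) AS MinCreatedDatePerDOI,
    MAX(UpdatedDate) AS MaxUpdatedDatePerDOI
INTO #DoiGrouped
FROM #SourceDemo
GROUP BY DOI;

CREATE UNIQUE CLUSTERED INDEX IX_DoiGrouped_GroupId ON #DoiGrouped (GroupId);

DECLARE @BatchSize BIGINT = 100000;  -- DOIs per batch (not rows); tune as needed
DECLARE @From BIGINT = 1;
DECLARE @MaxId BIGINT = (SELECT MAX(GroupId) FROM #DoiGrouped);

-- An empty table leaves @MaxId NULL, so the loop simply does not run.
WHILE @From <= @MaxId
BEGIN
    UPDATE S
    SET MinCreatedDatePerDOI = G.MinCreatedDatePerDOI,
        MaxUpdatedDatePerDOI = G.MaxUpdatedDatePerDOI
    FROM #SourceDemo S
    INNER JOIN #DoiGrouped G ON S.DOI = G.DOI
    WHERE G.GroupId >= @From
      AND G.GroupId < @From + @BatchSize;

    SET @From += @BatchSize;
END;
```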


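A few suggestions:

  1. Instead of INSERT ... SELECT, use SELECT INTO to create and fill #SourceTable. SELECT INTO can be minimally logged (tempdb always uses the simple recovery model), which is usually faster for a load this large.

  2. Instead of the UPDATE, also use SELECT INTO: write the per-DOI values from #SourceTable into #SourceTable_Updates with SELECT INTO (using Hogan's query):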

with doigrouped AS
(
  SELECT
    MAX(UpdatedDate) AS MaxUpdatedDatePerDOI,
    MIN(CreatedDate) AS MinCreatedDatePerDOI,
    MAX(BatchUpdatedDate) AS MaxBatchUpdatedDatePerDOI, 
    MAX(ArticleClassificationUpdatedBy) AS MaxArticleClassificationUpdatedByPerDOI, 
    MAX(ArticleClassificationUpdatedDate) AS MaxArticleClassificationUpdatedDatePerDOI, 
    DOI 
  FROM #SourceTable 
  GROUP BY DOI
)
-- Select straight from the grouped CTE so #SourceTable_Updates holds exactly
-- one row per DOI; joining back to #SourceTable here would repeat each DOI
-- once per source row and multiply rows in the later join on DOI.
SELECT
    T.DOI,
    T.MaxUpdatedDatePerDOI,
    T.MaxBatchUpdatedDatePerDOI,
    T.MinCreatedDatePerDOI,
    T.MaxArticleClassificationUpdatedByPerDOI,
    T.MaxArticleClassificationUpdatedDatePerDOI
INTO #SourceTable_Updates
FROM doigrouped T
  3. Then work with #SourceTable JOIN-ed to #SourceTable_Updates on DOI, instead of updating #SourceTable in place.
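The three steps can be sketched end to end on a small scale, under assumptions: load with SELECT INTO, build a per-DOI table with exactly one row per DOI, then join the two into a new table instead of updating in place. How the joined result then feeds the final MERGE is not shown in the answer, and whether SELECT INTO actually beats INSERT ... SELECT on this server is worth measuring. The tables #CrossRefDemo, #SourceDemo, #SourceDemo_Updates and #SourceDemo_Final and the sample rows are hypothetical.

```sql
-- Hypothetical stand-in for CrossRef_Articles, reduced to a few columns.
IF OBJECT_ID('tempdb..#CrossRefDemo') IS NOT NULL DROP TABLE #CrossRefDemo;
IF OBJECT_ID('tempdb..#SourceDemo') IS NOT NULL DROP TABLE #SourceDemo;
IF OBJECT_ID('tempdb..#SourceDemo_Updates') IS NOT NULL DROP TABLE #SourceDemo_Updates;
IF OBJECT_ID('tempdb..#SourceDemo_Final') IS NOT NULL DROP TABLE #SourceDemo_Final;

CREATE TABLE #CrossRefDemo
(
    DOI VARCHAR(72),
    CreatedDate SMALLDATETIME,
    UpdatedDate SMALLDATETIME
);

INSERT INTO #CrossRefDemo (DOI, CreatedDate, UpdatedDate)
VALUES ('10.1/a', '20150101', '20150301'),
       ('10.1/a', '20140601', '20150201'),
       ('10.1/b', '20130101', '20130501');

-- Step 1: create and fill the temp table in a single SELECT INTO.
SELECT
    DOI, CreatedDate, UpdatedDate,
    ROW_NUMBER() OVER (PARTITION BY DOI ORDER BY UpdatedDate DESC, CreatedDate ASC) AS RowNum
INTO #SourceDemo
FROM #CrossRefDemo;

-- Step 2: per-DOI values, exactly one row per DOI.
SELECT
    DOI,
    MIN(CreatedDate) AS MinCreatedDatePerDOI,
    MAX(UpdatedDate) AS MaxUpdatedDatePerDOI
INTO #SourceDemo_Updates
FROM #SourceDemo
GROUP BY DOI;

-- Step 3: join the two instead of updating #SourceDemo in place.
SELECT
    S.DOI, S.CreatedDate, S.UpdatedDate, S.RowNum,
    U.MinCreatedDatePerDOI, U.MaxUpdatedDatePerDOI
INTO #SourceDemo_Final
FROM #SourceDemo S
INNER JOIN #SourceDemo_Updates U ON U.DOI = S.DOI;
```

Because #SourceDemo_Updates has one row per DOI, the join in step 3 keeps the source row count unchanged.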


Regarding the insert:

  • CrossRef_Articles ? ( ) , . temp Id. .
  • tempdb. , .
  • , , , ?

Source: https://habr.com/ru/post/1620832/

