Retrieving failed jobs from a table with retry details (id and number of retries)

Question


Apologies for the unintuitive title.

I have a Jobs table where each row represents a maintenance task performed by a computer program. It has the following structure:

    CREATE TABLE Jobs (
        JobId bigint PRIMARY KEY,
        ...
        Status int NOT NULL,
        OriginalJobId bigint NULL
    )

When a job is created or launched, its row is added to the table with Status 0. When the job completes successfully, its Status is updated to 1; when it fails, its Status is updated to 2. When a job fails, the job manager retries it by inserting a new row into the Jobs table that duplicates the details of the failed job, resets Status to 0, and stores the original (failed) JobId in OriginalJobId for tracking purposes. If the retry also fails, the job is retried again, up to 3 times; each subsequent retry row also stores the original JobId in its OriginalJobId column.
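The retry bookkeeping described above can be sketched as follows. This is a minimal illustration, not the actual job manager: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, models only the JobId, Status and OriginalJobId columns, and the helper `retry_failed_job` and the `MAX_RETRIES` constant are hypothetical. The detail worth noticing is that a retry of a retry still stores the original (root) JobId, which is what lets a single GROUP BY collect every attempt of a job.

```python
import sqlite3

# Status values from the question: 0 = pending, 1 = succeeded, 2 = failed
PENDING, FAILED = 0, 2
MAX_RETRIES = 3  # hypothetical constant; the limit comes from the question

conn = sqlite3.connect(":memory:")
# Simplified Jobs table: only the columns shown in the question.
conn.execute(
    "CREATE TABLE Jobs (JobId INTEGER PRIMARY KEY, "
    "Status INTEGER NOT NULL, OriginalJobId INTEGER NULL)"
)


def retry_failed_job(conn, job_id):
    """Hypothetical job-manager step: queue a retry of a failed job.

    Returns the new JobId, or None once MAX_RETRIES retries already exist.
    """
    # A retry of a retry still points at the original (root) job.
    (root_id,) = conn.execute(
        "SELECT COALESCE(OriginalJobId, JobId) FROM Jobs WHERE JobId = ?",
        (job_id,),
    ).fetchone()
    (retries,) = conn.execute(
        "SELECT COUNT(*) FROM Jobs WHERE OriginalJobId = ?", (root_id,)
    ).fetchone()
    if retries >= MAX_RETRIES:
        return None
    # New row: Status reset to pending, OriginalJobId = root JobId.
    cur = conn.execute(
        "INSERT INTO Jobs (Status, OriginalJobId) VALUES (?, ?)",
        (PENDING, root_id),
    )
    return cur.lastrowid


# Job 1 fails, and every retry fails as well.
conn.execute("INSERT INTO Jobs VALUES (1, ?, NULL)", (FAILED,))
retry_ids = []
job_id = 1
while True:
    job_id = retry_failed_job(conn, job_id)
    if job_id is None:
        break
    conn.execute("UPDATE Jobs SET Status = ? WHERE JobId = ?", (FAILED, job_id))
    retry_ids.append(job_id)

rows = conn.execute(
    "SELECT JobId, Status, OriginalJobId FROM Jobs ORDER BY JobId"
).fetchall()
print(rows)  # [(1, 2, None), (2, 2, 1), (3, 2, 1), (4, 2, 1)]
```

All three retry rows point back at job 1, mirroring jobs 9 to 11 pointing at job 8 in the sample data.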

My problem is formulating a query that returns the current set of failed jobs together with their number of retries.

Here is an example of the data in the table:

    JobId | Status | OriginalJobId
    1,  1, NULL  -- Successful initial job
    2,  0, NULL  -- Pending initial job
    3,  2, NULL  -- Failed initial job
    4,  1, 3     -- Successful retry of Job 3
    5,  2, NULL  -- Failed initial job
    6,  2, 5     -- Failed retry 1 of Job 5
    7,  2, 5     -- Failed retry 2 of Job 5 -- should be retried 1 more time
    8,  2, NULL  -- Failed initial job
    9,  2, 8     -- Failed retry 1 of Job 8
    10, 2, 8     -- Failed retry 2 of Job 8
    11, 2, 8     -- Failed retry 3 of Job 8 -- don't try again
    12, 2, NULL  -- Failed initial job

My request should return this:

    JobId | RetryCount
    5,  2
    12, 0

Note that Job 3 is not included because its most recent retry succeeded (Status 1). Similarly, Job 8 is excluded because it has already reached the limit of 3 retries. Job 5 is included because it still has not succeeded and has had only 2 retries, and Job 12 is included because it has not been retried yet.
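Before writing any SQL, the selection rules can be restated in plain Python and checked against the sample rows, which gives a reference to compare the SQL answers with. This sketch assumes that the highest JobId within a group is the most recent attempt, and that a job is returned only when that attempt failed (Status 2) and fewer than 3 retries exist; the function name `failed_jobs_to_retry` is hypothetical.

```python
from collections import defaultdict

FAILED = 2
MAX_RETRIES = 3

# Sample rows from the question: (JobId, Status, OriginalJobId)
ROWS = [
    (1, 1, None), (2, 0, None), (3, 2, None), (4, 1, 3),
    (5, 2, None), (6, 2, 5), (7, 2, 5),
    (8, 2, None), (9, 2, 8), (10, 2, 8), (11, 2, 8),
    (12, 2, None),
]


def failed_jobs_to_retry(rows, max_retries=MAX_RETRIES):
    """Return {original JobId: retry count} for jobs that still need a retry."""
    # Collect every attempt under its original (root) JobId.
    attempts = defaultdict(list)
    for job_id, status, original_id in rows:
        root = job_id if original_id is None else original_id
        attempts[root].append((job_id, status))

    result = {}
    for root, group in attempts.items():
        # Assumption: the highest JobId is the most recent attempt.
        _, latest_status = max(group)
        retries = len(group) - 1
        if latest_status == FAILED and retries < max_retries:
            result[root] = retries
    return result


print(failed_jobs_to_retry(ROWS))  # {5: 2, 12: 0}
```

Raising `max_retries` to 4 brings Job 8 back with 3 retries, which is the extra row most of the queries below return until a retry-limit condition is added.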

I think the solution would be something like this:

    SELECT J1.JobId
    FROM Jobs AS J1
    LEFT OUTER JOIN Jobs AS J2 ON J1.JobId = J2.OriginalJobId
    WHERE J1.Status = 2

... but I can't figure out how to get the RetryCount.

Here is the SQLFiddle I created for this problem, containing one of the solutions below:

http://sqlfiddle.com/#!6/8765f

Update

Here is an updated SQLFiddle comparing the 5 solutions provided so far (I added an extra HAVING clause to remove jobs that had more than 3 attempts):

http://sqlfiddle.com/#!6/8765f/23

In terms of performance, I think GarethD's answer is the best: it has the simplest execution plan and tends to finish fastest in SQLFiddle.

My production table has about 14,000,000 rows, so the results there will obviously differ. I will try each answer in production, see which one is fastest, and accept an answer accordingly.

Thank you all for your help!

sql sql-server

Dai Nov 21 '14 at 9:52

5 answers

This should do the job. It uses COALESCE to combine JobId and OriginalJobId into the original job's id, groups the rows by it to get a retry count, and then excludes any job that has a row with Status 1.

    SELECT COALESCE(j.OriginalJobId, j.JobId) JobId,
           COUNT(*)-1 RetryCount
    FROM Jobs j
    WHERE j.[Status] = 2
      AND NOT EXISTS (SELECT 1
                      FROM Jobs
                      WHERE COALESCE(Jobs.OriginalJobId, Jobs.JobId) = COALESCE(j.OriginalJobId, j.JobId)
                        AND Jobs.[Status] = 1)
    GROUP BY COALESCE(j.OriginalJobId, j.JobId), j.[Status]
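To see what this returns, here is a minimal sketch that runs the query unchanged against the question's sample rows, using Python's sqlite3 module as a stand-in for SQL Server (SQLite accepts the `[Status]` bracket quoting). On that data it also returns Job 8 with 3 retries. The `HAVING COUNT(*) - 1 < 3` clause is one possible way to apply the retry limit; it is an assumption, not necessarily the clause used in the question's fiddle.

```python
import sqlite3

# Sample rows from the question: (JobId, Status, OriginalJobId)
ROWS = [
    (1, 1, None), (2, 0, None), (3, 2, None), (4, 1, 3),
    (5, 2, None), (6, 2, 5), (7, 2, 5),
    (8, 2, None), (9, 2, 8), (10, 2, 8), (11, 2, 8),
    (12, 2, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs (JobId INTEGER PRIMARY KEY, "
    "Status INTEGER NOT NULL, OriginalJobId INTEGER NULL)"
)
conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT COALESCE(j.OriginalJobId, j.JobId) JobId, COUNT(*)-1 RetryCount
FROM Jobs j
WHERE j.[Status] = 2
  AND NOT EXISTS (SELECT 1 FROM Jobs
                  WHERE COALESCE(Jobs.OriginalJobId, Jobs.JobId) = COALESCE(j.OriginalJobId, j.JobId)
                    AND Jobs.[Status] = 1)
GROUP BY COALESCE(j.OriginalJobId, j.JobId), j.[Status]
"""
# Hypothetical retry limit: keep only groups with fewer than 3 retries.
LIMITED = QUERY + "HAVING COUNT(*) - 1 < 3"

unlimited = sorted(conn.execute(QUERY).fetchall())
limited = sorted(conn.execute(LIMITED).fetchall())
print(unlimited)  # [(5, 2), (8, 3), (12, 0)]
print(limited)    # [(5, 2), (12, 0)]
```

Note that COUNT(*)-1 counts only failed rows (the WHERE keeps Status 2), so a retry that is still pending would not be counted.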

DavidG Nov 21 '14 at 10:09

Here's a slightly more verbose CTE approach I wrote. It returns the expected results, including jobs whose initial run failed (Status = 2) and that have not been retried yet:

    ;WITH cte AS (
        -- root level jobs that failed and did not have status of 1 after
        SELECT j.JobId
             , j.OriginalJobId
             , 0 AS RetryCount
        FROM dbo.Jobs j
        WHERE j.OriginalJobId IS NULL
          AND j.Status = 2
          AND NOT EXISTS (
                SELECT OriginalJobId
                FROM dbo.Jobs
                WHERE Status = 1
                  AND OriginalJobId = j.JobId
              )
        -- unioned with retries
        UNION ALL
        SELECT j.JobId
             , j.OriginalJobId
             , 1 AS RetryCount
        FROM dbo.Jobs j
        INNER JOIN cte ON cte.JobId = j.OriginalJobId
    )
    -- Group Jobs & Count retries
    SELECT JobId
         , SUM(RetryCount) Retries
    FROM (
        SELECT JobId
             , cte.RetryCount
        FROM cte
        WHERE OriginalJobId IS NULL
        UNION ALL
        SELECT OriginalJobId AS JobId
             , cte.RetryCount
        FROM cte
        WHERE OriginalJobId IS NOT NULL
    ) t
    GROUP BY JobId
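As a rough check, the CTE can be run against the question's sample rows with Python's sqlite3 module. SQLite spells it `WITH RECURSIVE` and has no `dbo` schema, so those two details differ from the T-SQL above; the rest is unchanged. Like the other answers, it also returns Job 8, so this sketch appends a hypothetical `HAVING SUM(RetryCount) < 3` for the retry limit.

```python
import sqlite3

# Sample rows from the question: (JobId, Status, OriginalJobId)
ROWS = [
    (1, 1, None), (2, 0, None), (3, 2, None), (4, 1, 3),
    (5, 2, None), (6, 2, 5), (7, 2, 5),
    (8, 2, None), (9, 2, 8), (10, 2, 8), (11, 2, 8),
    (12, 2, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs (JobId INTEGER PRIMARY KEY, "
    "Status INTEGER NOT NULL, OriginalJobId INTEGER NULL)"
)
conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?)", ROWS)

QUERY = """
WITH RECURSIVE cte AS (
    -- failed root jobs with no successful retry
    SELECT j.JobId, j.OriginalJobId, 0 AS RetryCount
    FROM Jobs j
    WHERE j.OriginalJobId IS NULL
      AND j.Status = 2
      AND NOT EXISTS (SELECT OriginalJobId FROM Jobs
                      WHERE Status = 1 AND OriginalJobId = j.JobId)
    UNION ALL
    -- each retry of such a job counts as 1
    SELECT j.JobId, j.OriginalJobId, 1 AS RetryCount
    FROM Jobs j
    INNER JOIN cte ON cte.JobId = j.OriginalJobId
)
SELECT JobId, SUM(RetryCount) AS Retries
FROM (
    SELECT JobId, cte.RetryCount FROM cte WHERE OriginalJobId IS NULL
    UNION ALL
    SELECT OriginalJobId AS JobId, cte.RetryCount FROM cte
    WHERE OriginalJobId IS NOT NULL
) t
GROUP BY JobId
"""
# Hypothetical retry limit appended after the GROUP BY.
LIMITED = QUERY + "HAVING SUM(RetryCount) < 3"

unlimited = sorted(conn.execute(QUERY).fetchall())
limited = sorted(conn.execute(LIMITED).fetchall())
print(unlimited)  # [(5, 2), (8, 3), (12, 0)]
print(limited)    # [(5, 2), (12, 0)]
```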

Tanner Nov 21 '14 at 10:25

How about this "look ma, no joins!" solution:

    select coalesce(OriginalJobId, JobId) JobId,
           count(OriginalJobId) RetryCount
    from Jobs
    group by coalesce(OriginalJobId, JobId)
    having count(case status when 1 then 1 end) = 0
       and max(status) > 0
    order by JobId;

Returns the desired result:

    JobId | RetryCount
    6,  3
    15, 0
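Here is a minimal sketch that runs this join-free query against the sample rows from the question, again using Python's sqlite3 module as a stand-in for SQL Server. The optional `and count(OriginalJobId) < 3` condition is a hypothetical way to apply the retry limit from the question; it is spliced in through a template only to avoid repeating the query text.

```python
import sqlite3

# Sample rows from the question: (JobId, Status, OriginalJobId)
ROWS = [
    (1, 1, None), (2, 0, None), (3, 2, None), (4, 1, 3),
    (5, 2, None), (6, 2, 5), (7, 2, 5),
    (8, 2, None), (9, 2, 8), (10, 2, 8), (11, 2, 8),
    (12, 2, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs (JobId INTEGER PRIMARY KEY, "
    "Status INTEGER NOT NULL, OriginalJobId INTEGER NULL)"
)
conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?)", ROWS)

QUERY_TEMPLATE = """
select coalesce(OriginalJobId, JobId) JobId, count(OriginalJobId) RetryCount
from Jobs
group by coalesce(OriginalJobId, JobId)
having count(case status when 1 then 1 end) = 0  -- no successful attempt
   and max(status) > 0                            -- not only pending rows
   {extra}
order by JobId
"""

unlimited = sorted(conn.execute(QUERY_TEMPLATE.format(extra="")).fetchall())
limited = sorted(
    conn.execute(
        QUERY_TEMPLATE.format(extra="and count(OriginalJobId) < 3")
    ).fetchall()
)
print(unlimited)  # [(5, 2), (8, 3), (12, 0)]
print(limited)    # [(5, 2), (12, 0)]
```

Because count(OriginalJobId) ignores NULLs, the original row is not counted, so no "- 1" is needed here.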

rozmarin Nov 21 '14 at 16:51

Why do we need a join at all, since all we want is the count per OriginalJobId for groups that contain no Status 1?

    SELECT OriginalJobId, COUNT(*) As RetryCount
    FROM Jobs
    WHERE OriginalJobId IS NOT NULL
    GROUP BY OriginalJobId
    HAVING COUNT(CASE WHEN Status = 1 THEN 1 END) = 0

I think we can simply ignore all rows where OriginalJobId is NULL and focus only on the retries.

EDIT:

When I wrote my answer, I hadn't noticed the second row that was added to the required result. The best I can do to fix this is the following rather ugly construct: =)

    SELECT OriginalJobId, COUNT(*) As RetryCount
    FROM Jobs
    WHERE OriginalJobId IS NOT NULL
    GROUP BY OriginalJobId
    HAVING COUNT(CASE WHEN Status = 1 THEN 1 END) = 0
    UNION ALL
    SELECT j.JobId, 0
    FROM Jobs j
    WHERE (Status = 2)
      AND (OriginalJobId IS NULL)
      AND (NOT EXISTS (SELECT 1 FROM Jobs WHERE OriginalJobId = j.JobId))
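A minimal sketch of this UNION ALL query running on the question's sample rows, with Python's sqlite3 module standing in for SQL Server. The first branch counts retries per original job and the second adds failed jobs with no retries as 0. The `AND COUNT(*) < 3` condition is a hypothetical retry limit added only to the first branch, since the second branch always reports 0 retries.

```python
import sqlite3

# Sample rows from the question: (JobId, Status, OriginalJobId)
ROWS = [
    (1, 1, None), (2, 0, None), (3, 2, None), (4, 1, 3),
    (5, 2, None), (6, 2, 5), (7, 2, 5),
    (8, 2, None), (9, 2, 8), (10, 2, 8), (11, 2, 8),
    (12, 2, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs (JobId INTEGER PRIMARY KEY, "
    "Status INTEGER NOT NULL, OriginalJobId INTEGER NULL)"
)
conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?)", ROWS)

QUERY_TEMPLATE = """
SELECT OriginalJobId, COUNT(*) AS RetryCount
FROM Jobs
WHERE OriginalJobId IS NOT NULL
GROUP BY OriginalJobId
HAVING COUNT(CASE WHEN Status = 1 THEN 1 END) = 0 {extra}
UNION ALL
SELECT j.JobId, 0
FROM Jobs j
WHERE (Status = 2)
  AND (OriginalJobId IS NULL)
  AND (NOT EXISTS (SELECT 1 FROM Jobs WHERE OriginalJobId = j.JobId))
"""

unlimited = sorted(conn.execute(QUERY_TEMPLATE.format(extra="")).fetchall())
limited = sorted(
    conn.execute(QUERY_TEMPLATE.format(extra="AND COUNT(*) < 3")).fetchall()
)
print(unlimited)  # [(5, 2), (8, 3), (12, 0)]
print(limited)    # [(5, 2), (12, 0)]
```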

Giorgos betsos Nov 21 '14 at 11:26

GarethD · Accepted Answer · 2014-11-21T10:10:16+0000

The following returns the required result:

    SELECT J1.JobId, Retries = COUNT(J2.JobId)
    FROM Jobs AS J1
    INNER JOIN Jobs AS J2 ON J1.JobId = J2.OriginalJobId
    WHERE J1.Status = 2
    GROUP BY J1.JobId
    HAVING COUNT(CASE WHEN J2.Status = 1 THEN 1 END) = 0;

I changed it to an INNER JOIN so that only jobs that have been retried are included, although this could legitimately be replaced with a LEFT JOIN to include failed jobs that have not been retried yet. I also added a HAVING clause to exclude any jobs that succeeded when they were retried.

EDIT

As mentioned above, using an INNER JOIN means you only return jobs that have been retried. To get all failed jobs you need a LEFT JOIN, but that also returns the retries themselves as failed jobs, so I added an extra J1.OriginalJobId IS NULL predicate to return only the original jobs:

    SELECT J1.JobId, Retries = COUNT(J2.JobId)
    FROM Jobs AS J1
    LEFT JOIN Jobs AS J2 ON J1.JobId = J2.OriginalJobId
    WHERE J1.Status = 2
      AND J1.OriginalJobId IS NULL
    GROUP BY J1.JobId
    HAVING COUNT(CASE WHEN J2.Status = 1 THEN 1 END) = 0;
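To compare the INNER and LEFT JOIN variants, here is a minimal sketch that runs both against the question's sample rows using Python's sqlite3 module as a stand-in for SQL Server. SQLite does not support the T-SQL `Retries = COUNT(...)` alias syntax, so the sketch writes `COUNT(J2.JobId) AS Retries` instead. The `run` helper and the `AND COUNT(J2.JobId) < 3` retry-limit condition are hypothetical additions for illustration.

```python
import sqlite3

# Sample rows from the question: (JobId, Status, OriginalJobId)
ROWS = [
    (1, 1, None), (2, 0, None), (3, 2, None), (4, 1, 3),
    (5, 2, None), (6, 2, 5), (7, 2, 5),
    (8, 2, None), (9, 2, 8), (10, 2, 8), (11, 2, 8),
    (12, 2, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs (JobId INTEGER PRIMARY KEY, "
    "Status INTEGER NOT NULL, OriginalJobId INTEGER NULL)"
)
conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?)", ROWS)


def run(join, extra_where="", extra_having=""):
    """Hypothetical helper; only fixed SQL fragments are interpolated."""
    sql = f"""
    SELECT J1.JobId, COUNT(J2.JobId) AS Retries
    FROM Jobs AS J1
    {join} Jobs AS J2 ON J1.JobId = J2.OriginalJobId
    WHERE J1.Status = 2 {extra_where}
    GROUP BY J1.JobId
    HAVING COUNT(CASE WHEN J2.Status = 1 THEN 1 END) = 0 {extra_having}
    """
    return sorted(conn.execute(sql).fetchall())


inner = run("INNER JOIN")
left = run("LEFT JOIN", extra_where="AND J1.OriginalJobId IS NULL")
limited = run(
    "LEFT JOIN",
    extra_where="AND J1.OriginalJobId IS NULL",
    extra_having="AND COUNT(J2.JobId) < 3",
)
print(inner)    # [(5, 2), (8, 3)]
print(left)     # [(5, 2), (8, 3), (12, 0)]
print(limited)  # [(5, 2), (12, 0)]
```

COUNT(J2.JobId) ignores the NULLs produced by the LEFT JOIN, which is why Job 12 comes back with 0 retries rather than 1.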
