U-SQL: Nesting TVFs and Queries Delivers Terrible Results

I think this problem is related to the query optimization that Azure Data Lake Analytics performs, but let's see...

I have two separate queries (TVFs) that each perform an aggregation, and a final query that joins the two together to produce the final result. So:

Table >  Header Query
Table >  Detail Query
Result = Header Query + Detail Query

To check the logic, I ran the two sub-queries separately with a filter, saved the results to files, and then used those files as the sources for the final query. These are the durations in minutes:

Header Query  1.4  (408 rows)
Detail Query  0.9  (3298 rows)
Final Query   0.9  (408 rows)

So I know that, at worst, I can get my result in about 3.5 minutes. However, I do not want to create intermediate files. I want to use the TVFs directly as the inputs to the final query.

With the TVFs in the final query, the job graph reaches approximately 97% progress in about 1.5 minutes. But then all hell breaks loose! The last node is a collection of 2500 vertices that takes 16 minutes of compute time. So my question is... WHY?

Is this a case where I don't understand some fundamental concept about how Azure works?

Can anyone explain what is going on? Any help is appreciated.

Final query:

@Header =
SELECT [CTNNumber],
       [CTNCycleNo],
       [SeqStart],
       [SeqEnd],
       [StartUTC],
       [EndUTC],
       [StartLoc],
       [StartType],
       [EndLoc],
       [EndType],
       [Start Step],
       [Start Ctn Status],
       [Start Fill Status],
       [EndStep],
       [End Ctn Status],
       [End Fill Status]
FROM [Play].[getCycles3]
     ("") AS X;


@Detail =
SELECT [CTNNumber],
       [SeqNo] AS [SeqNo],
       [LocationType],
       [LocationID],
       [BizstepDescription],
       [ContainerStatus],
       [FillStatus],
       [UTCTimeStampforEvent]
FROM [Play].[getRaw]
     ("") AS Z;

@result =
    SELECT
        H.[CTNNumber], H.[CTNCycleNo], H.[SeqStart], H.[SeqEnd]
        ,COUNT([D].[SeqNo]) AS [SeqCount]
        //, COUNT(DISTINCT [LocationID]) AS [#Locations]
    FROM 
        @Header AS [H]
        INNER JOIN
        @Detail AS [D]
        ON 
        [H].[CTNNumber] == [D].[CTNNumber] 
    WHERE 
        [D].[SeqNo] >= [H].[SeqStart] AND
        [D].[SeqNo] <= [H].[SeqEnd]  
    GROUP BY 
        H.[CTNNumber], H.[CTNCycleNo], H.[SeqStart], H.[SeqEnd]
    ;

2 answers

So, I raised this as a ticket with Microsoft. Here is their response, which I implemented successfully.

From: ########@microsoft.com
Subject: RE: ########### USQL Job displays a fancy work plan and runtime

, . script , U-SQL , , .

, script , , , . , script, , a script, .

, , . CTNNumber , script.

, CREATE STATISTICS: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-statistics-transact-sql
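
For illustration only, a minimal sketch of what creating statistics on CTNNumber could look like in U-SQL. The table name [Play].[SourceTable] and the statistics name are hypothetical, since the underlying table is not named in this thread, and this has not been run against the job above.

```usql
// Sketch only, not the exact statement from the ticket.
// [Play].[SourceTable] is a hypothetical stand-in for the table that
// the TVFs read from; the statistics name is also made up.
CREATE STATISTICS IF NOT EXISTS stat_SourceTable_CTNNumber
    ON [Play].[SourceTable] ([CTNNumber])
    WITH FULLSCAN;
```

With statistics on the join column, the optimizer has data to base its row estimates on instead of guessing. Statistics are not updated automatically, so they would need to be recreated after the table's data changes significantly.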


. , .

, , , . , , .

OPTION(ROWCOUNT=x) , , ?
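
For reference, a ROWCOUNT hint is attached to the rowset expression whose size the optimizer cannot estimate. A minimal sketch against the query from the question, assuming the row counts seen in the filtered test runs (408 and 3298) are representative; those values are assumptions and would need to match the real data:

```usql
// Sketch only: cardinality hints on the two TVF rowsets.
// 408 and 3298 are the row counts from the filtered test runs in the
// question; replace them with realistic estimates for the full data.
@Header =
    SELECT [CTNNumber],
           [CTNCycleNo],
           [SeqStart],
           [SeqEnd]
    FROM [Play].[getCycles3]("") AS X
    OPTION(ROWCOUNT = 408);

@Detail =
    SELECT [CTNNumber],
           [SeqNo]
    FROM [Play].[getRaw]("") AS Z
    OPTION(ROWCOUNT = 3298);
```

The final join and GROUP BY stay as they are; the hints only change the size estimates the optimizer uses when planning the join.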


Source: https://habr.com/ru/post/1686345/

