Understanding start and end times in an Azure Data Factory pipeline

I set up a pipeline in Azure Data Factory to collect flat files from storage and load them into tables in an Azure SQL database.

The template for this pipeline specifies that I need a start and end time, which the tutorial says to set 1 day apart.

I'm trying to make sense of this. If it were a cron job on Linux or a scheduled task on Windows Server, I would just tell it when to start (e.g. daily at 6 a.m.), and it would take however long it takes to complete.

This leads me to several related questions:

  • Why do I need to specify an end time?
  • What if I don't know how long it will take to run?
  • If I set it too far in the future, do I risk that the data pipeline will not complete in a timely manner?
  • If I set it too soon, will the pipeline break?
  • Why is it hard-coded as a date instead of a frequency (i.e., the tutorial says to use this format: "2014-10-14T16:32:41Z")?

I found a previous question that sheds a little light on how to use a frequency rather than hard-coded dates, but my questions above are still unanswered.

2 answers

The 1-day range in the tutorial is just an example to illustrate the concept: with the frequency set to hourly over a 1-day range, you would expect 24 activity windows.
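To make the arithmetic concrete, here is a minimal Python sketch that enumerates the windows for a given active period and frequency. It is only an illustration of the counting, not ADF's actual scheduler; the function name `activity_windows` and the example dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def activity_windows(start, end, frequency):
    """Return (window_start, window_end) pairs that fit fully inside [start, end)."""
    windows = []
    cursor = start
    while cursor + frequency <= end:
        windows.append((cursor, cursor + frequency))
        cursor += frequency
    return windows

# Hypothetical active period: one day, starting at midnight UTC.
start = datetime(2014, 10, 14, tzinfo=timezone.utc)
end = start + timedelta(days=1)

windows = activity_windows(start, end, timedelta(hours=1))
print(len(windows))  # 24 hourly windows in a 1-day period
```

Widening the gap between start and end simply produces more windows; it does not change how long any single window takes to process.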

Why do I need to specify an end time?

You do not need to specify an end time if you want the pipeline to run indefinitely. However, you may have business reasons for setting an end time, for example, to coincide with an annual business cycle. The overall start and end time of a pipeline applies to the collection of activities in it. Activities run according to the frequency you set (hourly, daily, etc.) for the activity and the availability of the datasets. You can also set a start time for activities, or offset or delay them (for example, if you want to process yesterday's data today), or set a start date in the past to backfill data.
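The backfill case can be sketched the same way. The snippet below assumes, purely for illustration, that a window becomes eligible for processing once its end time has passed; the helper `due_windows` and all dates are hypothetical, and this is not ADF's API.

```python
from datetime import datetime, timedelta, timezone

def due_windows(start, end, frequency, now):
    """List windows inside [start, end) whose end time is at or before `now`.

    Assumption (not taken from ADF itself): a window is eligible to run
    once the interval it covers has fully elapsed.
    """
    due = []
    cursor = start
    while cursor + frequency <= end and cursor + frequency <= now:
        due.append((cursor, cursor + frequency))
        cursor += frequency
    return due

now = datetime(2014, 10, 14, 6, 0, tzinfo=timezone.utc)  # hypothetical current time
start = now - timedelta(days=3)   # start date set in the past
end = now + timedelta(days=30)    # active period continues into the future

backlog = due_windows(start, end, timedelta(days=1), now)
print(len(backlog))  # 3 daily windows are already due and would be backfilled
```

Under that assumption, a start date three days back with a daily frequency leaves three windows to catch up on, while the far-off end date has no effect on what is due today.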

Why is it hard-coded as a date instead of a frequency?

The pipeline start and end are dates rather than a frequency because they define the overall date range during which your pipeline is active. Individual processing runs are governed by the frequency and scheduled start time of each activity.

What if I don't know how long it will take to run?

Once activities start, they run to completion. If they run past the end date, the pipeline simply will not start new activities.

If I set it too far in the future, do I risk that the data pipeline will not complete in a timely manner?

No, timely completion depends only on the size of your cluster, the amount of data, and the concurrency setting.

If I set it too soon, will the pipeline break?

See above

We expose this scheduling complexity so that you have much more flexibility in orchestrating multiple services, letting ADF manage cloud resources rather than just running a cron job. Our documentation has more detailed scheduling information: https://azure.microsoft.com/en-us/documentation/articles/data-factory-scheduling-and-execution/


Why do I need to specify an end time?

In ADF1, if you specify a start time, you need to specify an end time. If you do not specify a start and end time, that is fine: you can deploy the pipeline, but the activities in the pipeline will not start.


Source: https://habr.com/ru/post/1244597/

