Airflow - splitting a DAG definition across multiple files

I'm just getting started with Airflow and am wondering about the best way to structure large DAGs. For our ETL, we have many tasks that fall into logical groups, yet the groups depend on each other. Which of the following would be considered best practice?

  • One large DAG file containing all the tasks
  • Splitting the DAG definition across multiple files (and if so, how?)
  • Defining multiple DAGs, one per task group, and setting up dependencies between them with ExternalTaskSensor

Also open to other suggestions.

1 answer

DAGs are just Python files, so you can split a single DAG definition across several files. The additional files should only contain functions that take a DAG object and create tasks on it.

Note that you should have only one DAG object in global scope. Airflow picks up every DAG object it finds in global scope as a separate DAG.
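
A minimal sketch of that layout, assuming hypothetical file and function names (tasks/extract.py, tasks/load.py, add_extract_tasks, add_load_tasks):

    # tasks/extract.py -- helper file: no DAG object is created here
    from airflow.operators.bash_operator import BashOperator

    def add_extract_tasks(dag):
        # Attach tasks to the dag object that was passed in
        return BashOperator(
            task_id='extract',
            bash_command='echo extracting',
            dag=dag,
        )

    # main_dag.py -- the only file that puts a DAG object in global scope
    from datetime import datetime
    from airflow import DAG
    from tasks.extract import add_extract_tasks
    from tasks.load import add_load_tasks

    dag = DAG('etl', start_date=datetime(2017, 1, 1),
              schedule_interval='@daily')

    extract = add_extract_tasks(dag)
    load = add_load_tasks(dag)
    extract >> load  # dependency between the two logical groups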

It is often considered good practice to keep each DAG as concise as possible. However, if you need to set up such dependencies, you can consider using subdags. More on this here: https://airflow.incubator.apache.org/concepts.html?highlight=subdag#scope
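
For example, a rough subdag sketch (the factory function and task ids here are assumptions, not from the original answer):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.subdag_operator import SubDagOperator

    def transform_subdag(parent_dag_id, task_id, start_date, schedule_interval):
        # A subdag's dag_id must be '<parent_dag_id>.<task_id>'
        subdag = DAG(
            dag_id='%s.%s' % (parent_dag_id, task_id),
            start_date=start_date,
            schedule_interval=schedule_interval,
        )
        DummyOperator(task_id='transform_step', dag=subdag)
        return subdag

    main_dag = DAG('etl_main', start_date=datetime(2017, 1, 1),
                   schedule_interval='@daily')

    transform = SubDagOperator(
        task_id='transform',
        subdag=transform_subdag('etl_main', 'transform',
                                main_dag.start_date,
                                main_dag.schedule_interval),
        dag=main_dag,
    )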

You can also use ExternalTaskSensor, but be careful: as the number of DAGs grows, it can become hard to manage external dependencies between tasks. I think subdags may be the way to go for your use case.
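
If you do go with ExternalTaskSensor, here is a hedged sketch of a cross-DAG dependency (the dag and task ids are assumptions; by default the sensor matches on the same execution date, so both DAGs should share a schedule):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.sensors import ExternalTaskSensor

    downstream_dag = DAG('downstream_etl', start_date=datetime(2017, 1, 1),
                         schedule_interval='@daily')

    # Wait until task 'load' in 'upstream_etl' succeeds
    # for the same execution date
    wait_for_load = ExternalTaskSensor(
        task_id='wait_for_load',
        external_dag_id='upstream_etl',
        external_task_id='load',
        dag=downstream_dag,
    )

    report = DummyOperator(task_id='report', dag=downstream_dag)
    wait_for_load >> report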


Source: https://habr.com/ru/post/1013806/

