Airflow: dag_id not found

I am running an Airflow server and a worker on different AWS machines. I synchronized the dags folder between them, ran airflow initdb on both, and checked that the dag_ids are the same when running airflow list_tasks <dag_id>
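
For reference, the check I ran on both machines looked roughly like this (using the airflow_tutorial DAG that appears in the path below; Airflow 1.x CLI):

    # run on both the scheduler and the worker machine
    airflow initdb                        # initialise the metadata database
    airflow list_dags                     # the DAG shows up on both machines
    airflow list_tasks airflow_tutorial   # same task list on both machines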

When I run the scheduler and the worker, I get this error from the worker:

airflow.exceptions.AirflowException: dag_id could not be found: . Either the dag did not exist or it failed to parse. [...] Command ... --local -sd /home/ubuntu/airflow/dags/airflow_tutorial.py'

It seems that the problem is that the path is wrong (/home/ubuntu/airflow/dags/airflow_tutorial.py), since the correct path is /home/hadoop/...

On the server machine the path does go through ubuntu, but in both configuration files it is given only as ~/airflow/...

What makes the worker look for this path and not the right one?

How can I tell it to look in its own home directory?

Edit:

  • This is unlikely to be a configuration issue: I ran grep -R ubuntu and only log entries showed up.
  • When I run the same setup on a machine where the user is ubuntu, everything works. This leads me to believe that, for some reason, Airflow sends the worker the full path to the task file.
+5
3 answers

Adding the --raw flag to the airflow run command helped me find out what the underlying exception was. In my case, the metadata database instance was too slow and loading the DAGs failed due to a timeout. I fixed it by:

  • Upgrading the database instance
  • Increasing the dagbag_import_timeout parameter in airflow.cfg (sketched after this list)
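
For illustration, a rough sketch of the two steps (the dag_id, task_id, execution date, and timeout value are placeholders, not values from the question):

    # re-run the failing task by hand; --raw surfaces the underlying exception
    airflow run --raw <dag_id> <task_id> <execution_date>

    # airflow.cfg, [core] section: raise the DAG parsing timeout (seconds)
    dagbag_import_timeout = 120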

Hope this helps!

+7

Have you tried setting the dags_folder parameter in the configuration file to point explicitly to /home/hadoop/, i.e. the desired path?

This parameter controls the location to search for dags.
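
For example, something along these lines in airflow.cfg on the worker machine (the exact dags path is an assumption based on the question):

    [core]
    # where this Airflow instance looks for DAG definition files
    dags_folder = /home/hadoop/airflow/dags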

+1

I am experiencing the same thing: the scheduler seems to pass an --sd argument pointing to the dags folder on the scheduler machine, not on the worker machine (even if dags_folder is set correctly in the Airflow configuration file on the worker). In my case, I managed to get everything working by creating a symbolic link on the scheduler machine so that dags_folder could be set to the same value on both. (In your example, this would mean creating a symlink /home/hadoop → /home/ubuntu on the scheduler machine, and then setting dags_folder to a path under /home/hadoop.) So this is not really an answer to the underlying problem, but in some cases it is a viable workaround, sketched below.
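
Roughly, the idea looks like this (paths taken from the question; adjust to your setup):

    # on the scheduler machine: make /home/hadoop resolve to the real home dir
    sudo ln -s /home/ubuntu /home/hadoop

    # then in airflow.cfg on both machines, under [core]:
    # dags_folder = /home/hadoop/airflow/dags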

+1

Source: https://habr.com/ru/post/1266371/

