I use Luigi for my workflow. My workflow is divided into three general parts: import, analysis, export. Inside each part there are several Luigi tasks.
I could put everything in one file, but I would rather keep the parts separate, in data_import.py, analysis.py and export.py.
For example, data_import.py looks like this:
import luigi

class import_task_A(luigi.Task):
    def requires(self):
        return []

    def output(self):
        return luigi.LocalTarget('myfile.txt')

    def run(self):
        # my import stuff
        pass

if __name__ == '__main__':
    luigi.run()
But what if a task in export.py depends on a task in data_import.py? I would do:
import luigi

from data_import import import_task_A

class export_task_A(luigi.Task):
    def requires(self):
        return import_task_A()

    def output(self):
        return luigi.LocalTarget('myfile.txt')

    def run(self):
        # my export stuff
        pass

if __name__ == '__main__':
    luigi.run()
If I have a large project split across several .py files, what is the best way to tell Luigi which tasks are required from which file? Importing each one by hand seems cumbersome.