Organizing files when using a Luigi pipeline?

I use Luigi for my workflow. My workflow is divided into three general parts: import, analysis, export. Inside each part there are several Luigi tasks.

I could put everything in one file, but I want to keep the parts separate, for example in data_import.py, analysis.py and export.py.

For example, data_import.py looks like this:

import luigi

class import_task_A(luigi.Task):
    def requires(self):
        return []
    def output(self):
        return luigi.LocalTarget('myfile.txt')
    def run(self):
        pass  # my import stuff goes here

if __name__ == '__main__':
    luigi.run()

But what if a task in export.py depends on a task in data_import.py? I would do:

from data_import import import_task_A
import luigi

class export_task_A(luigi.Task):
    def requires(self):
        return import_task_A()
    def output(self):
        return luigi.LocalTarget('myfile.txt')
    def run(self):
        pass  # my export stuff goes here

if __name__ == '__main__':
    luigi.run()

If I have a large project split across several .py files, what is the best way to tell Luigi which tasks are required from which file? Importing each task by hand seems cumbersome.


In export_task_A, def requires can return a list of tasks:

def requires(self):
    return [import_task_A(), import_task_B()]

if __name__ == '__main__':
    luigi.run()

In data_export.py, which imports the tasks from data_import.py, the entry point can build the export task directly:

if __name__ == '__main__':
    luigi.build([export_task_A()])

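This is mostly a question of how you organize Python modules rather than anything specific to Luigi. For example, one module can collect the tasks in one place: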

# my_tasks.py
from data_import import import_task_A
from export import export_task_A

The tasks can then be imported from my_tasks, or looked up by name with getattr or importlib.
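
A lookup by name with importlib and getattr might look like the following sketch. The helper load_task is hypothetical, and a stand-in my_tasks module with placeholder classes is registered in sys.modules so that the block runs without Luigi or the real files; in an actual project my_tasks.py would re-export the luigi.Task subclasses as shown above.

```python
import importlib
import sys
import types

# Stand-in for my_tasks.py so the sketch runs on its own.
# In a real project this module would hold the imported luigi.Task classes.
_demo = types.ModuleType("my_tasks")


class import_task_A:  # placeholder for a luigi.Task subclass
    pass


class export_task_A:  # placeholder for a luigi.Task subclass
    pass


_demo.import_task_A = import_task_A
_demo.export_task_A = export_task_A
sys.modules["my_tasks"] = _demo


def load_task(module_name, task_name):
    """Return the class called task_name from the module called module_name."""
    module = importlib.import_module(module_name)
    return getattr(module, task_name)


task_cls = load_task("my_tasks", "export_task_A")
print(task_cls.__name__)
```

With this approach the task name can come from a config file or a command-line argument instead of a hard-coded import.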


Source: https://habr.com/ru/post/1658366/

