How to start Google Cloud Dataflow from App Engine?

After reading the Cloud Dataflow docs, I'm still not sure how I can launch my application to stream data from App Engine. Is it possible? Does it matter whether my backend is written in Python or Java? Thanks!

3 answers

Yes, it is possible. You need to use streaming execution, as described here.

You can use Google Cloud Pub/Sub as the streaming source that triggers your pipeline.

From App Engine, you can publish ("Pub") messages to a Pub/Sub topic via its REST API.
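For example, a minimal sketch of the publishing side using the Google API client library for Java (the project ID, topic name, and application name are placeholders):

```java
import java.util.Collections;

import com.google.api.client.extensions.appengine.http.UrlFetchTransport;
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.pubsub.Pubsub;
import com.google.api.services.pubsub.PubsubScopes;
import com.google.api.services.pubsub.model.PublishRequest;
import com.google.api.services.pubsub.model.PubsubMessage;

public class PubsubPublisher {
  public static void publish(String payload) throws Exception {
    // Application Default Credentials work on App Engine without key files.
    GoogleCredential credential = GoogleCredential.getApplicationDefault()
        .createScoped(PubsubScopes.all());

    Pubsub pubsub = new Pubsub.Builder(
            UrlFetchTransport.getDefaultInstance(),  // GAE-friendly transport
            JacksonFactory.getDefaultInstance(),
            credential)
        .setApplicationName("gae-pubsub-publisher")
        .build();

    // The message body must be base64-encoded; encodeData() handles that.
    PubsubMessage message = new PubsubMessage().encodeData(payload.getBytes("UTF-8"));
    PublishRequest request =
        new PublishRequest().setMessages(Collections.singletonList(message));

    // Fully qualified topic name: projects/<project-id>/topics/<topic>.
    pubsub.projects().topics()
        .publish("projects/my-project/topics/my-topic", request)
        .execute();
  }
}
```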


You might be able to submit your Dataflow job from App Engine, but this is not something that is actively supported, as suggested by the lack of documentation. The App Engine runtime makes some of the necessary operations difficult, for example obtaining the credentials needed to submit Dataflow jobs.


One way would be to use Pub/Sub from App Engine to let Cloud Dataflow know when new data is available. The Cloud Dataflow job would then run continuously, with App Engine providing the data for processing.
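As an illustration, here is a minimal sketch of the Dataflow side, assuming the Dataflow SDK for Java 1.x: a streaming pipeline that consumes whatever App Engine publishes to the topic (the topic name is a placeholder):

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

public class ConsumeFromPubsub {
  public static void main(String[] args) {
    // Pass --project, --stagingLocation and --runner on the command line.
    DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation().as(DataflowPipelineOptions.class);
    options.setStreaming(true);  // streaming execution, as described above

    Pipeline p = Pipeline.create(options);
    // Each Pub/Sub message arrives as an element of a PCollection<String>.
    p.apply(PubsubIO.Read.topic("projects/my-project/topics/incoming"));
    // ... apply transforms to the resulting PCollection here ...
    p.run();
  }
}
```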

Another approach would be to put the code that constructs the Dataflow pipeline into a class in App Engine (including the Dataflow SDK in your GAE project) and set the job parameters programmatically, as described here:

https://cloud.google.com/dataflow/pipelines/specifying-exec-params

Be sure to set the "runner" option to DataflowPipelineRunner so the job runs asynchronously on Google Cloud Platform. Since the pipeline runner (which actually executes your pipeline) does not have to be the same machine as the code that initiates it, this code (up to pipeline.run()) can live in App Engine.
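A sketch of what that programmatic configuration might look like with the Dataflow SDK for Java 1.x (the project ID, staging bucket, and LaunchPipeline class name are placeholders):

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

public class LaunchPipeline {
  public static void launch() {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
    // DataflowPipelineRunner submits the job to the Dataflow service and
    // returns without waiting for it to finish.
    options.setRunner(DataflowPipelineRunner.class);
    options.setProject("my-project");                      // placeholder
    options.setStagingLocation("gs://my-bucket/staging");  // placeholder

    Pipeline pipeline = Pipeline.create(options);
    // ... construct the pipeline's transforms here ...

    // run() only submits the job, so calling it from a request handler is fine.
    pipeline.run();
  }
}
```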

You can then add an endpoint or servlet to GAE that, when invoked, runs the code that sets up and submits the pipeline.
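For instance, a minimal servlet sketch, where LaunchPipeline.launch() is the hypothetical helper from the sketch above:

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class LaunchPipelineServlet extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // Submits the Dataflow job asynchronously and returns immediately.
    LaunchPipeline.launch();
    resp.setContentType("text/plain");
    resp.getWriter().println("Pipeline submitted.");
  }
}
```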

To go a step further with scheduling, you could have a cron job in GAE that invokes the endpoint that launches the pipeline.
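A sketch of a cron.xml for a Java GAE app, assuming the servlet above is mapped to the hypothetical path /launch-pipeline in web.xml:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Adjust the schedule to match how often the pipeline should be submitted. -->
<cronentries>
  <cron>
    <url>/launch-pipeline</url>
    <description>Submit the Dataflow pipeline once a day</description>
    <schedule>every 24 hours</schedule>
  </cron>
</cronentries>
```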

