One approach is to use Pub/Sub from App Engine so that Cloud Dataflow knows when new data is available: the Cloud Dataflow job runs continuously as a streaming pipeline, and App Engine publishes data for it to process.
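For illustration, a minimal sketch of the Dataflow side of this approach, assuming the Java Dataflow SDK 1.x; the project id and topic names (my-gcp-project, incoming-data, processed-data) are placeholders, not anything from the original answer:

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.PubsubIO;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

    public class StreamingJob {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setRunner(DataflowPipelineRunner.class);
        options.setStreaming(true); // keep the job running continuously

        Pipeline p = Pipeline.create(options);
        // Read messages published by App Engine, then write results back out.
        p.apply(PubsubIO.Read.topic("projects/my-gcp-project/topics/incoming-data"))
         // ... your transforms would go here ...
         .apply(PubsubIO.Write.topic("projects/my-gcp-project/topics/processed-data"));
        p.run();
      }
    }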
Another approach is to add the code that constructs the Dataflow pipeline to a class in App Engine (include the Dataflow SDK in your GAE project) and set the job parameters programmatically, as described here:
https://cloud.google.com/dataflow/pipelines/specifying-exec-params
Be sure to set the "runner" option to DataflowPipelineRunner so the job runs asynchronously on Google Cloud Platform. Since the pipeline runner (which actually launches your pipeline) does not have to be the same as the code that constructs it, this code (up to and including pipe.run()) can live in App Engine.
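A minimal sketch of such a launcher class, again assuming the Java Dataflow SDK 1.x; the PipelineLauncher name, project id, and GCS paths are hypothetical:

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.TextIO;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

    public class PipelineLauncher {
      public static void launch() {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
        // DataflowPipelineRunner submits the job and returns without blocking,
        // which is what you want from inside an App Engine request handler.
        options.setRunner(DataflowPipelineRunner.class);
        options.setProject("my-gcp-project");                 // placeholder project id
        options.setStagingLocation("gs://my-bucket/staging"); // placeholder GCS bucket

        Pipeline p = Pipeline.create(options);
        p.apply(TextIO.Read.from("gs://my-bucket/input/*"))
         .apply(TextIO.Write.to("gs://my-bucket/output/result"));

        p.run(); // returns immediately after the job is submitted to the service
      }
    }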
You can then add an endpoint or servlet to GAE that, when invoked, runs the code that constructs and submits the pipeline.
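For example, a plain servlet that submits the job when hit; PipelineLauncher.launch() refers to the hypothetical class sketched above:

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class LaunchPipelineServlet extends HttpServlet {
      @Override
      public void doGet(HttpServletRequest req, HttpServletResponse resp)
          throws IOException {
        PipelineLauncher.launch(); // submits the Dataflow job asynchronously
        resp.setContentType("text/plain");
        resp.getWriter().println("Dataflow job submitted.");
      }
    }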
Going a step further, you could have a cron job in GAE that invokes that endpoint, launching the pipeline on a schedule.
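As a sketch, the WEB-INF/cron.xml entry for a Java GAE app might look like this; the /launch-pipeline path and the schedule are placeholders you would adapt:

    <?xml version="1.0" encoding="UTF-8"?>
    <cronentries>
      <cron>
        <url>/launch-pipeline</url>
        <description>Launch the Dataflow pipeline daily</description>
        <schedule>every 24 hours</schedule>
      </cron>
    </cronentries>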