I looked at this, and it seems to me that the answers are a little lacking. Here is what I can tell you about the pros and cons of each approach:
Writing a custom program (via the Node BQ API, or your own custom worker) has a few pitfalls when it comes to exactly-once guarantees. Specifically, if you write your own worker, you will need to do extra work to checkpoint its progress and make sure that no elements are dropped or duplicated if there are runtime errors or your worker process dies.
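To make that concrete, here is a minimal sketch (not the template's code, and not production-ready) of a hand-rolled worker using the Java Pub/Sub and BigQuery client libraries; the project, subscription, dataset and table names are placeholders, and insert errors are ignored. The comments mark where a crash drops or duplicates rows:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import java.util.Map;

public class NaiveWorker {
  public static void main(String[] args) {
    BigQuery bq = BigQueryOptions.getDefaultInstance().getService();
    TableId table = TableId.of("my_dataset", "my_table");            // placeholder
    ProjectSubscriptionName sub =
        ProjectSubscriptionName.of("my-project", "my-subscription"); // placeholder

    MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
      Map<String, Object> row = Map.of("payload", message.getData().toStringUtf8());

      // Streaming insert of one row.
      bq.insertAll(InsertAllRequest.newBuilder(table).addRow(row).build());

      // Crash here (after the insert, before the ack) and the message is
      // redelivered, so the row is written twice. Ack before the insert
      // instead, and a crash loses the row. Exactly-once needs your own
      // checkpointing / deduplication logic on top of this.
      consumer.ack();
    };

    Subscriber subscriber = Subscriber.newBuilder(sub, receiver).build();
    subscriber.startAsync().awaitRunning();
  }
}
```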
If your requirements change (for example, BQ streaming inserts become too expensive), the Dataflow Java SDK seamlessly supports either option: streaming inserts, or running cheaper periodic load jobs into BQ instead of streaming inserts; and it also handles multiple data sources nicely.
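For illustration, with the current Beam Java SDK the switch is roughly one method call on the BigQuery sink. This sketch assumes you already have a `PCollection<TableRow>` named `rows`, and the table name is a placeholder:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

class BqSink {
  // 'rows' comes from whatever upstream transforms you already have.
  static void writeToBigQuery(PCollection<TableRow> rows) {
    rows.apply("WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")                 // placeholder table
            // Cheaper periodic batch load jobs instead of streaming inserts:
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            .withTriggeringFrequency(Duration.standardMinutes(5)) // required for FILE_LOADS on a streaming pipeline
            .withNumFileShards(1)
            // Swap in BigQueryIO.Write.Method.STREAMING_INSERTS (and drop the
            // two lines above) for low-latency streaming inserts instead.
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));
  }
}
```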
Dataflow provides autoscaling if your data volume grows.
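As a sketch of how that is typically enabled when launching a Beam pipeline on the Dataflow runner (the worker cap here is a made-up value):

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

class Autoscaling {
  static DataflowPipelineOptions autoscalingOptions(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    options.setStreaming(true);
    // Let Dataflow scale the worker pool with throughput, capped at 10 workers.
    options.setAutoscalingAlgorithm(AutoscalingAlgorithmType.THROUGHPUT_BASED);
    options.setMaxNumWorkers(10);
    return options;
  }
}
```

The same settings can also be passed on the command line as `--autoscalingAlgorithm=THROUGHPUT_BASED --maxNumWorkers=10`.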
With that in mind, I would say:
If your use case is relatively simple and you are okay with very rare data points being dropped when workers restart, then a custom-written Node / Python application should do the trick for you.
If your use case involves only streaming PubSub to BQ, but you need to make sure no data is dropped, check out the template provided by Andrew, which does exactly that.
If your use case is likely to be more complex than that, you can write your own pipeline (and use the template code as inspiration!); a minimal sketch follows this list.
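For reference, an end-to-end custom pipeline in the Beam Java SDK might look roughly like this. The subscription and table names are placeholders, and (unlike the real template) each payload is written as a single string column rather than parsed JSON:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class PubSubToBigQuery {
  public static void main(String[] args) {
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(StreamingOptions.class);
    options.setStreaming(true);

    Pipeline p = Pipeline.create(options);

    p.apply("ReadPubSub",
            PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-subscription")) // placeholder
     .apply("ToTableRow",
            MapElements.into(TypeDescriptor.of(TableRow.class))
                .via((String payload) -> new TableRow().set("payload", payload)))
     .setCoder(TableRowJsonCoder.of())
     .apply("WriteBigQuery",
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table")                                   // placeholder
                .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));

    p.run();
  }
}
```

To run it on Dataflow rather than locally, pass `--runner=DataflowRunner` along with your project and region options when launching.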
Pablo