As a rule, in a situation like this, I would build a package that transfers the data in batches of size N (1K rows, 10M rows, etc.) and log the last successfully transferred batch to a processing table, so a failed run can resume where it left off. With GUID keys, however, you cannot split the data into ordered buckets, so that approach does not apply here.
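The batch-plus-checkpoint idea depends on the key being orderable. A minimal Python sketch of the pattern, under that assumption (the row shape, the `id` key, and the in-memory `log` dict are all illustrative stand-ins for real tables):

```python
def load_in_batches(source_rows, batch_size, log, transfer):
    """Transfer rows ordered by a sequential key in fixed-size batches,
    checkpointing the last key so a failed run resumes instead of restarting."""
    last = log.get("last_key", 0)
    pending = sorted((r for r in source_rows if r["id"] > last),
                     key=lambda r: r["id"])
    for i in range(0, len(pending), batch_size):
        batch = pending[i : i + batch_size]
        transfer(batch)                    # push one batch to the destination
        log["last_key"] = batch[-1]["id"]  # checkpoint only after success

sent = []
log = {"last_key": 2}                      # rows 1-2 already transferred
load_in_batches([{"id": n} for n in range(1, 8)], 3, log, sent.extend)
# sends rows 3..7 in two batches; log["last_key"] ends up as 7
```

The whole scheme breaks down for GUIDs because `>`-based range filtering over random GUIDs selects an arbitrary, unstable subset.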
In this particular case, I would modify the data flow so that it looks like Source → Lookup → Destination. In the Lookup transformation, query the Azure side and retrieve only the keys (SELECT myGuid FROM myTable). We only care about the rows that have no match in the lookup set, since the rows that do match have already been transferred.
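Functionally, routing the Lookup's unmatched rows to the destination is an anti-join on the GUID key. A small Python sketch of that filter (the `myGuid` field name comes from the query above; the sample values are placeholders):

```python
def rows_to_send(source_rows, existing_keys):
    """Emulate the Lookup's 'no match' output: keep only source rows whose
    GUID is absent from the destination's key set."""
    return [r for r in source_rows if r["myGuid"] not in existing_keys]

existing = {"g1", "g2"}                       # keys pulled from the Azure side
src = [{"myGuid": g} for g in ("g1", "g2", "g3")]
rows_to_send(src, existing)                   # → [{"myGuid": "g3"}]
```

Only `g3` survives, mirroring how matched rows are dropped from the flow while unmatched rows continue on to the destination.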
A full cache will cost roughly 1.5 GB of memory (100M rows × 16 bytes per GUID), assuming the Azure side is fully populated, plus the associated data transfer cost. That is still cheaper than truncating and re-sending all the data, but I wanted to make sure I called it out.
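The back-of-the-envelope arithmetic, for the record (the 100M row count is the assumed destination size; note the real cache footprint will be somewhat larger than the raw key bytes once per-row overhead is added):

```python
rows = 100_000_000          # assumed number of rows already on the Azure side
bytes_per_guid = 16         # a GUID (uniqueidentifier) is 16 bytes
total_bytes = rows * bytes_per_guid
print(total_bytes / 2**30)  # ≈ 1.49 GiB, i.e. roughly 1.5 GB of cache
```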