If you want to generate surrogate key values ββin BigQuery, then it is better to avoid the ROW_NUMBER OVER () option and its variants. To quote BigQuery's surrogate key message:
To implement ROW_NUMBER (), BigQuery must sort the values ββin the root node of the execution tree, which is limited by the amount of memory in one execution node.
This always leads to problems with a small number of records.
There are two alternatives:
Option 1 - GENERATE_UUID ()
Since the surrogate key has no business value and is simply a unique key created for use in the data warehouse, you can simply generate it by calling the GENERATE_UUID() function in BigQuery. This gives you a universally unique UUID that you can use as a surrogate key value.
The downside is that this key will be 32 bytes instead of the 8-byte value of INT64. Therefore, if you have a huge number of records, this can increase the storage capacity of your data.
Option 2: create a unique hash
The second option is to use a hash function to generate a unique hash. This is a little trickier as you will need to find a combination of columns and / or a random other input to make sure you can never generate the same value twice.
Some hash functions also output a 32-byte value, so you will not save on storage, but the hash function FARM_FINGERPRINT () will output an INT64 value, which may save some storage. Thus, you can use options 1 and 2 to generate a unique integer surrogate key by doing the following: FARM_FINGERPRINT(GENERATE_UUID())
source share