Is the COPY command in Amazon Redshift atomic or not?

In Amazon Redshift, data is usually loaded from S3 with the COPY command. I want to know whether this command is atomic. For instance, is it possible that in some exceptional cases only part of a data file is loaded into the Redshift table?

2 answers

The COPY command with default parameters is atomic. If the file contains an invalid row that causes the load to fail, the COPY transaction is rolled back and none of the data is imported.
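As a sketch of how this atomicity can be used, the statements below replace a table's contents inside one explicit transaction. If the COPY fails, the transaction is aborted and rolled back, so the DELETE is undone too and the table keeps its old rows. The table name, S3 path and credentials are placeholders, and the DELIMITER setting is an assumption about the file format. DELETE is used instead of TRUNCATE because in Redshift TRUNCATE implicitly commits the transaction it runs in.

```sql
-- Sketch: reload table_name atomically (table name, S3 path and credentials are placeholders).
BEGIN;

-- DELETE, not TRUNCATE: TRUNCATE would implicitly commit this transaction.
DELETE FROM table_name;

-- With the default MAXERROR of 0, a single invalid row makes this COPY fail.
COPY table_name
FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx'
DELIMITER '\t';

-- If the COPY failed, the transaction is already aborted: issue ROLLBACK instead,
-- which also undoes the DELETE and leaves the previous rows in place.
COMMIT;
```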

If you would rather skip invalid rows than abort the whole load, use the MAXERROR option of COPY, which tolerates a given number of invalid rows. Here is an example that ignores up to 100 invalid rows:

COPY table_name from 's3://[bucket-name]/[file-path or prefix]' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' DELIMITER '\t' MAXERROR 100; 

If the number of invalid rows exceeds the MAXERROR value (100 here), the load fails and the transaction is rolled back.
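When MAXERROR lets a COPY succeed with some rows skipped, the skipped rows can be inspected afterwards in the STL_LOAD_ERRORS system table. A minimal sketch, assuming it is run in the same session right after the COPY, and using the PG_LAST_COPY_ID() and PG_LAST_COPY_COUNT() functions to refer to that most recent COPY:

```sql
-- Number of rows the most recent COPY in this session actually loaded.
SELECT pg_last_copy_count();

-- Rows that the most recent COPY rejected, with the reason for each.
SELECT filename, line_number, colname, err_reason
FROM stl_load_errors
WHERE query = pg_last_copy_id()
ORDER BY filename, line_number;
```

Comparing the loaded count with the number of lines in the source files shows how much of the data was skipped.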

For more information on the COPY command, see the documentation: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

You can use the NOLOAD option to check the files for errors before loading them. This is a faster way to validate the data format, because COPY only parses the data and does not load any of it.
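A sketch of such a dry run, reusing the placeholder table name, S3 path and credentials from the first answer; the DELIMITER setting is an assumption about the file format, and only the NOLOAD option at the end differs from a normal load:

```sql
-- Validate the files against table_name without inserting any rows.
COPY table_name
FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx'
DELIMITER '\t'
NOLOAD;
```

If this succeeds, the same statement without NOLOAD performs the real load.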

You can set how many errors you are willing to tolerate with the MAXERROR option.

If there are more errors than the MAXERROR value, the load fails and no rows are added.

More details here: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

Source: https://habr.com/ru/post/1495883/

