Update: the UDF solution looks perfect. There was no answer here when I wrote this; below are just some approaches that worked for me.
Although json_extract_path_text cannot ignore errors, Redshift's COPY command has a MAXERROR parameter. So you can use something like this:
```sql
COPY raw_json
FROM 's3://data-source'
CREDENTIALS 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
JSON 's3://json_path.json'
MAXERROR 1000;
```
There is one pitfall in the json_path.json file: you cannot use $ by itself to refer to the root element:
```json
{
  "jsonpaths": [
    "$['_id']",
    "$['type']",
    "$"    <--------------- this will fail
  ]
}
```
So it is convenient to have a "top-level" element that contains all the other fields, so that $['data'] refers to everything in your record:
```json
{ "data": { "id": 1, ... } }
{ "data": { "id": 2, ... } }
```
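With the wrapped format, the json_path.json file can select the whole record through the top-level element. A minimal sketch (the "data" key matches the example above; adjust it to your own wrapper name):

```json
{ "jsonpaths": [ "$['data']" ] }
```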
If you cannot change the original format, Redshift UNLOAD will help:
```sql
UNLOAD ('select_statement') TO 's3://object_path_prefix'
```
In the select_statement it is easy to concatenate a wrapper around each old line: '{ "data" : ' || old_line || ' }'.
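A sketch of what that UNLOAD could look like, assuming the raw records sit in a single text column named line of a table raw_json (both names are assumptions; note that single quotes inside the UNLOAD statement string are escaped by doubling them):

```sql
-- Rewrap each raw record in a "data" envelope on the way out to S3.
UNLOAD ('SELECT ''{ "data" : '' || line || '' }'' FROM raw_json')
TO 's3://object_path_prefix'
CREDENTIALS 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';
```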
Then COPY the rewrapped files back into Redshift.