Redshift time series loading questions

The Redshift documentation recommends time series tables as a best practice: http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-time-series-tables.html

However, it does not address any of the following issues:

  • How many tables in a single UNION ALL view is reasonable: hundreds? (no answer)
  • Is there any way to write to the UNION ALL view and have Redshift direct the inserts to the correct base tables? (Answer: no)
  • What is the most efficient way to load the base tables? Perhaps use Firehose to insert into a staging table, then periodically insert those rows into the corresponding base table of the UNION ALL view? (no answer)
  • Is there any way to get Redshift to eliminate base partitions (tables) when querying the UNION ALL view if their date range does not match the query criteria? (Answer: no)
  • Can Redshift drop old tables, add new tables, and rebuild the view in a single transaction? (no answer)
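
To make the last question concrete, here is a minimal sketch in Python that only builds the SQL text for one rotation step: create the new base table, rebuild the UNION ALL view, and drop expired tables. The names `readings`, `readings_all`, and `readings_template`, the monthly granularity, and the helper functions are all hypothetical. Whether Redshift applies this batch atomically is exactly the open question above, so treat it as an illustration of the statements involved, not a confirmed recipe.

```python
from datetime import date


def month_table(prefix: str, month: date) -> str:
    """Name of a hypothetical per-month base table, e.g. readings_2024_01."""
    return f"{prefix}_{month.year:04d}_{month.month:02d}"


def rotation_statements(prefix, view, months, new_month, keep=12):
    """Build the SQL for one rotation step (nothing is executed here).

    months    -- first-of-month dates of the existing base tables
    new_month -- first-of-month date of the table to add
    keep      -- number of most recent tables that stay in the view
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    all_months = sorted(set(months) | {new_month})
    expired, kept = all_months[:-keep], all_months[-keep:]

    stmts = ["BEGIN;"]
    # Assumption: a template table holds the column, DISTKEY and SORTKEY layout.
    stmts.append(
        f"CREATE TABLE {month_table(prefix, new_month)} (LIKE {prefix}_template);"
    )
    # Rebuild the view first so it no longer references the tables dropped below.
    selects = "\nUNION ALL\n".join(
        f"SELECT * FROM {month_table(prefix, m)}" for m in kept
    )
    stmts.append(f"CREATE OR REPLACE VIEW {view} AS\n{selects};")
    stmts.extend(f"DROP TABLE {month_table(prefix, m)};" for m in expired)
    stmts.append("COMMIT;")
    return stmts


if __name__ == "__main__":
    existing = [date(2023, m, 1) for m in range(1, 13)]
    for stmt in rotation_statements("readings", "readings_all", existing, date(2024, 1, 1)):
        print(stmt)
```

The view is rebuilt before the old tables are dropped so that the drop does not hit a table the view still depends on.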

My situation:

  • 100 million rows are added daily, growing to 500 million in 3 years
  • Retaining 12 months of data is desirable
  • 99% of all queries are expected to touch only the last 1-7 days
  • Data is currently written to an existing table through Kinesis Firehose to S3, which then triggers a COPY into the Redshift table
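
For a rough sense of scale, the figures above can be turned into retained row counts and the number of base tables a UNION ALL view would need. This is a back-of-the-envelope sketch in Python under stated assumptions: 12 months is taken as 365 days, the daily rate is treated as constant, the 500 million figure is read as rows per day, and the function names are made up for illustration.

```python
def rows_retained(rows_per_day: int, retention_days: int = 365) -> int:
    """Total rows held if every day within the retention window is kept."""
    return rows_per_day * retention_days


def tables_in_view(retention_days: int, days_per_table: int) -> int:
    """Base tables needed to cover the retention window (ceiling division)."""
    return -(-retention_days // days_per_table)


if __name__ == "__main__":
    print(rows_retained(100_000_000))   # rows retained at today's rate
    print(rows_retained(500_000_000))   # rows retained at the projected rate
    for days in (1, 7, 30):
        print(days, tables_in_view(365, days))
```

At 100 million rows per day this gives 36.5 billion retained rows, and 182.5 billion at 500 million per day. Daily tables would put 365 tables in the view, weekly tables 53, and 30-day tables 13, which is what the "how many tables is reasonable" question turns on.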

My suggested solution:

  • , dist_key sensor_id (100,000+ unique values), sort_key (timestamp, sensor_id).
  • firehose .
  • , , " " select *, timestamp = table timestamp.
  • , , , , firehose.
  • , .
  • , , , .
  • .

: , , .


! , :

union-all ?

. , , , Redshift (, ).

redshift () union-all, ?

Redshift , . , , . , , .

WHERE, Redshift , . SORTKEY, .

, SORTKEY, Redshift , WHERE . , .


Source: https://habr.com/ru/post/1656833/

