Amazon Athena and S3 Compressed Files

I have an S3 bucket with several zipped CSV files (usage logs). I would like to query this data using Athena, but the results come back completely garbled.

Athena seems to be trying to parse zip files without unpacking them first. Is it possible to get Hive to recognize my files as compressed data?

1 answer

Athena supports compressed data, but only in these formats:

  • Snappy (.snappy)
  • Bzip2 (.bz2)
  • Gzip (.gz)

The format is detected from the file name suffix. If the suffix does not match, the reader does not decompress the content. I tested it with a test.csv.gz file and it worked right away. So try changing the compression from zip to gzip and it should work.
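
The re-compression step could look something like the sketch below, which uses only the Python standard library. It assumes the zip archives have already been downloaded to local disk; the function name `zip_to_gzip` and the paths are hypothetical, chosen only for illustration. Each file inside the archive is written out as its own gzip file, with `.gz` appended to the original name so that the suffix matches what the reader expects.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def zip_to_gzip(zip_path, out_dir):
    """Re-compress every file in a zip archive as a separate .gz file.

    Hypothetical helper: returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue  # skip directory entries, only files hold data
            # Keep the original file name and append .gz (usage.csv -> usage.csv.gz).
            target = out_dir / (Path(member.filename).name + ".gz")
            # Stream the decompressed bytes straight into the gzip writer.
            with archive.open(member) as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written


if __name__ == "__main__":
    # Assumed local paths, for illustration only.
    for path in zip_to_gzip("logs.zip", "gzipped"):
        print(path)
```

The resulting .gz files would then need to be uploaded to the S3 location the table points at, in place of the original zip files.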


Source: https://habr.com/ru/post/1013307/

