Amazon Athena S3 Glacier Analysis Services
We have petabytes of data in S3. We are PubNub (https://www.pubnub.com/), and we store usage data from our network in S3 for billing purposes. The data is stored as tab-delimited log files in an S3 bucket. When we query it, Athena gives us HIVE_CURSOR_ERROR.
Our S3 bucket is set to automatically transition objects to AWS Glacier after 6 months. As a result, the bucket contains S3 files that are hot and ready to read alongside the Glacier backup files. Because of this, we get access errors from Athena: the file referenced by the error is a Glacier backup.
I guess the answer is: do not store Glacier backups in the same bucket. That is not an easy option for us because of our data volume. I believe that Athena will not work in this setup, and that we will not be able to use Athena for our log analysis.
However, if there is a way that we can use Athena, we would be delighted. Is there a solution for HIVE_CURSOR_ERROR and a way to skip Glacier files? Our S3 bucket is a flat bucket without folders.
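
To make the setup concrete, here is a minimal sketch of how the hot and Glacier objects in a flat bucket like ours could be told apart. It assumes boto3 and relies on the StorageClass field that list_objects_v2 returns for each object. The function names and the bucket name are hypothetical, and treating DEEP_ARCHIVE as archived is my own assumption. This only lists the keys; it does not make Athena skip anything by itself.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Storage classes that cannot be read without a restore request.
# GLACIER is what our lifecycle rule produces; DEEP_ARCHIVE is added as an assumption.
ARCHIVED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def split_by_storage_class(objects: Iterable[Dict]) -> Tuple[List[str], List[str]]:
    """Split S3 listing entries into (readable_keys, archived_keys)."""
    readable: List[str] = []
    archived: List[str] = []
    for obj in objects:
        # Treat a missing StorageClass as STANDARD (an assumption for this sketch).
        storage_class = obj.get("StorageClass", "STANDARD")
        if storage_class in ARCHIVED_CLASSES:
            archived.append(obj["Key"])
        else:
            readable.append(obj["Key"])
    return readable, archived


def iter_bucket_objects(bucket: str) -> Iterator[Dict]:
    """Yield every listing entry in the bucket, one page at a time."""
    import boto3  # imported lazily so the helper above works without AWS access

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        yield from page.get("Contents", [])


if __name__ == "__main__":
    # "my-usage-logs" is a placeholder bucket name.
    hot, cold = split_by_storage_class(iter_bucket_objects("my-usage-logs"))
    print(f"{len(hot)} readable objects, {len(cold)} archived objects")
```

With a list of readable keys in hand, one possibility would be to copy or point the analysis at only those objects, but whether that is practical at petabyte scale is exactly what I am unsure about.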

The name of the S3 file is not displayed in the screenshot of the error. The file referenced in HIVE_CURSOR_ERROR is actually a Glacier object. You can see it in this screenshot of our S3 bucket.

Note: I tried to post at https://forums.aws.amazon.com/, but that was not bueno.
