I am experimenting with AWS Athena. I am trying to create a table from an S3 bucket that has the following file structure:
my-bucket/
my-bucket/group1/
my-bucket/group1/entry1/
my-bucket/group1/entry1/data.bin
my-bucket/group1/entry1/metadata
my-bucket/group1/entry2/
my-bucket/group1/entry2/data.bin
my-bucket/group1/entry2/metadata
...
my-bucket/group2/
...
Only the metadata files are JSON files. Each one looks like this:
{
"key1": "value1",
"key2": "value2",
"key3": n
}
So, I tried to create a table:
CREATE EXTERNAL TABLE example (
key1 string,
key2 string,
key3 int
)
ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://my-bucket/'
The table is created successfully, but when I try to run a query against it:
SELECT * FROM example LIMIT 10;
I get an error message:
Query 93aa62d6-8a52-4a5d-a2fb-08a6e00181d3 failed with error code HIVE_CURSOR_ERROR: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: java.io.ByteArrayInputStream@2da7f4ef; line: 1, column: 0]) at [Source: java.io.ByteArrayInputStream@2da7f4ef; line: 1, column: 3]
Does AWS Athena require every file in the bucket to be JSON in this case? I'm not sure whether the .bin files are causing the cursor error or something else is happening. Has anyone else come across this, or can someone explain to me what is going on?
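To narrow this down locally, here is a minimal Python sketch of what a reader that parses one JSON record per line would see in a pretty-printed metadata file like the one above. It rests on an assumption I have not confirmed from the error alone, namely that the SerDe treats each line as a separate record. The sample text and the helper names parse_per_line and to_single_line are hypothetical, and key3 is given the value 1 only for illustration.

```python
import json

# Hypothetical sample mirroring the metadata layout above
# (key3 is set to 1 purely for illustration).
pretty = '{\n"key1": "value1",\n"key2": "value2",\n"key3": 1\n}'


def parse_per_line(text):
    """Parse every non-empty line as its own JSON record.

    Returns (records, errors), where errors holds (line_number, message).
    Assumption: this mimics a line-oriented JSON reader.
    """
    records, errors = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            errors.append((number, str(exc)))
    return records, errors


def to_single_line(text):
    """Re-serialize a JSON document onto a single line."""
    return json.dumps(json.loads(text), separators=(",", ":"))


pretty_records, pretty_errors = parse_per_line(pretty)
compact = to_single_line(pretty)
compact_records, compact_errors = parse_per_line(compact)

print("pretty-printed:", len(pretty_records), "records,", len(pretty_errors), "errors")
print("single line:   ", len(compact_records), "records,", len(compact_errors), "errors")
```

If the assumption holds, no line of the pretty-printed form parses on its own, starting with the lone opening brace on line 1, while the single-line form yields one record. This says nothing about the .bin files, which would still sit under the same LOCATION.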