AWS Athena on S3 Bucket with Some JSON Files

I am experimenting with AWS Athena and trying to create a table from an S3 bucket that has the following file structure:

my-bucket/
my-bucket/group1/
my-bucket/group1/entry1/
my-bucket/group1/entry1/data.bin
my-bucket/group1/entry1/metadata
my-bucket/group1/entry2/
my-bucket/group1/entry2/data.bin
my-bucket/group1/entry2/metadata
...
my-bucket/group2/
...

Only the metadata files are JSON. Each one looks like this:

{
    "key1": "value1",
    "key2": "value2",
    "key3": n
}

So, I tried to create a table:

CREATE EXTERNAL TABLE example (
  key1 string,
  key2 string,
  key3 int
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://my-bucket/'

The table is created successfully, but when I try to run a query against it:

SELECT * FROM example LIMIT 10;

I get an error message:

Query 93aa62d6-8a52-4a5d-a2fb-08a6e00181d3 failed with error code HIVE_CURSOR_ERROR: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: java.io.ByteArrayInputStream@2da7f4ef; line: 1, column: 0]) at [Source: java.io.ByteArrayInputStream@2da7f4ef; line: 1, column: 3]

Does AWS Athena require all files in the bucket to be JSON in this case? I'm not sure whether the .bin files are causing the cursor error or something else is happening. Has anyone else come across this, or can anyone help me understand what is going on?

1 answer

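Yes, Athena (like Presto and Hive) reads every file under the path given in LOCATION, including subfolders, and expects all of them to be in the table's format. So the JSON metadata files need to live in a location of their own, separate from the .bin files.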
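
As a rough illustration of splitting the files out, here is a minimal Python sketch that only plans the work: it picks the metadata objects out of a key listing and maps each one to a key under a dedicated prefix. The function names, the `metadata-only/` prefix and the `<group>-<entry>.json` naming are all made up for this example, and the actual copy (with the AWS CLI or an SDK) is left out. The second helper collapses a pretty-printed document onto one line, because the Hive JSON SerDe expects one JSON record per line. The "line: 1, column: 3" in the error may well point at that rather than at the .bin files, though I can't confirm it from the question alone.

```python
import json
from typing import Dict, List


def plan_metadata_copies(keys: List[str], dest_prefix: str) -> Dict[str, str]:
    """Map each '<group>/<entry>/metadata' key to a flat key under dest_prefix.

    Hypothetical helper: the destination naming scheme is an assumption.
    """
    plan: Dict[str, str] = {}
    for key in keys:
        parts = key.split("/")
        # Keep only objects literally named 'metadata';
        # this skips data.bin and the 'folder/' marker keys.
        if len(parts) >= 3 and parts[-1] == "metadata":
            group, entry = parts[-3], parts[-2]
            plan[key] = f"{dest_prefix.rstrip('/')}/{group}-{entry}.json"
    return plan


def to_single_line(json_text: str) -> str:
    """Re-serialize a (possibly pretty-printed) JSON document onto one line."""
    return json.dumps(json.loads(json_text), separators=(",", ":"))


if __name__ == "__main__":
    listing = [
        "group1/entry1/",
        "group1/entry1/data.bin",
        "group1/entry1/metadata",
        "group1/entry2/data.bin",
        "group1/entry2/metadata",
    ]
    for src, dst in plan_metadata_copies(listing, "metadata-only/").items():
        print(src, "->", dst)

    pretty = '{\n    "key1": "value1",\n    "key2": "value2",\n    "key3": 3\n}'
    print(to_single_line(pretty))
```

The table's LOCATION would then point at that prefix (or at a separate bucket) instead of the bucket root, so that Athena only ever sees the JSON files.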


Source: https://habr.com/ru/post/1662697/

