I am trying to read some of the logs from a Hadoop process that I am running in AWS. The logs are stored in an S3 folder and have the following path.
bucketname = name, key = y/z/stderr.gz. Here y is the cluster ID and z is a folder name. Both act as folders (objects) in AWS, so the full path looks like x/y/z/stderr.gz.
Now I want to unzip this .gz file and read its contents. I do not want to download the file to my system; I want to save the contents in a Python variable.
This is what I have tried so far.
    bucket_name = "name"
    key = "y/z/stderr.gz"
    obj = s3.Object(bucket_name, key)
    n = obj.get()['Body'].read()
This gives me a format that is not readable. I also tried
n = obj.get()['Body'].read().decode('utf-8')
which gives the error: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte.
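For context, 0x8b is the second of the two magic bytes (0x1f 0x8b) that open every gzip stream, so this error suggests the body is still compressed rather than mis-encoded. A minimal sketch that reproduces it offline, using locally compressed sample data as a hypothetical stand-in for the bytes returned from S3:

```python
import gzip

# Hypothetical stand-in for obj.get()['Body'].read(); no S3 access needed.
raw = gzip.compress("sample log line\n".encode("utf-8"))

# A gzip stream always begins with the magic bytes 0x1f 0x8b.
magic = raw[:2]
print(magic.hex())

# Decoding the compressed bytes directly as UTF-8 fails on the 0x8b byte.
decode_failed = False
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_failed = True
    print(exc)
```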
I also tried
    gzip = StringIO(obj)
    gzipfile = gzip.GzipFile(fileobj=gzip)
    content = gzipfile.read()
This raises IOError: Not a gzipped file.
I am not sure how to decode this .gz file.
Edit: Found a solution. I needed to pass n (the bytes that were read) rather than obj, and use BytesIO instead of StringIO.
gzip = BytesIO(n)
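Putting the pieces together, a minimal sketch of the whole flow might look like the following. It assumes that s3 in the snippets above is a boto3 resource and that the log is UTF-8 text; the function names gunzip_to_text and read_gzipped_s3_object are hypothetical helpers introduced for illustration. The buffer is named something other than gzip so that it does not shadow the gzip module.

```python
import gzip
from io import BytesIO


def gunzip_to_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decompress gzip bytes held in memory and decode them to text."""
    # Wrap the raw bytes in a file-like object so GzipFile can read them.
    with gzip.GzipFile(fileobj=BytesIO(raw)) as gz:
        return gz.read().decode(encoding)


def read_gzipped_s3_object(bucket_name: str, key: str) -> str:
    """Fetch an S3 object and return its decompressed text without writing to disk."""
    # Imported here so gunzip_to_text can be used without boto3 installed.
    import boto3

    s3 = boto3.resource("s3")  # assumption: credentials are already configured
    raw = s3.Object(bucket_name, key).get()["Body"].read()
    return gunzip_to_text(raw)


# Offline check: locally compressed sample data stands in for stderr.gz.
sample = gzip.compress(b"line one\nline two\n")
print(gunzip_to_text(sample))
```

With real credentials, the call would be content = read_gzipped_s3_object("name", "y/z/stderr.gz"), which leaves the log text in a Python variable.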