Boto get md5 s3 file

Question

Boto get md5 s3 file

I have a use case when I upload hundreds of files to my S3 bucket using multi-component uplaod. After each download, I need to make sure that the downloaded file is not corrupted (basically check the data integrity). Currently, after downloading the file, I re-upload it and compute md5 in the content line and compare it with the md5 of the local file. So something like

conn = S3Connection('access key', 'secretkey') bucket = conn.get_bucket('bucket_name') source_path = 'file_to_upload' source_size = os.stat(source_path).st_size mp = bucket.initiate_multipart_upload(os.path.basename(source_path)) chunk_size = 52428800 chunk_count = int(math.ceil(source_size / chunk_size)) for i in range(chunk_count + 1): offset = chunk_size * i bytes = min(chunk_size, source_size - offset) with FileChunkIO(source_path, 'r', offset=offset, bytes=bytes) as fp: mp.upload_part_from_file(fp, part_num=i + 1, md5=k.compute_md5(fp, bytes)) mp.complete_upload() obj_key = bucket.get_key('file_name') print(obj_key.md5) #prints None print(obj_key.base64md5) #prints None content = bucket.get_key('file_name').get_contents_as_string() # compute the md5 on content

This approach is wasteful as it doubles the use of bandwidth. I tried

 bucket.get_key('file_name').md5 bucket.get_key('file_name').base64md5

but both return None.

Is there any other way to achieve md5 without downloading all this?

+5

amazon-s3 md5 boto

kk1957 Oct 17 '14 at 0:10

source share

2 answers

Noamg · Answer 1 · 2014-11-18T08:25:18+0000

Yes
use bucket.get_key('file_name').etag[1 :-1]
thus, get the MD5 key without downloading it.

glefait · Answer 2 · 2015-12-29T09:57:08+0000

With boto3, I use head_object to retrieve an ETag.

 import boto3 import botocore def s3_md5sum(bucket_name, resource_name): try: md5sum = boto3.client('s3').head_object( Bucket=bucket_name, Key=resource_name )['ETag'][1:-1] except botocore.exceptions.ClientError: md5sum = None pass return md5sum

Boto get md5 s3 file

More articles: