Files uploaded to S3 with S3BotoStorage end up with incorrectly escaped Content-Type metadata

Question


FACEPALM UPDATE: It turns out I had forgotten that I was using an older fork of S3BotoStorage from https://github.com/gtaylor/django-athumb as my default storage backend (even though I also had django-storages installed). The current version of django-storages does not suffer from this problem. The cause was that the Content-Type headers were unicode strings by the time they hit boto, and boto escaped unicode header values with urllib.quote_plus before sending them to AWS. This is not really a boto bug, since headers have to be converted to non-unicode strings for HTTP anyway. For a deeper analysis, see https://github.com/boto/boto/issues/1669.
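To see why the slash gets mangled, here is a minimal Python 3 sketch of the escaping step described above. It assumes Python 3's `urllib.parse.quote_plus` behaves like the Python 2 `urllib.quote_plus` that boto 2.9.5 used. `prepare_headers` is a hypothetical stand-in for boto's header handling, not boto's actual code: it models "unicode" headers as `str` and already-converted headers as `bytes`.

```python
from urllib.parse import quote_plus


def prepare_headers(headers):
    """Hypothetical stand-in for boto 2.x header handling (not boto's code).

    Text (unicode) values are percent-escaped with quote_plus, whose
    default safe set is empty, so '/' becomes '%2F'. Byte-string values
    are passed through untouched.
    """
    prepared = {}
    for name, value in headers.items():
        if isinstance(value, str):
            # Mirrors the problematic behaviour: escape every unsafe character.
            prepared[name] = quote_plus(value.encode("utf-8"))
        else:
            # Already a byte string: left as-is (decoded here only for display).
            prepared[name] = value.decode("ascii")
    return prepared


# A unicode header, as produced by the old fork, gets escaped...
print(prepare_headers({"Content-Type": "application/pdf"}))
# {'Content-Type': 'application%2Fpdf'}

# ...while a byte-string header survives intact.
print(prepare_headers({"Content-Type": b"application/pdf"}))
# {'Content-Type': 'application/pdf'}
```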

Original question

I'm using django-storages' S3BotoStorage together with a FileField to upload files to Amazon S3. Here is my field:

 downloadable_file = FileField(max_length=255, upload_to="widgets/filedownloads", verbose_name="file")

In settings:

 DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

Everything works as far as uploading and downloading go.

However, the files are stored in my bucket with the wrong content type. When I look at the metadata for these files in the AWS S3 console, the Content-Type is shown as "application%2Fpdf" instead of "application/pdf", as it should be.

[Screenshot: escaped Content-Type in the S3 console metadata]

Before anyone says it doesn't matter: it does. Google Chrome's built-in PDF viewer hangs on PDFs with an invalid content type, and a client brought this to my attention.

Here is an example file uploaded via django-storages/boto. If you open it in Chrome's built-in PDF viewer, I expect it will hang, as it did for me and for the client who reported this. If you use a browser other than Chrome, the Adobe plugin, or download the file to disk, you'll probably be fine.

If I manually change the Content-Type metadata to "application/pdf" in the AWS console (one of the standard options it offers), the file works fine.
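The console fix amounts to undoing the percent-escaping. As a hedged sketch, the hypothetical helper below (not part of boto or django-storages) detects a Content-Type value that looks escaped and restores it. It assumes the stored value is a bare `type/subtype` with no parameters, so any `%` signals escaping. Writing the corrected metadata back to S3 is left out, since how you do that depends on your boto version.

```python
from urllib.parse import unquote_plus


def repair_content_type(value):
    """Return a Content-Type with any percent-escaping undone.

    Hypothetical helper. Assumption: the value is a bare type/subtype with
    no parameters, so a '%' means it was escaped (e.g. by quote_plus).
    Unescaped values are returned unchanged, which keeps a literal '+'
    (as in 'application/vnd.api+json') from being turned into a space.
    """
    if "%" in value:
        return unquote_plus(value)
    return value


print(repair_content_type("application%2Fpdf"))  # application/pdf
print(repair_content_type("image/png"))          # image/png
```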

I assume this is a bug somewhere internal, perhaps where boto builds the AWS policy document for the upload, since I'm not doing anything outside standard usage here. However, I've gone through the boto code and can't find where this actually happens.

Can someone either suggest a workaround, or point me to the offending code in boto so that I can fix it and submit a pull request?

boto==2.9.5, django-storages==1.1.8


django amazon-s3 boto django-storage

Ben roberts Aug 15 '13 at 23:30


2 answers

Not a direct answer to your question, but perhaps a useful workaround. I also had trouble using django-storages with S3. I ended up trying cuddlybuddly and was very pleased with it. The author based it on the S3 module from django-storages and added a lot of fixes. I looked through cuddlybuddly's commits, and some changes touched the Content-Type header, but I can't test it with PDF uploads without creating a new Django project. However, I can confirm that none of my files uploaded through Django have garbled slashes in the Content-Type field of their S3 metadata.

If for some reason you can't switch to cuddlybuddly and test it, let me know and I'll try to set up a simple Django project to upload some PDF files.


Fiver Aug 26 '13 at 4:11


Ben roberts · Accepted Answer · 2013-10-25T23:46:54+0000

The problem was that I was using a forked/legacy version of django-storages that did not correctly convert Content-Type headers from unicode to plain byte strings before passing them to boto. boto then converted the unicode strings to ASCII (as HTTP headers require) by escaping them with urllib.quote_plus, which turns "/" into "%2F". Switching to the current version of django-storages resolved the problem.
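A minimal sketch of the fix described above, in Python 3 terms: guess the type with the standard `mimetypes` module and hand the HTTP layer an ASCII byte string rather than a text string. The function name `content_type_header` and the `application/octet-stream` fallback are assumptions for illustration; this is not the code django-storages actually uses.

```python
import mimetypes


def content_type_header(filename, default="application/octet-stream"):
    """Hypothetical helper: return the Content-Type as an ASCII byte string.

    Encoding to bytes up front means a library that escapes text headers
    (as boto 2.9.5 did with unicode values) receives a value it leaves alone.
    Plain MIME types are ASCII, so encoding fails loudly on anything odd.
    """
    guessed, _encoding = mimetypes.guess_type(filename)
    return (guessed or default).encode("ascii")


print(content_type_header("widgets/filedownloads/report.pdf"))  # b'application/pdf'
print(content_type_header("widgets/filedownloads/README"))      # b'application/octet-stream'
```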

For a more detailed analysis of the problem, see: https://github.com/boto/boto/issues/1669#issuecomment-27132112
