How to use python-magic to check the file type in a Django form?

I wrote an email form class in Django with a FileField. I want to check the uploaded file's type by looking at its MIME type. Then I want to restrict the allowed file types to PDF, Word documents and OpenOffice documents.

For this purpose, I installed python-magic and would like to check file types as shown in the python-magic documentation:

    mime = magic.Magic(mime=True)
    file_mime_type = mime.from_file('address/of/file.txt')

However, newly uploaded files do not have a path on my server. I also don't know of any method on the mime object, something like "from_file_content", that determines the MIME type from the file's contents.

What is an effective way to use python-magic to check the types of uploaded files in Django forms?

5 answers

Stan described a good option using a buffer. Its drawback is that the whole file is read into memory. Another option is to write the upload to a temporary file:

    import tempfile

    import magic

    with tempfile.NamedTemporaryFile() as tmp:
        for chunk in form.cleaned_data['file'].chunks():
            tmp.write(chunk)
        tmp.flush()  # make sure all chunks are on disk before libmagic reads the file
        print(magic.from_file(tmp.name, mime=True))

Alternatively, you can check the file size first and read only small files straight into memory:

    if form.cleaned_data['file'].size < ...:
        print(magic.from_buffer(form.cleaned_data['file'].read(), mime=True))
    else:
        ...  # store to disk (the code above)
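The two branches can be folded into one helper. This is a minimal sketch, not part of the original answer: `detect_upload_mime` and its `max_in_memory` parameter are hypothetical names, and the detectors are passed in as callables (in practice `lambda d: magic.from_buffer(d, mime=True)` and `lambda p: magic.from_file(p, mime=True)`) so the logic runs without libmagic installed. It assumes the upload behaves like Django's `UploadedFile` (`size`, `read()`, `seek()`, `chunks()`).

```python
import os
import tempfile
from typing import Callable


def detect_upload_mime(
    upload,
    from_buffer: Callable[[bytes], str],
    from_file: Callable[[str], str],
    max_in_memory: int,
) -> str:
    """Return the MIME type of an upload (hypothetical helper).

    `upload` is assumed to expose .size, .read(), .seek() and .chunks(),
    as Django's UploadedFile does.
    """
    if upload.size < max_in_memory:
        # Small file: sniff the bytes directly.
        data = upload.read()
        upload.seek(0)  # rewind so the upload can still be read or saved later
        return from_buffer(data)

    # Large file: spool it to a named temporary file and sniff it by path.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        for chunk in upload.chunks():
            tmp.write(chunk)
        tmp.close()  # release the handle so the file can be reopened by name (needed on Windows)
        return from_file(tmp.name)
    finally:
        tmp.close()  # harmless if already closed
        os.unlink(tmp.name)  # delete=False, so clean up explicitly
```

The threshold is left to you, as in the answer above; closing the temporary file before sniffing it is what keeps this version working on Windows too.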

Additionally:

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).

So you may want to handle it like this:

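    import os
    import tempfile

    import magic

    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        for chunk in form.cleaned_data['file'].chunks():
            tmp.write(chunk)
        tmp.close()  # flush and release the handle so the file can be reopened by name
        print(magic.from_file(tmp.name, mime=True))
    finally:
        tmp.close()  # no-op if already closed
        os.unlink(tmp.name)  # delete=False, so remove the file ourselves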

Also, you may want to call seek(0) after read():

    if hasattr(f, 'seek') and callable(f.seek):
        f.seek(0)
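If you only need the type, you do not have to read the whole upload into memory: libmagic works from the start of the file, and python-magic's README suggests passing at least the first 2048 bytes to from_buffer. A minimal sketch under that assumption; `read_header`, `_rewind` and `HEADER_SIZE` are hypothetical names, and the rewind reuses the seek check above.

```python
import io

HEADER_SIZE = 2048  # assumption: enough bytes for libmagic to recognise common formats


def _rewind(f) -> None:
    """Seek back to the start if the object supports it."""
    if hasattr(f, "seek") and callable(f.seek):
        f.seek(0)


def read_header(f, size: int = HEADER_SIZE) -> bytes:
    """Return the first `size` bytes of a file-like object and leave it rewound."""
    _rewind(f)
    header = f.read(size)
    _rewind(f)  # so later code (e.g. saving the upload) starts from byte 0
    return header


if __name__ == "__main__":
    upload = io.BytesIO(b"%PDF-1.4\n" + b"0" * 4096)  # stand-in for request.FILES[...]
    header = read_header(upload)
    print(len(header), upload.tell())  # 2048 0
    # With python-magic installed: magic.from_buffer(header, mime=True)
```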

Where is the uploaded data stored?


Why not try something like this in your view:

    m = magic.Magic(mime=True)
    m.from_buffer(request.FILES['my_file_field'].read())

Use request.FILES like this instead of form.cleaned_data if django.forms.Form really isn't an option.

    import magic

    mime = magic.Magic(mime=True)
    attachment = form.cleaned_data['attachment']
    if hasattr(attachment, 'temporary_file_path'):
        # The file is stored temporarily on disk, so we can get its full path.
        mime_type = mime.from_file(attachment.temporary_file_path())
    else:
        # The file is held in memory.
        mime_type = mime.from_buffer(attachment.read())

Also, you may want to call seek(0) after read():

    if hasattr(f, 'seek') and callable(f.seek):
        f.seek(0)

This is taken from the Django source code, where the same check runs for image fields during validation.
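Once mime_type is known, the restriction from the question (PDF, Word and OpenOffice documents) comes down to a whitelist check. A sketch with a hypothetical `ALLOWED_MIME_TYPES` set and `check_mime_type` function; it raises ValueError so it runs without Django, whereas inside a form's `clean_attachment()` you would raise `forms.ValidationError`. The strings libmagic reports for Office files can vary between versions (older versions may report .docx files as application/zip), so verify them against sample files on your server.

```python
# Hypothetical whitelist: PDF, Word (.doc/.docx) and OpenDocument text (.odt).
ALLOWED_MIME_TYPES = frozenset({
    "application/pdf",
    "application/msword",  # legacy Word .doc
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",  # .docx
    "application/vnd.oasis.opendocument.text",  # OpenOffice/LibreOffice .odt
})


def check_mime_type(mime_type: str) -> str:
    """Return mime_type unchanged if allowed, otherwise raise ValueError."""
    # Tolerate a "; charset=..." suffix such as `file --mime` prints.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    if base_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"Unsupported file type: {mime_type}")
    return mime_type
```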


You can use django-safe-filefield to verify that the uploaded file's extension matches its MIME type.

    from django import forms
    from safe_filefield.forms import SafeFileField

    class MyForm(forms.Form):
        attachment = SafeFileField(
            allowed_extensions=('xls', 'xlsx', 'csv')
        )
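The underlying idea, comparing the extension with the detected MIME type, can be illustrated with the standard library's mimetypes module. This is not django-safe-filefield's implementation; `extension_matches` is a hypothetical helper, and in practice the detected type would come from python-magic.

```python
import mimetypes


def extension_matches(filename: str, detected_mime: str) -> bool:
    """True if the MIME type guessed from the filename equals the detected one."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # guess_type returns (None, None) for unknown or missing extensions.
    return guessed is not None and guessed == detected_mime


if __name__ == "__main__":
    print(extension_matches("report.pdf", "application/pdf"))  # True
    print(extension_matches("report.pdf", "text/plain"))       # False
```

Note that mimetypes may also read system files such as /etc/mime.types, so guesses for less common extensions can differ between machines.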

If you are handling file uploads and only care about images, Django will set content_type for you (or rather, for itself?):

    from django.forms import ModelForm
    from django.core.files import File
    from django.db import models

    class MyPhoto(models.Model):
        photo = models.ImageField(upload_to=photo_upload_to, max_length=1000)

    class MyForm(ModelForm):
        class Meta:
            model = MyPhoto
            fields = ['photo']

    photo = MyPhoto.objects.first()
    photo = File(open('1.jpeg', 'rb'))
    form = MyForm(files={'photo': photo})
    if form.is_valid():
        print(form.instance.photo.file.content_type)

This does not depend on the content type supplied by the user. Note, however, that django.db.models.fields.files.FieldFile.file is an undocumented property.

Initially, content_type is set from the request, but when the form is validated, the value is updated.

As for non-images, calling request.FILES['name'].read() seems perfectly acceptable to me. First, that is what Django itself does. Second, files larger than 2.5 MB are stored on disk by default. So let me point you to another answer here.
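The 2.5 MB figure is Django's FILE_UPLOAD_MAX_MEMORY_SIZE setting, whose default is 2621440 bytes. A settings.py sketch that simply restates the defaults: small uploads are kept in memory, while larger ones are written to a temporary file, which is why only those have temporary_file_path(). Dropping MemoryFileUploadHandler from the list would send every upload to disk, so magic.from_file could always be used, at the cost of a disk write per upload.

```python
# settings.py (sketch): these values restate Django's defaults.
FILE_UPLOAD_MAX_MEMORY_SIZE = 2621440  # 2.5 * 1024 * 1024 bytes

FILE_UPLOAD_HANDLERS = [
    # Small uploads stay in memory (InMemoryUploadedFile).
    "django.core.files.uploadhandler.MemoryFileUploadHandler",
    # Larger uploads are streamed to disk (TemporaryUploadedFile).
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
]
```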


For the curious, here is the call chain that updates content_type:

    django.forms.forms.BaseForm.is_valid: self.errors
    django.forms.forms.BaseForm.errors: self.full_clean()
    django.forms.forms.BaseForm.full_clean: self._clean_fields()
    django.forms.forms.BaseForm._clean_fields: field.clean()
    django.forms.fields.FileField.clean: super().clean()
    django.forms.fields.Field.clean: self.to_python()
    django.forms.fields.ImageField.to_python


Source: https://habr.com/ru/post/1388175/

