Fileinfo and mime types I've never heard of

I'm no stranger to MIME types, but this one is odd. Normally a text file would be reported as text/plain, but after implementing fileinfo, this file is now reported as "text/x-pascal". I'm a little worried because I need to be sure that I have the correct MIME types set up before allowing users to upload with it.

Is there a cheat sheet that lists all the "common" MIME types as they are interpreted by fileinfo?


Sinan provided a link listing the more common MIME types. If you look at that list, you will see that a .txt file has the text/plain MIME type, but in my case a plain-Jane text file is interpreted as text/x-pascal.

+4
4 answers

fileinfo makes a "best guess". It examines only part of the file to try to work out what type of file it is, and as such it can easily be fooled. Perhaps your file begins with a Pascal comment or a keyword such as program or unit.

+4

Fileinfo does not use the file extension to determine the MIME type, but (quoting):

The functions in this module try to guess the content type and encoding of a file by looking for certain magic byte sequences at specific positions within the file.

The idea is that the file name and its extension are supplied by users (especially in a case like yours, where files are uploaded by users) and, as such, are less "trustworthy" than the contents of the file itself.
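To make the "magic byte sequence" idea concrete, here is a minimal sketch in Python of sniffing a type from the content rather than the name. It is not how fileinfo itself is implemented: the signature table, the function name sniff_mime, and the fallback type are assumptions chosen for illustration, and a real magic database holds far more rules than these few formats.

```python
# Hypothetical signature table: (offset, magic bytes, MIME type).
# Only a handful of well-known file signatures, for illustration.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"%PDF-", "application/pdf"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_mime(data: bytes, default: str = "application/octet-stream") -> str:
    """Return the MIME type of the first matching signature, else default."""
    for offset, magic, mime in SIGNATURES:
        # Compare the bytes at the expected position with the signature.
        if data[offset:offset + len(magic)] == magic:
            return mime
    return default


if __name__ == "__main__":
    # The upload may be named "notes.txt", but only the content is inspected.
    print(sniff_mime(b"%PDF-1.4 rest of the upload"))
    print(sniff_mime(b"just some plain words"))
```

Binary formats with fixed signatures are easy to recognise this way. Plain text has no signature at all, which is why a detector has to fall back on weaker hints for text files.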


Maybe a solution would be to not check the whole MIME type returned by fileinfo, but to use only its first part, at least in some cases?

For example, perhaps you could accept all MIME types in the text/* and image/* families and reject everything that looks like application/*, except for application/pdf?
(Just an example, but you see the point.)
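The family-based check suggested above could be sketched roughly as follows, in Python. The names ALLOWED_FAMILIES, ALLOWED_EXACT and is_allowed are hypothetical, and the allow-list simply mirrors the example in this answer; it is not a recommended policy.

```python
# Assumed policy, taken from the example above: accept text/* and image/*,
# reject everything else except application/pdf.
ALLOWED_FAMILIES = {"text", "image"}
ALLOWED_EXACT = {"application/pdf"}


def is_allowed(mime: str) -> bool:
    """Decide whether a detected MIME type passes the allow-list."""
    # Drop parameters such as "; charset=us-ascii" and normalise case.
    essence = mime.split(";", 1)[0].strip().lower()
    family, sep, subtype = essence.partition("/")
    if not sep or not family or not subtype:
        return False  # not of the form "type/subtype"
    return essence in ALLOWED_EXACT or family in ALLOWED_FAMILIES


if __name__ == "__main__":
    for detected in ("text/x-pascal", "application/pdf", "application/zip"):
        print(detected, is_allowed(detected))
```

With a check like this, a text file misdetected as text/x-pascal is still accepted, because only the text family is compared.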

+3

I found that, at least in version 5.03, the file command may in some cases incorrectly identify a plain text file as Pascal source simply because it contains the word "program" or "record". At least, that is how it looks from examining the source (src/names.h). I believe PHP's fileinfo uses the same magic engine, so I suspect this is what is causing the problem. If and when I am accepted onto the file mailing list, I will notify the people who maintain it about this issue.

[UPDATE] I raised the issue but received no answer. Having looked into it in more detail, it turns out that detecting text formats is, in general, very difficult. If you get a "text/*" MIME type back from file, you may need to ignore the result and assume that the resource is just "text/plain", unless a false negative (for text/html, perhaps) would cause difficulties.
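One way to apply that workaround, sketched in Python: collapse every detected text/* type to text/plain, except for subtypes you decide must be kept. The function name normalise_text_type and the KEEP_AS_IS set are assumptions for illustration; which subtypes to exempt depends on your application.

```python
# Hypothetical exemptions: text subtypes where a wrong "text/plain"
# would cause trouble and the detector's answer is kept.
KEEP_AS_IS = {"text/html"}


def normalise_text_type(mime: str) -> str:
    """Collapse unreliable text/* guesses to text/plain."""
    # Drop parameters such as "; charset=us-ascii" and normalise case.
    essence = mime.split(";", 1)[0].strip().lower()
    if essence.startswith("text/") and essence not in KEEP_AS_IS:
        return "text/plain"
    return essence


if __name__ == "__main__":
    print(normalise_text_type("text/x-pascal"))
    print(normalise_text_type("text/html"))
```

Types outside the text family pass through unchanged, so this step can sit in front of whatever check you already run on the detected type.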

+3

There is a chart that lists common MIME types and their corresponding file extensions here.

+2

Source: https://habr.com/ru/post/1302559/

