Fileinfo and mime types I've never heard of

Question

I am no stranger to MIME types, but this one is strange. Normally a text file would be reported as text/plain, but now, after implementing fileinfo, this type of file is reported as "text/x-pascal". I'm a little worried because I need to be sure that I get the correct MIME types before allowing users to upload files.

Is there a cheat sheet that lists all the "common" MIME types as they are interpreted by fileinfo?

Sinan provided a link listing the more common MIME types. If you look at that list, you will see that a .txt file has the text/plain MIME type, but in my case a plain-Jane text file is interpreted as text/x-pascal.

php fileinfo

Jim Feb 27 '10 at 1:35

4 answers

Ignacio Vazquez-Abrams · Answer 1 · 2010-02-27T02:29:10+0000

fileinfo is a "best guess". It analyzes only part of the file to try to figure out what type of file it is, and as such it can easily be tricked. Perhaps your file begins with a Pascal comment or a keyword such as Program or Unit.
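A minimal sketch for checking this on your own setup, assuming the fileinfo extension is loaded. The two sample strings are made up for illustration, and no expected output is given because the result depends on the magic database bundled with your PHP build.

```php
<?php
// Sketch: ask fileinfo for the MIME type of in-memory text.
// Results depend on the libmagic database shipped with your PHP build.
$finfo = new finfo(FILEINFO_MIME_TYPE);

// Made-up samples: an ordinary note, and text that starts with a Pascal keyword.
$samples = [
    'plain'  => "Hello, this is an ordinary note.\n",
    'pascal' => "program Hello;\nbegin\n  writeln('hi');\nend.\n",
];

$results = [];
foreach ($samples as $label => $text) {
    // finfo::buffer() inspects the string contents; no file name is involved.
    $results[$label] = $finfo->buffer($text);
    echo $label, ': ', $results[$label], PHP_EOL;
}
```

If the second sample comes back as something other than text/plain, the guess is being driven by the leading keyword rather than by anything about the file name.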

Pascal MARTIN · Answer 2 · 2010-02-27T09:16:31+0000

Fileinfo does not use the file extension to determine the MIME type; instead (quoting the manual):

The functions in this module try to guess the content type and encoding of a file by looking for certain magic byte sequences at specific positions within the file.

The idea is that the file name and its extension are supplied by the user (especially in cases such as yours, where files are uploaded by users) and are therefore less trustworthy than the contents of the file itself.

Maybe a solution would be not to check the whole MIME type returned by fileinfo, but to use only its first part, at least in some cases?

For example, perhaps you could accept all MIME types in the text/* and image/* families and reject everything that looks like application/*, except for application/pdf?
(Just an example, but you get the point.)
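To see that the extension plays no part, here is a small sketch, assuming the fileinfo extension is loaded. The helper name detectMimeFromContent and the temporary file name are made up for this example. It writes the 8-byte PNG signature into a file whose name ends in .txt and asks fileinfo what it thinks the file is.

```php
<?php
// Hypothetical helper (not part of fileinfo): detect the MIME type
// from the file contents only, returning null if detection fails.
function detectMimeFromContent(string $path): ?string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);
    return $mime === false ? null : $mime;
}

// Write the 8-byte PNG signature to a file with a misleading ".txt" name.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'fileinfo_demo_' . uniqid() . '.txt';
file_put_contents($path, "\x89PNG\r\n\x1a\n");

$extension = pathinfo($path, PATHINFO_EXTENSION);
$detected = detectMimeFromContent($path);
unlink($path); // clean up the temporary file

echo $extension, ' => ', $detected ?? 'unknown', PHP_EOL;
```

Whatever type is printed comes from the bytes in the file; the .txt suffix is never consulted.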
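The family-based check could be sketched roughly like this. The function name isAcceptedMime and the two lists are assumptions made for illustration, mirroring the example above; adjust them to your own policy.

```php
<?php
// Hypothetical policy helper, not part of fileinfo.
// Accepts whole MIME families plus a few exact exceptions.
function isAcceptedMime(string $mime): bool
{
    $mime = strtolower(trim($mime));

    // Families accepted by their top-level type (assumed policy).
    $acceptedFamilies = ['text', 'image'];
    // Exact types accepted even though their family is rejected.
    $acceptedExact = ['application/pdf'];

    if (in_array($mime, $acceptedExact, true)) {
        return true;
    }

    // Split "type/subtype" and compare only the first part.
    $parts = explode('/', $mime, 2);
    return count($parts) === 2
        && $parts[1] !== ''
        && in_array($parts[0], $acceptedFamilies, true);
}

var_dump(isAcceptedMime('text/x-pascal'));   // bool(true)
var_dump(isAcceptedMime('application/pdf')); // bool(true)
var_dump(isAcceptedMime('application/zip')); // bool(false)
```

With a check like this, a text file misreported as text/x-pascal is still accepted, because only the text/ part is compared.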

Andy Jackson · Answer 3 · 2010-05-10T16:40:19+0000

I found that, at least in version 5.03, the 'file' command can in some cases misidentify a plain text file as Pascal source, simply because it contains the word "program" or "record". At least, that is what it looks like from examining the source (src/names.h). I believe PHP's fileinfo uses the same magic engine, so I suspect this is what is causing the problem. If/when I am accepted onto the file mailing list, I will notify the people who deal with this issue.

[UPDATE] I raised this issue but received no answer. Having looked into it in more detail, it turns out that identifying text formats is, in general, very difficult. If you get a "text/*" MIME type back from file, you may need to ignore the result and assume that the resource is just "text/plain", unless the resulting false negatives (for example, HTML no longer being reported as text/html) cause difficulties.
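A rough sketch of that fallback, assuming you already have the MIME string returned by fileinfo. The function name normalizeTextMime and the optional $keep list are made up for this example; $keep is one possible way to exempt text subtypes you still need to tell apart.

```php
<?php
// Hypothetical helper: collapse any text/* guess to text/plain.
// $keep lists text subtypes that should be left as detected (an assumption).
function normalizeTextMime(string $mime, array $keep = []): string
{
    $mime = strtolower(trim($mime));

    // Anything in the text/ family is treated as plain text unless exempted.
    if (strncmp($mime, 'text/', 5) === 0 && !in_array($mime, $keep, true)) {
        return 'text/plain';
    }
    return $mime;
}

echo normalizeTextMime('text/x-pascal'), PHP_EOL;            // text/plain
echo normalizeTextMime('text/html', ['text/html']), PHP_EOL; // text/html
echo normalizeTextMime('image/png'), PHP_EOL;                // image/png
```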

Sinan · Answer 4 · 2010-02-27T01:40:23+0000

There is a chart that lists common MIME types and their corresponding extensions. Here
