How can I tell from its contents whether a file is text or binary?

Possible duplicate:
How to determine if a file is binary or text in C#?

Without looking at the file name (extension), using only the content, I need to determine whether a file is text or binary. I cannot rely on the extension because I do not know every extension used for text files, and because a text file may have no extension at all.

I did this by measuring the percentage of non-ASCII bytes in the first part of the file. For performance reasons I cannot read the whole file every time. I used the following code:

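private static bool IsBinary(byte[] bytes, int maxLength)
{
    // Inspect at most the first 1024 bytes, and never more than the buffer holds.
    int len = maxLength > 1024 ? 1024 : maxLength;
    if( len > bytes.Length )
        len = bytes.Length;
    if( len <= 0 )
        return false; // nothing to inspect; treat an empty sample as text

    int nonASCIIcount = 0;

    for( int i = 0; i < len; ++i )
        if( bytes[i] > 127 )
            ++nonASCIIcount;

    // If more than 30% of the sampled bytes are non-ASCII,
    // treat the file as binary.
    // The cast matters: integer division would truncate the ratio to 0.
    return ((double)nonASCIIcount / len) > 0.3;
}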
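
Because only the first part of the file is inspected, the sample can be read without loading the whole file. The following is a minimal sketch of that step, not part of the original code: the class name `FileHead`, the method name `ReadHead`, and the default of 1024 bytes are assumptions chosen to match the limit used above.

```csharp
using System;
using System.IO;

public static class FileHead
{
    // Hypothetical helper: returns at most maxBytes from the start of the file.
    public static byte[] ReadHead(string path, int maxBytes = 1024)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[maxBytes];
            int total = 0;

            // Stream.Read may return fewer bytes than requested, so keep
            // reading until the buffer is full or the file ends.
            while (total < maxBytes)
            {
                int read = stream.Read(buffer, total, maxBytes - total);
                if (read == 0)
                    break;
                total += read;
            }

            // Trim the buffer so its length equals the number of bytes read.
            Array.Resize(ref buffer, total);
            return buffer;
        }
    }
}
```

The returned array can then be passed to a check such as `IsBinary(head, head.Length)`.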

The problem is that some file types are mistakenly recognized as text because the first part of the file is text, for example Photoshop files.

Any suggestions?

+3
3

If the file contains ANYTHING outside the ASCII range, treat it as binary:

static bool IsBinary(byte[] bytes)
{
  // Any byte above 127 is outside the 7-bit ASCII range.
  for (int i = 0; i < bytes.Length; i++ )
    if (bytes[i] > 127)
      return true;
  return false;
}

: , , MIME-, .

+2

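Checking only for bytes > 127 is not enough: anything below 32 (0x20) other than 0x0a and 0x0d (line breaks) is not printable ASCII either. UTF8 text also contains bytes above 127, so it needs separate handling.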
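
The check on bytes below 32 (0x20) other than 0x0a and 0x0d can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: the names `TextSniffer` and `LooksBinary` are hypothetical, and allowing tab (0x09) is an extra assumption beyond the two line-break bytes named above. The sketch still reports UTF8 text containing non-ASCII characters as binary.

```csharp
public static class TextSniffer
{
    // Hypothetical helper: true if the sample contains a byte that plain
    // ASCII text would not normally contain.
    public static bool LooksBinary(byte[] bytes)
    {
        foreach (byte b in bytes)
        {
            // Bytes above 127 are outside the 7-bit ASCII range.
            if (b > 127)
                return true;

            // Control bytes below 0x20 count as binary, except line feed
            // (0x0A), carriage return (0x0D) and, by assumption, tab (0x09).
            if (b < 0x20 && b != 0x0A && b != 0x0D && b != 0x09)
                return true;
        }
        return false;
    }
}
```

A header that begins with readable characters but is followed by zero bytes is caught by the control-byte test even though none of its bytes exceeds 127.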

0

" " . , ascii 32.

-1

Source: https://habr.com/ru/post/1786915/

