Possible duplicate:
How to determine if a file is binary or text in C#?
Without looking at the file name (extension), using only the content, we need to know whether the file is text or binary. I cannot use the extension because I do not know all the extensions of text files and because a text file can be without an extension.
I did this by looking at the percentage of non-ASCII bytes in the first part of the file. I cannot read the whole file every time for performance reasons. I used the following code:
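private static bool IsBinary(byte[] bytes, int maxLength)
{
    // Inspect at most the first 1024 bytes, and never more than the buffer holds.
    int len = System.Math.Min(System.Math.Min(maxLength, 1024), bytes.Length);
    if (len <= 0)
        return false; // nothing to inspect; treat an empty sample as text

    int nonASCIIcount = 0;
    for (int i = 0; i < len; ++i)
        if (bytes[i] > 127)
            ++nonASCIIcount;

    // Cast to double: integer division would truncate the ratio to 0 or 1.
    return (double)nonASCIIcount / len > 0.3;
}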
The problem is that some file types, such as Photoshop files, are mistakenly recognized as text because the first part of the file is text.
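One possible direction, sketched here under assumptions rather than as a definitive fix: besides counting bytes above 127, treat a NUL byte as a strong sign of binary content and count other control bytes as suspicious. A Photoshop file, for example, starts with the ASCII signature `8BPS` followed by a version field and reserved bytes that contain zeros, so a NUL check would flag it within the first few bytes. The class name `BinarySniffer`, the method name `LooksBinary`, and the 1024-byte sample size are hypothetical choices for illustration. Note that this heuristic would also classify UTF-16 text as binary, since it contains NUL bytes, and text in a non-English single-byte or UTF-8 encoding may exceed the 30% threshold.

```csharp
using System;
using System.IO;

public static class BinarySniffer
{
    // Hypothetical helper: reads up to sampleSize bytes from the start of the file
    // and applies the heuristic below, so the whole file is never loaded.
    public static bool LooksBinary(string path, int sampleSize = 1024)
    {
        byte[] buffer = new byte[sampleSize];
        int read;
        using (FileStream fs = File.OpenRead(path))
        {
            read = fs.Read(buffer, 0, buffer.Length);
        }
        return LooksBinary(buffer, read);
    }

    // Heuristic (an assumption, not a guarantee):
    //  - any NUL byte means binary;
    //  - otherwise, binary if more than 30% of the sampled bytes are
    //    non-ASCII or control characters other than tab, LF, and CR.
    public static bool LooksBinary(byte[] bytes, int length)
    {
        int len = Math.Min(length, bytes.Length);
        if (len <= 0)
            return false; // empty sample: treat as text

        int suspicious = 0;
        for (int i = 0; i < len; ++i)
        {
            byte b = bytes[i];
            if (b == 0)
                return true; // NUL bytes are rare in ASCII/UTF-8 text

            bool isControl = b < 32 && b != 9 && b != 10 && b != 13;
            if (isControl || b > 127)
                ++suspicious;
        }
        return (double)suspicious / len > 0.3;
    }
}
```

Calling `BinarySniffer.LooksBinary(path)` reads only the sample from disk; the byte-array overload can be used when a buffer is already in memory.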
Any suggestions?
Borja