How to check if a file contains plain text?

I have a folder full of files, and I want to search them for a particular line. The problem is that some of the files may be zip, exe, ogg, etc. Can I somehow check what kind of file each one is, so that I only open and read txt, PHP, etc.? I cannot rely on the file extension.

4 answers

You can use the Python interface to libmagic to identify file formats.

    >>> import magic
    >>> f = magic.Magic(mime=True)
    >>> f.from_file('testdata/test.txt')
    'text/plain'

See the project's repository for more examples.


Use Python's mimetypes library (note that it guesses the type from the file name, not from the file's contents):

    import mimetypes

    if mimetypes.guess_type('full path to document here')[0] == 'text/plain':
        # file is plaintext
        pass
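
One caveat is worth checking before relying on this: as the standard library documents it, mimetypes.guess_type works from the file name alone and never opens the file. The short sketch below (the paths are made up for illustration and do not need to exist) shows that a type comes back for a missing file, and that nothing comes back when the extension is absent.

```python
import mimetypes

# Neither path exists on disk; guess_type only inspects the name.
with_ext = mimetypes.guess_type('no_such_dir/notes.txt')[0]
without_ext = mimetypes.guess_type('no_such_dir/notes')[0]

print(with_ext)     # type guessed from the ".txt" suffix
print(without_ext)  # None, because there is no extension to look up
```

So this approach only helps when the extensions can be trusted, which the question rules out.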

Try something like this:

    def is_binary_file(filepathname):
        # Bytes that normally appear in text: a few control characters,
        # printable ASCII, and the high bytes used by 8-bit encodings
        textchars = bytearray([7, 8, 9, 10, 12, 13, 27]) + bytearray(range(0x20, 0x7f)) + bytearray(range(0x80, 0x100))
        # Read only the first 1024 bytes and close the file afterwards
        with open(filepathname, 'rb') as f:
            chunk = f.read(1024)
        # Anything left after deleting the text bytes marks the file as binary
        return bool(chunk.translate(None, textchars))

Use the function as follows:

    is_binary_file('<your file path name>')

This returns True if the file is binary and False if it contains text. It is easy to adapt to your needs, e.g. by turning it into an is_text_file function; I will leave that to you.
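
To tie this heuristic back to the original question, here is a minimal sketch that inverts the check and walks a folder looking for a line. The names is_text_file, find_line, and sample_size are illustrative, not part of any library, and reading files as UTF-8 with replacement characters is an assumption that may not suit every data set.

```python
import os

# Bytes treated as text: a few control characters, printable ASCII, high bytes.
TEXT_BYTES = (
    bytes([7, 8, 9, 10, 12, 13, 27])
    + bytes(range(0x20, 0x7f))
    + bytes(range(0x80, 0x100))
)


def is_text_file(path, sample_size=1024):
    """Return True if the first sample_size bytes contain only text bytes."""
    with open(path, 'rb') as f:
        sample = f.read(sample_size)
    # translate(None, delete) removes every text byte; leftovers mean binary.
    return not sample.translate(None, TEXT_BYTES)


def find_line(folder, needle):
    """Yield (path, line_number, line) for each text-file line containing needle."""
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if not is_text_file(path):
                continue
            # errors='replace' keeps unexpected encodings from raising mid-search.
            with open(path, encoding='utf-8', errors='replace') as f:
                for number, line in enumerate(f, start=1):
                    if needle in line:
                        yield path, number, line.rstrip('\n')
```

A call such as `for hit in find_line('some_folder', 'text to find'): print(hit)` would then skip the zip, exe, and ogg files regardless of how they are named. Because only the first bytes are sampled, a binary file with a text-like header can still slip through.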


If you are using Linux, you can parse the output of the "file" command-line tool.
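
A rough sketch of that idea in Python, assuming the file utility is installed and on the PATH. The helper names mime_type_from_file_command and looks_like_text are hypothetical, and treating every text/* type as text is a simplification (some source files are reported under application/* types).

```python
import shutil
import subprocess


def mime_type_from_file_command(path):
    """Return the MIME type reported by `file`, or None if the tool is missing."""
    if shutil.which('file') is None:
        return None
    # --brief omits the file name from the output; --mime-type prints only the type.
    # The "--" stops option parsing so paths starting with "-" are handled safely.
    result = subprocess.run(
        ['file', '--brief', '--mime-type', '--', path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def looks_like_text(path):
    """Treat any text/* MIME type as text (a simplification)."""
    mime = mime_type_from_file_command(path)
    return mime is not None and mime.startswith('text/')
```

Starting one process per file is slow for large folders, so for bulk scans the libmagic binding mentioned in the first answer is likely the better fit.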


Source: https://habr.com/ru/post/1304487/

