Removing non-displayable characters from a file

$ cat weirdo 
Lunch now?
$ cat weirdo | grep Lunch
$ vi weirdo
  ^@L^@u^@n^@c^@h^@ ^@n^@o^@w^@?^@

I have some files containing text with non-printable characters (^@), which is why my grep does not work (as shown above).

How can I get grep to work? Is there a way that does not require modifying the files?


It looks like your file is encoded in UTF-16 rather than in an 8-bit character set. "^@" is the notation for the ASCII NUL character '\0', which usually breaks string matching.
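
One way to check this is to dump the raw bytes with od(1). The sketch below first fabricates a sample file named weirdo with iconv, assuming little-endian UTF-16 (the question does not say which byte order the real file uses), and then shows that every character is followed by a NUL byte:

```shell
# Create a sample file like the one in the question (assumed UTF-16LE encoding).
printf 'Lunch now?\n' | iconv -f UTF-8 -t UTF-16LE > weirdo

# Dump the raw bytes as characters. NUL bytes show up as \0 between the letters,
# so the byte sequence "Lunch" never occurs contiguously and grep cannot match it.
od -c weirdo
```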

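A loss-free way to handle this is to convert the UTF-16 text to UTF-8 and run grep on the converted output. Assuming a hypothetical converter command named "utf16-utf8", you would write: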

utf16-utf8 weirdo | grep Lunch
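
The "utf16-utf8" command is only a stand-in; one widely available tool that performs this conversion is iconv(1). A minimal sketch, assuming the file is little-endian UTF-16 without a byte-order mark (use UTF-16BE for big-endian data; if the file starts with a byte-order mark, -f UTF-16 lets iconv take the byte order from it):

```shell
# Sample input, assumed to be UTF-16LE.
printf 'Lunch now?\n' | iconv -f UTF-8 -t UTF-16LE > weirdo

# Convert to UTF-8 on the fly and search the result; weirdo itself is left unchanged.
iconv -f UTF-16LE -t UTF-8 weirdo | grep Lunch
```

Because the conversion happens only inside the pipeline, this also satisfies the "without changing the files" requirement.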

As a crude approximation to "utf16-utf8", you could use:

tr -d '\0' < weirdo | grep Lunch

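This deletes the ASCII NUL bytes from the input and lets grep search the cleaned-up output. It is only an approximation: in theory it can produce false matches, but in practice it usually works.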


You can filter out the non-printable characters with tr:

tr -cd '[:print:]\r\n\t' < weirdo | grep Lunch
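
Here -c takes the complement of the listed set and -d deletes it, so everything except printable characters, carriage return, newline and tab is dropped. A small sketch on a made-up UTF-16LE sample that also contains a stray BEL control character (\007). Note that at least GNU tr works byte by byte, so non-ASCII characters in UTF-8 text are likely to be stripped as well:

```shell
# Sample input: UTF-16LE text with an extra BEL control character before the newline.
printf 'Lunch now?\007\n' | iconv -f UTF-8 -t UTF-16LE > weirdo

# -c complements the set and -d deletes, so only printable characters,
# carriage returns, newlines and tabs survive (NULs and BEL are removed).
tr -cd '[:print:]\r\n\t' < weirdo | grep Lunch
```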

You could also use strings(1), for example:

strings file | grep Lunch

For details, see man strings.
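
One caveat: by default strings looks for runs of at least four single-byte printable characters, and in a UTF-16 file every letter is separated by a NUL byte, so plain strings is likely to find nothing. GNU strings (from binutils) has an -e option that selects the character encoding: -e l for 16-bit little-endian, -e b for 16-bit big-endian. A sketch, again assuming a little-endian sample file:

```shell
# Sample input, assumed to be UTF-16LE.
printf 'Lunch now?\n' | iconv -f UTF-8 -t UTF-16LE > weirdo

# -e l makes GNU strings read 16-bit little-endian characters.
strings -e l weirdo | grep Lunch
```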


awk '{gsub(/[^[:print:]]/,"") }1' file 

Source: https://habr.com/ru/post/1760964/