Can sed or awk use the NUL character as a record separator?

I have NUL-separated output coming from the following command:

some commands | grep -i -c -w -Z 'some regex' 

The output consists of entries in the format:

 [file name]\0[pattern count]\0 

I want to use text-processing tools such as sed/awk to convert the entries to the following format:

 [file name]:[pattern count]\0 

But it seems that sed/awk normally only handle records delimited by newline characters. I would like to know how sed/awk can be used to achieve my goal or, if sed/awk cannot handle such a case, which other Linux tool I should use.

Thanks for any suggestions.

Lawrence

4 answers

By default, the record separator is a newline, so a record is a single line of text. You can use a different character by changing the built-in RS variable. The value of RS is a string that says how to separate records; the default value is "\n", a string containing only a newline character.

  awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list 

Starting with version 4.2.2, GNU sed has the -z or --null-data option to do just that. For instance:

 sed -z 's/old/new/' null_separated_infile 
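For the question's pairs, a sketch with GNU sed 4.2.2 or later: in -z mode each "line" is NUL-terminated and N joins the next record with a NUL rather than a newline, so replacing that NUL with : merges name and count. The helper name join_nul_pairs_sed is hypothetical, and an odd number of records is assumed not to occur.

```shell
# Hypothetical helper; requires GNU sed >= 4.2.2 for -z.
join_nul_pairs_sed() {
    # N appends the following record, separated by a NUL in -z mode;
    # \x00 matches that NUL and turns it into ':'. Output stays NUL-terminated.
    sed -z 'N; s/\x00/:/'
}

# Example with made-up input; tr only makes the NULs visible as newlines.
printf '%s\0' a.txt 3 b.txt 5 | join_nul_pairs_sed | tr '\0' '\n'
```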

Using sed to replace NUL characters with spaces:

 sed 's/\x0/ /g' infile > outfile 

or edit the file in place (keeping a backup of the original as infile.bak):

 sed -i.bak 's/\x0/ /g' infile 

Using tr to delete them:

 tr -d "\000" < infile > outfile 
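Deleting the NULs throws away the record boundaries, though. If the file names are known not to contain newlines (an assumption), a sketch of an alternative for the question is to translate NUL to newline, let paste join every two lines with :, and translate back. The helper name join_nul_pairs_paste is hypothetical.

```shell
# Hypothetical helper. Assumes no file name contains a newline, because
# newline is used as a temporary record separator.
join_nul_pairs_paste() {
    tr '\0' '\n' |      # NUL-separated -> newline-separated
    paste -d: - - |     # join each pair of lines with ':'
    tr '\n' '\0'        # newline-separated -> NUL-separated again
}

printf '%s\0' a.txt 3 b.txt 5 | join_nul_pairs_paste | tr '\0' '\n'
```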

Yes, gawk can do this: set the record separator to \0. For example, the command

 gawk 'BEGIN { RS="\0"; FS="=" } $1=="LD_PRELOAD" { print $2 }' </proc/$(pidof mysqld)/environ 

prints the value of the LD_PRELOAD variable:

 /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 

The /proc/$PID/environ file is a NUL-separated list of environment variables. I use it as an example since it is easy to try on any Linux system.

The BEGIN block sets the record separator to \0 and the field separator to =, because I also want to extract the part after = based on the part before =.

The pattern $1=="LD_PRELOAD" selects the record whose key (the first field) is the one I'm interested in.

The { print $2 } action then prints the part after the =.
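One caveat with FS="=": $2 stops at the next =, so a value that itself contains = gets cut short. A sketch that keeps everything after the first = instead, again assuming gawk; the helper name env_value is hypothetical and it reads NUL-separated KEY=VALUE records from stdin (for example env_value LD_PRELOAD < /proc/$(pidof mysqld)/environ).

```shell
# Hypothetical helper: print the value of variable $1 from NUL-separated
# KEY=VALUE records on stdin. Assumes gawk for RS = "\0".
env_value() {
    gawk -v key="$1" '
        BEGIN { RS = "\0" }
        {
            eq = index($0, "=")                # position of the first "="
            if (eq && substr($0, 1, eq - 1) == key) {
                print substr($0, eq + 1)       # everything after that "="
                exit
            }
        }
    '
}

printf '%s\0' 'A=1' 'OPTS=-x=1 -y=2' | env_value OPTS   # prints: -x=1 -y=2
```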


But mawk cannot parse NUL-separated input files. This is documented in man mawk:

 BUGS mawk cannot handle ascii NUL \0 in the source or data files. 

mawk stops reading input at the first \0 character.
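If only mawk is available, one possible workaround, assuming the data contains no newlines, is to convert NULs to newlines before awk and back afterwards, so that awk never sees a NUL in either its input or its program text. The helper name join_nul_pairs_awk is hypothetical.

```shell
# Hypothetical helper; works with mawk or any POSIX awk, provided the
# names contain no newlines (newline is the temporary separator).
join_nul_pairs_awk() {
    tr '\0' '\n' |
    awk '
        NR % 2 { name = $0; next }     # odd lines: remember the file name
               { print name ":" $0 }   # even lines: emit "name:count"
    ' |
    tr '\n' '\0'                       # restore NUL terminators outside awk
}

printf '%s\0' a.txt 3 b.txt 5 | join_nul_pairs_awk | tr '\0' '\n'
```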


You can also use xargs to handle NUL-separated input, a little unintuitively, like this:

 xargs -0 -n1 </proc/$$/environ 

xargs uses echo as the default command. -0 makes it split the input on NUL, and -n1 limits each echo invocation to a single argument, so the output is newline-separated.
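Along the same lines, xargs can also do the question's pair join: with -n2 each name/count pair is handed to printf, whose format re-adds the : and the trailing NUL. This is a sketch that assumes a printf(1) utility understanding \0 in its format (GNU coreutils does); the helper name join_nul_pairs_xargs is hypothetical.

```shell
# Hypothetical helper: each printf invocation receives one name and one count.
join_nul_pairs_xargs() {
    xargs -0 -n2 printf '%s:%s\0'
}

printf '%s\0' a.txt 3 b.txt 5 | join_nul_pairs_xargs | tr '\0' '\n'
```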


And, as Graeme's answer shows, sed can do this too.


Source: https://habr.com/ru/post/907811/

