Can sed or awk use the NUL character as a record separator?

Question

I have NUL-delimited output coming from the following command:

 some commands | grep -i -c -w -Z 'some regex'

The output consists of entries in the format:

 [file name]\0[pattern count]\0

I want to use text-processing tools such as sed / awk to convert the entries to the following format:

 [file name]:[pattern count]\0

But it seems that sed / awk usually handle only newline-delimited records. I would like to know how sed / awk can be used to achieve my goal or, if sed / awk cannot handle such a case, which other Linux tool I should use.

Thanks for any suggestion.

Lawrence

+6

awk sed nul

user1129812 Feb 07 '12 at 2:12

4 answers

Starting with version 4.2.2, GNU sed has the -z or --null-data option to do just that. For instance:

 sed -z 's/old/new/' null_separated_infile
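
To see the per-record behaviour, here is a small sketch. It assumes GNU sed 4.2.2 or later; the sample records and the function name replace_in_records are made up for illustration.

```shell
# Sketch assuming GNU sed 4.2.2 or later; replace_in_records is a hypothetical name.
replace_in_records() {
  # -z: records are NUL-terminated on input and on output,
  # so the substitution is applied once per record.
  sed -z 's/old/new/'
}

# printf reuses the '%s\0' format for every argument, giving two NUL-terminated records.
# The final tr is only there to make the result readable on a terminal.
printf '%s\0' 'old one' 'old two' | replace_in_records | tr '\0' '\n'
```

Without the final tr, the output stays NUL-terminated and can be fed to other NUL-aware tools.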

+5

Graeme Mar 22 '14 at 11:55

Using `sed` to replace NUL characters with spaces:

 sed 's/\x0/ /g' infile > outfile

or edit the file in place, keeping a backup of the original with a .bak suffix:

 sed -i.bak 's/\x0/ /g' infile

Using `tr` to delete NUL characters:

 tr -d "\000" < infile > outfile
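
If the records need to be kept rather than merged, tr can also translate NUL to newline temporarily so that line-oriented tools apply. The sketch below targets the format from the question; it assumes that no file name contains a newline, and nul_pairs_to_colon is a hypothetical name.

```shell
# Sketch; nul_pairs_to_colon is a hypothetical name.
# Assumes no file name contains a newline, since newline is the temporary separator.
nul_pairs_to_colon() {
  tr '\0' '\n' |      # NUL-terminated records become lines
    paste -d: - - |   # join each pair of lines as name:count
    tr '\n' '\0'      # back to NUL-terminated records
}

# Made-up sample input; the last tr is only for display.
printf '%s\0' 'a.txt' 3 'b.txt' 0 | nul_pairs_to_colon | tr '\0' '\n'
```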

0

jaypal singh Feb 07 '12 at 2:50

Yes, gawk can do this: set the record separator to \0. For example, the command

 gawk 'BEGIN { RS="\0"; FS="=" } $1=="LD_PRELOAD" { print $2 }' </proc/$(pidof mysqld)/environ

prints the value of the LD_PRELOAD variable:

 /usr/lib/x86_64-linux-gnu/libjemalloc.so.1

The /proc/$PID/environ file is a NUL-separated list of environment variables. I use it as an example because it is easy to try on a Linux system.

The BEGIN part sets the record separator to \0 and the field separator to =, because I also want to extract the part after = based on the part before =.

The pattern $1=="LD_PRELOAD" runs the block when the first field holds the key I am interested in.

The block { print $2 } prints the second field, that is, the text after =.
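
Applied to the input format from the question, the same RS setting could be combined with ORS, roughly as in this sketch. It assumes gawk is installed and that records strictly alternate between file name and count; join_pairs is a hypothetical name.

```shell
# Sketch assuming GNU awk (gawk); join_pairs is a hypothetical name.
join_pairs() {
  gawk 'BEGIN { RS = "\0"; ORS = "\0" }   # read and write NUL-terminated records
        NR % 2 { name = $0; next }        # odd records: remember the file name
               { print name ":" $0 }      # even records: emit "name:count" plus NUL
  '
}

# Made-up sample input; the last tr is only for display.
printf '%s\0' 'a.txt' 3 'my file.txt' 0 | join_pairs | tr '\0' '\n'
```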

But mawk cannot parse NUL-separated input files. This is documented in man mawk:

 BUGS mawk cannot handle ascii NUL \0 in the source or data files.

mawk stops reading input after the first \0 character.

You can also use xargs to handle NUL-separated input, a little unintuitively, like this:

 xargs -0 -n1 </proc/$$/environ

xargs uses echo as the default command. -0 splits the input on NUL. -n1 limits each echo invocation to one argument, so the output is separated by newlines.
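
The same idea can be stretched to the format from the question by passing two records at a time to printf instead of echo. This is only a sketch: it assumes GNU xargs (for -0 and -r) and an external printf utility such as the one in coreutils; pair_with_xargs is a hypothetical name.

```shell
# Sketch; pair_with_xargs is a hypothetical name. -0 and -r are GNU xargs options.
pair_with_xargs() {
  # -n2 hands two records (name, count) to each printf call;
  # -r skips the call entirely when the input is empty.
  # printf here is the external utility; its format string turns \0 into a NUL byte.
  xargs -0 -r -n2 printf '%s:%s\0'
}

# Made-up sample input; the last tr is only for display.
printf '%s\0' 'a.txt' 3 'my file.txt' 0 | pair_with_xargs | tr '\0' '\n'
```

This starts one printf process per pair, so it is likely to be slower than the sed or gawk approaches on large inputs.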

And, as Graeme's answer shows, sed can do this too.

0

Paul tobias Jun 11 '19 at 8:14

Tejas patil · Accepted Answer · 2012-02-07T02:23:14+0000

By default, the record separator is a newline, which makes each record a single line of text. You can use a different character by changing the built-in RS variable. The value of RS is a string that says how to separate records; the default value is "\n", a string containing only a newline character.

  awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
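
For a quick check without the BBS-list file, the same program can be fed a made-up string. In this sketch split_on_slash is a hypothetical name and the input is invented.

```shell
# Sketch with made-up input; split_on_slash is a hypothetical name.
split_on_slash() {
  # A single-character RS is portable across awk implementations.
  awk 'BEGIN { RS = "/" } { print $0 }'
}

# Each slash-separated piece is printed as its own record, one per line.
printf 'aardvark/alpo-net/barfly' | split_on_slash
```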
