
In a bash script, what does $'\0' evaluate to, and why?

In various bash scripts, I came across the following: $'\0'

An example with some context:

 while read -r -d $'\0' line; do
     echo "${line}"
 done <<< "${some_variable}"

What does $'\0' return as its value? Or, put slightly differently, what does $'\0' evaluate to, and why?

Perhaps this has been answered elsewhere. I did search before posting, but the small number of characters or meaningful words in dollar-quote-slash-zero-quote makes it very hard to get useful results from a Stack Overflow or Google search. So, if there are duplicate questions, please allow some grace and link them to this question.

+5
4 answers

To complement rici's helpful answer:

Please note that this answer is about bash. ksh and zsh also support $'...' strings, but their behavior differs:
* zsh does create and preserve NULs (zero bytes) with $'\0'.
* ksh has the same limitations as bash, and additionally interprets the first NUL in the output of a command substitution as the string terminator (it cuts off at the first NUL, whereas bash strips such NULs).

$'\0' is an ANSI C-quoted string that technically creates a NUL (0x0 byte), but effectively results in the empty (null) string (same as ''), because Bash interprets any NUL as the (C-style) string terminator in the context of arguments and here-docs / here-strings.

As such, using $'\0' is somewhat misleading, because it suggests that you can create a NUL this way, when you actually cannot:

  • You cannot create NULs as part of a command argument or here-doc / here-string, and you cannot store NULs in a variable:
    • echo $'a\0b' | cat -v # -> 'a' - the string is terminated after 'a'
    • cat -v <<<$'a\0b' # -> 'a' - ditto
  • In the context of command substitutions, by contrast, NULs are stripped:

    • echo "$(printf 'a\0b')" | cat -v # -> 'ab' - the NUL is stripped
  • However, you can pass NUL bytes through files and pipes.

    • printf 'a\0b' | cat -v # -> 'a^@b' - the NUL is preserved via stdout and the pipe
    • Note that it is printf that generates the NUL here: its single-quoted argument contains escape sequences that printf itself interprets and writes to stdout. By contrast, if you used printf $'a\0b', bash would again interpret the NUL as the string terminator up front and pass only 'a' to printf.
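
The cases listed above can be checked side by side. What follows is a minimal sketch, assuming bash and a standard `wc`; the variable names (`arg`, `subst`, `pipe_bytes`, `arg_bytes`) are made up for illustration and do not come from the original answer.

```bash
#!/usr/bin/env bash
# Illustrative sketch; all variable names here are hypothetical.

# 1. Argument / variable: bash ends the string at the NUL.
arg=$'a\0b'
arg_len=${#arg}

# 2. Command substitution: the NUL is dropped, the rest survives.
#    (bash 4.4+ prints a warning on stderr, which is silenced here.)
{ subst=$(printf 'a\0b'); } 2>/dev/null

# 3. Pipe: printf itself produces the NUL, so all 3 bytes arrive.
pipe_bytes=$(( $(printf 'a\0b' | wc -c) ))

# 4. Same pipe, but bash has already cut the argument down to 'a'.
arg_bytes=$(( $(printf $'a\0b' | wc -c) ))

echo "arg_len=$arg_len subst=$subst pipe_bytes=$pipe_bytes arg_bytes=$arg_bytes"
```

In bash this should print `arg_len=1 subst=ab pipe_bytes=3 arg_bytes=1`, matching the three behaviors described in the list: terminated, stripped, and preserved.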

If we look at the sample code, whose intent is to read the entire input at once, across lines (which is why I have changed line to content):

 while read -r -d $'\0' content; do # same as: `while read -r -d '' ...`
     echo "${content}"
 done <<< "${some_variable}"

This will never enter the body of the while loop, because the stdin input is provided by a here-string, which, as explained, cannot contain NULs.
Note that read really does look for NULs with -d $'\0', even though $'\0' is effectively ''. In other words: read, by convention, interprets the empty (null) string as meaning NUL when given as -d's option-argument, because NUL itself cannot be specified for technical reasons.

In the absence of an actual NUL in the input, read's exit code indicates failure, so the loop is never entered.

However, even in the absence of the delimiter, the value is still read, so for this code to work with a here-string or here-doc, it must be modified as follows:

 while read -r -d $'\0' content || [[ -n $content ]]; do
     echo "${content}"
 done <<< "${some_variable}"
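
To see what the extra `|| [[ -n $content ]]` test changes, the two loop forms can be run against the same input. This is a small sketch, assuming bash; `some_variable` holds a made-up sample value, and the counters `plain_runs` and `fixed_runs` are hypothetical names added for illustration.

```bash
#!/usr/bin/env bash
# Illustrative sketch; the sample value and counter names are hypothetical.
some_variable=$'line one\nline two'

# Without the fallback test: read finds no NUL, fails, and the body never runs.
plain_runs=0
while read -r -d '' content; do
  plain_runs=$((plain_runs + 1))
done <<< "${some_variable}"

# With the fallback test: read still fails, but it has filled $content,
# so the body runs once; the next read hits EOF with nothing left.
fixed_runs=0
while read -r -d '' content || [[ -n $content ]]; do
  fixed_runs=$((fixed_runs + 1))
  captured=$content
done <<< "${some_variable}"

echo "plain_runs=$plain_runs fixed_runs=$fixed_runs"
```

Here `plain_runs` stays at 0 while `fixed_runs` ends at 1, and `captured` holds both lines of the sample value.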

However, as @rici notes in a comment, with a single (multi-line) input string, there is no need to use while at all:

 read -r -d $'\0' content <<< "${some_variable}" 

This reads the entire content of $some_variable, while trimming leading and trailing whitespace (which is what read does with $IFS at its default value, $' \t\n'). @rici also points out that, if such trimming is undesired, a simple content=$some_variable will do.
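
The trimming behavior can be made visible with a padded value. This is a hedged sketch, assuming bash; the sample value and the names `trimmed`, `untrimmed`, and `copied` are hypothetical. Note that a here-string appends a newline, which only shows up once trimming is switched off with `IFS=`.

```bash
#!/usr/bin/env bash
# Illustrative sketch; the sample value and variable names are hypothetical.
some_variable='  padded text  '

# Default $IFS: leading and trailing whitespace is trimmed, including the
# newline that the here-string appends. read returns non-zero (no NUL found).
read -r -d '' trimmed <<< "${some_variable}" || true

# Empty IFS: nothing is trimmed, so the appended newline is kept too.
IFS= read -r -d '' untrimmed <<< "${some_variable}" || true

# Plain assignment: an exact copy, with no read involved.
copied=$some_variable

printf '[%s]\n' "$trimmed" "$untrimmed" "$copied"
```

The plain assignment is the simplest way to get an exact copy, which is the point @rici makes.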

Contrast this with input that actually contains NULs, in which case while is needed to process each NUL-separated token (but without the || [[ -n $<var> ]] clause); find -print0 outputs filenames, each followed by a NUL:

 while IFS= read -r -d $'\0' file; do
     echo "${file}"
 done < <(find . -print0)

Note the use of IFS= read ... to suppress trimming of leading and trailing whitespace, which is undesired in this case, because the input filenames must be preserved as they are.
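
The same loop can be tried without touching the filesystem by letting printf stand in for find -print0. This is a sketch under that assumption, for bash only; the three sample names and the variables `names`, `count`, and `trimmed_name` are invented for illustration.

```bash
#!/usr/bin/env bash
# Illustrative sketch; printf replaces `find . -print0`, and the sample
# names and variable names are hypothetical.

names=()
while IFS= read -r -d '' file; do
  names+=("$file")
done < <(printf '%s\0' ' leading space.txt' $'two\nlines.txt' 'plain.txt')
count=${#names[@]}

# For contrast: without IFS=, the leading space of the first name is lost.
read -r -d '' trimmed_name < <(printf ' leading space.txt\0')

printf 'count=%s first=[%s] trimmed=[%s]\n' "$count" "${names[0]}" "$trimmed_name"
```

All three names arrive intact, including the one with an embedded newline, which is the case that newline-delimited reading cannot handle.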

+3

In bash, $'\0' is precisely the same as '': an empty string. There is no point in using the special Bash syntax in this case.

Bash strings are always NUL-terminated, so if you manage to insert a NUL into the middle of a string, it will terminate the string. In this case, the C-escape \0 is converted to a NUL character, which then acts as the string terminator.

The -d option of the read builtin (which defines the end-of-line character for the input) expects a single character in its argument. It does not check whether that character is the NUL character, so it is equally happy with the NUL terminator of '' or the explicit NUL in $'\0' (which is also a NUL terminator, so it is probably all the same). The effect, in either case, is to read NUL-terminated data, as produced (for example) by find's -print0 option.

In the specific case of read -d '' line <<< "$var", it is impossible for $var to contain an internal NUL character (for the reasons described above), so line will be set to the entire value of $var with leading and trailing whitespace removed. (As @mklement notes, this will not be apparent in the suggested code snippet, because read will have a non-zero exit status even though the variable has been set; read only returns success if the delimiter is actually found, and NUL cannot be part of a here-string.)

Please note that there is a big difference between

 read -d '' line 

and

 read -d'' line 

The first one is correct. In the second one, the argument word passed to read is just -d, which means that the option's argument will be the next argument (in this case, line). read -d$'\0' line will behave identically; in either case, the space is necessary. (So, again, there is no need for the C-escape syntax.)
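
The difference is easy to observe on a short input. This is a minimal sketch, assuming bash; the input string and the names `spaced_line`, `glued_reply`, and `glued_line` are hypothetical. It relies on two documented read behaviors: only the first character of the -d argument is used, and with no variable name given the input goes to REPLY.

```bash
#!/usr/bin/env bash
# Illustrative sketch; input and variable names are hypothetical.

# Correct form: '' is a separate, empty argument, so the delimiter is NUL
# and "line" is the variable to fill. No NUL in the input: non-zero status.
unset line REPLY
read -r -d '' line <<< 'hello world' || true
spaced_line=$line

# Glued form: -d'' is just -d, so "line" is consumed as the delimiter
# argument. Its first character (l) becomes the delimiter, and with no
# variable name left, the text read so far lands in REPLY.
unset line REPLY
read -r -d'' line <<< 'hello world'
glued_reply=$REPLY
glued_line=${line-UNSET}

printf 'spaced=[%s] glued_reply=[%s] glued_line=[%s]\n' \
  "$spaced_line" "$glued_reply" "$glued_line"
```

The glued form stops at the first `l` of the input and never sets `line` at all, which is why the space matters.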

+9

Technically, the expansion of $'\0' will always be the empty string '' (a.k.a. the null string) to the shell (not in zsh). Or, worded the other way around, $'\0' will never expand to an ASCII NUL (a zero-valued byte) (again, not in zsh). It should be noted that it is confusing that the two names are so similar: NUL and null.

However, there is an additional (quite confusing) twist when we talk about read -d ''.

What read sees is the value '' (the null string) as the delimiter.

What read does is split the input from stdin on the character $'\0' (yes, an actual 0x00).


Extended answer.

The question in the title is:

In a bash script, what does $'\0' evaluate to, and why?

This means that we need to explain what $'\0' expands to.

What $'\0' expands to is very simple: it expands to the null string '' (in most shells, not in zsh).

But an example of use:

 read -r -d $'\0' 

This transforms the question into: what delimiter character does $'\0' expand to?

This holds a very confusing twist. To address it correctly, we need to take a full tour of when and how a NUL (a byte with zero value, or "0x00") is used in shells.

Stream.

We need some NULs to work with. It is possible to generate NUL bytes from the shell:

 $ echo -e 'ab\0cd' | od -An -vtx1
  61 62 00 63 64 0a                 ### That works in bash.

 $ printf 'ab\0cd' | od -An -vtx1
  61 62 00 63 64                    ### That works in all shells tested.

Variable

A variable in the shell will not store a NUL.

 $ printf -va 'ab\0cd'; printf '%s' "$a" | od -An -vtx1
  61 62

The example is meant to be run in bash, since it relies on the -v option of bash's printf builtin. But the example clearly shows that a string containing a NUL is cut at the NUL. Simple variables cut the string at the zero byte, which is reasonable to expect if the string is a C string, which must end in a NUL \0. As soon as a NUL is found, the string must end.

Command substitution.

A NUL works differently when used in a command substitution. This code should assign a value to the variable $a and then print it:

 $ a=$(printf 'ab\0cd'); printf '%s' "$a" | od -An -vtx1 

And it does, but with different results in different shells:

 ### several shells just ignore (remove)
 ### a NUL in the value of the expanded command.
 /bin/dash   :  61 62 63 64
 /bin/sh     :  61 62 63 64
 /bin/b43sh  :  61 62 63 64
 /bin/bash   :  61 62 63 64
 /bin/lksh   :  61 62 63 64
 /bin/mksh   :  61 62 63 64

 ### ksh trims the value.
 /bin/ksh    :  61 62
 /bin/ksh93  :  61 62

 ### zsh sets the var to actually contain the NUL value.
 /bin/zsh    :  61 62 00 63 64
 /bin/zsh4   :  61 62 00 63 64

Of special mention is that bash (version 4.4) warns about the fact:

 /bin/b44sh  :  warning: command substitution: ignored null byte in input
  61 62 63 64

In command substitution, a zero byte is silently ignored by the shell.
It is very important to understand that this does not happen in zsh.

Now that we have all the pieces about NUL, we can look at what read does.

What read does on a NUL delimiter.

This brings us back to the read -d $'\0' command:

 while read -r -d $'\0' line; do 

$'\0' should have been expanded to a byte of value 0x00, but the shell cuts it and it actually becomes ''. This means that both $'\0' and '' are received by read as the same value.

Having said that, it might seem reasonable to write an equivalent construct:

 while read -r -d '' line; do 

And this is technically correct.

What '' actually delimits.

There are two sides to this point: one is the character that follows the -d option of read; the other, addressed here, is: which character will read use if it is given the delimiter as -d $'\0'?

The first side was discussed in detail above.

The second side is a very confusing twist, because the read command will actually read up to the next byte of value 0x00 (which is what $'\0' represents).

To actually show that this is so:

 #!/bin/bash

 # create a test file with some zero bytes.
 printf 'ab\0cd\0ef\ngh\n' > tfile

 while true ; do
     read -r -d '' line; a=$?
     echo "exit $a"
     if [[ $a == 1 ]]; then
         printf 'last %s\n' "$line"
         break
     else
         printf 'normal %s\n' "$line"
     fi
 done <tfile

Upon execution, the output will be:

 $ ./script.sh
 exit 0
 normal ab
 exit 0
 normal cd
 exit 1
 last ef
 gh

The first two exit 0 results are successful reads up to the next "zero byte", and both contain the correct values ab and cd. The next read is the last one (as there are no more zero bytes) and contains the value $'ef\ngh' (yes, it also contains a newline).

All this shows (and proves) that read -d '' actually reads up to the next "zero byte", which is also known by the name ASCII NUL and should have been the result of expanding $'\0'.

In short: we can safely state that read -d '' reads up to the next 0x00 (NUL).

Conclusion:

We must state that read -d $'\0' uses a delimiter of 0x00. Using $'\0' is a better way to convey this correct meaning to the reader. As a matter of code style: I write $'\0' to make my intentions clear.

One, and only one, character is used as the delimiter: the byte value 0x00 (even if in bash it happens to be cut).


Note: Any of these commands will print the hexadecimal values of the stream.

 $ printf 'ab\0cd' | od -An -vtx1
 $ printf 'ab\0cd' | xxd -p
 $ printf 'ab\0cd' | hexdump -v -e '/1 "%02X "'
 61 62 00 63 64
+3

$'\0' expands the contained escape sequence \0 to the actual character it represents, which is \0, or the empty character in the shell.

This is Bash syntax. As per man bash:

Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Known backslash escape sequences are also decoded.

Similarly, $'\n' expands to a newline, and $'\r' expands to a carriage return.
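
These expansions can be inspected byte by byte. The following is a small sketch, assuming bash plus standard `od` and `tr`; the helper name `hex_of` and the variable names are hypothetical. It also shows where $'\0' differs from the other two escapes in bash: nothing is left to print.

```bash
#!/usr/bin/env bash
# Illustrative sketch; hex_of and the variable names are hypothetical.

# Print the bytes of the first argument as hex digits without spaces.
hex_of() { printf '%s' "$1" | od -An -vtx1 | tr -d ' \n'; }

nl_hex=$(hex_of $'\n')     # one byte: newline
cr_hex=$(hex_of $'\r')     # one byte: carriage return
nul_hex=$(hex_of $'\0')    # bash passes an empty string, so no bytes at all

nul=$'\0'
nul_len=${#nul}

echo "nl=$nl_hex cr=$cr_hex nul=[$nul_hex] nul_len=$nul_len"
```

In bash this should print `nl=0a cr=0d nul=[] nul_len=0`.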

+1

Source: https://habr.com/ru/post/1246990/

