What is the best way to embed a Unicode character in a POSIX shell script?

There are several shell-specific ways to include a "Unicode literal" in a string. For example, in Bash the $'' quoted-string expansion lets us embed a character directly: $'\u2620'.

However, if you are trying to write universal, cross-platform shell scripts (as a rule, this can be reduced to "works in Bash, Zsh and Dash"), that is not a portable feature.

I can embed anything in the ASCII set (in octal number space) with a construct like the following:

WHAT_A_CHARACTER="$(printf '\036')"

... however, POSIX / Dash printf only supports octal escape sequences.

Obviously, I can also reach the full Unicode space by handing the task to a more complete programming environment:

OH_CAPTAIN_MY_CAPTAIN="$(ruby -e 'print "\u2388"')"
TAKE_ME_OUT_TONIGHT="$(node -e 'console.log("\u266C")')"

So: what is the best way to encode such a character in a shell script that:

  • works in dash, bash and zsh,
  • shows the hexadecimal encoding of the code point in the code,
  • is independent of the specific encoding of the string (i.e. not by encoding the UTF-8 bytes in octal),
  • and, finally, does not require invoking any "heavy" interpreter (say, the runtime is less than 0.01 s)?
+4
2 answers

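If you have GNU printf installed (it is in the debian package coreutils, for example), you can use it regardless of which shell you are running by avoiding the shell's builtin: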

env printf '\u2388\n'

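Here the POSIX-standard env command is used to bypass the printf builtin, but if you know where printf is installed, you can call it directly by its full path: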

/usr/bin/printf '\u2388\n'
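Whether an external printf expands \u is easy to probe at run time. The following is only a sketch: the function name have_unicode_printf is hypothetical, and the probe assumes a UTF-8 locale, because GNU printf emits \u characters in the encoding of the current locale. The octal bytes are used only to check the result, not to produce the character.

```shell
# Hypothetical probe: succeeds if the external printf turns \u2388 into UTF-8.
# U+2388 in UTF-8 is the byte sequence E2 8E 88, written below in octal.
have_unicode_printf() {
    [ "$(env printf '\u2388' 2>/dev/null)" = "$(printf '\342\216\210')" ]
}

if have_unicode_printf; then
    env printf '\u2388\n'
else
    # printf %s avoids any escape processing of the message itself.
    printf '%s\n' 'external printf does not expand \u in this environment' >&2
fi
```

If the probe fails, a script can fall back to a slower method.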

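If both your external printf and the shell's builtin printf implement only the POSIX standard, you need to work harder. One option is to use iconv to convert to UTF-8, but although POSIX requires an iconv command to exist, it does not specify how encodings are named. I think the following will work on most POSIX-compatible platforms, but the number of subshells it creates might make it less efficient than a "heavy" script interpreter: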

printf $(printf '\\%o' $(printf %08x 0x2388 | sed 's/../0x& /g')) |
iconv -f UTF-32BE -t UTF-8

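The above uses printf to pad the hexadecimal code point to 8 hex digits, then sed to rewrite those digits as 4 hex constants, then printf again to convert the constants to octal notation, then another printf to interpret the octal escapes as a four-byte sequence, and finally iconv to convert from big-endian UTF-32 to UTF-8. (It would be simpler with a printf that recognizes \x escape sequences, but POSIX does not require that and dash does not implement it.)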
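The same pipeline can be wrapped in a function so that the stages are easier to follow. This is a sketch rather than a definitive implementation: the name unicode_chars is hypothetical, the format arguments are quoted, a newline is added after each padded code point so that sed always sees complete lines, and the encoding names UTF-32BE and UTF-8 are assumed to be accepted by the local iconv. The nested command substitutions are kept on purpose, since an unquoted variable would not be split into words by zsh.

```shell
# Hypothetical helper: print the given code points (integer constants such as
# 0x2388) as UTF-8. Stages, from the innermost command outwards:
#   1. printf '%08x\n'  pads each code point to 8 hex digits (00002388)
#   2. sed              splits the digits into byte constants (0x00 0x00 0x23 0x88)
#   3. printf '\\%o'    rewrites each byte as an octal escape (\0\0\43\210)
#   4. printf "..."     interprets the escapes, emitting raw UTF-32BE bytes
#   5. iconv            re-encodes those bytes as UTF-8
unicode_chars() {
    printf "$(
        printf '\\%o' $(
            printf '%08x\n' "$@" |
            sed 's/../0x& /g'
        )
    )" | iconv -f UTF-32BE -t UTF-8
}

unicode_chars 0x2388 0x266c 0xA
```

Each call still starts two subshells plus sed and iconv, so the cost concern from the answer applies to the function as well.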

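You can use this unchanged to print more than one character, as long as you supply the Unicode code points as integer constants (example run in dash):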

$ printf $(printf '\\%o' $(printf %08x 0x2388 0x266c 0xA |
>                          sed 's/../0x& /g')) |
> iconv -f UTF-32BE -t UTF-8
⎈♬
$

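Note: some shells that do not follow POSIX treat words beginning with % specially and will complain about the unquoted %08x format argument to printf. In such a shell, quote the format argument.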

+6

echo -e "\xc3\xb6"

Example:

~ $ echo -e "\xc3\xb6"
ö
~ $ echo -n ö | hexdump
0000000 b6c3                                   
0000002
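Two details of this transcript are worth spelling out. Neither echo -e nor \x escapes are specified by POSIX, so the command depends on the shell's echo, and hexdump's default format prints 16-bit words in host byte order, which is why the bytes C3 B6 show up as b6c3 on a little-endian machine. A portable sketch of the same experiment, using only octal escapes and od, might look like this (it still hard-codes the UTF-8 bytes, which the question wanted to avoid):

```shell
# \303\266 is the octal spelling of the bytes 0xC3 0xB6, the UTF-8 form of U+00F6.
printf '\303\266\n'

# Dump one byte at a time so the bytes appear in stream order.
printf '\303\266' | od -An -tx1
```

The od output lists c3 first and then b6, matching the order in which the bytes are actually written.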
-3

Source: https://habr.com/ru/post/1569014/

