Print a Unicode hex code point in the Basic Latin range as a character in Bash

I have a list containing a mix of four-digit Unicode hex code points and plain numbers. Some of the code points fall in the Basic Latin block. I want to print them all as characters.

An example of my failed attempt using Bash (under Cygwin):

    $ list="0 3 4 5 005e 0060 00ff"
    $ for c in $list; do [[ ${#c} = 4 ]] && env printf "\\u$c\n" || echo $c; done
    0
    3
    4
    5
    printf: invalid universal character name \u005e
    005e
    `
    ΓΏ

I get the same problem regardless of the terminal's language and encoding settings.

I could not get the answer to this Ask Ubuntu question to work: https://askubuntu.com/questions/20806/why-does-printf-report-an-error-on-all-but-three-ascii-range-unicode-codepoint

1 answer

This workaround lets you print any character in any encoding:

    list="0 3 4 5 005e 0060 00ff"
    for c in $list; do
        if [ ${#c} = 4 ]; then
            echo 0 "$c" | xxd -r | iconv -f UNICODEBIG -t UTF-8
            echo
        else
            echo "$c"
        fi
    done

xxd with the -r option converts hexadecimal text back into raw bytes. It expects each input line to begin with an offset, which is what the leading 0 in the echo supplies. In this case xxd outputs the two bytes described by $c.
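To see what this step produces on its own, here is a small sketch that dumps the bytes with od instead of sending them to the terminal. The helper name hex_to_bytes is made up for illustration and is not part of the answer.

```shell
#!/usr/bin/env bash
# hex_to_bytes is a hypothetical helper name, used only for this sketch.
# It hands xxd -r one line of the form "<offset> <hex digits>".
hex_to_bytes() {
    echo 0 "$1" | xxd -r
}

# Dump the raw bytes as hex so nothing depends on the terminal encoding.
hex_to_bytes 005e | od -An -tx1   # bytes 00 5e
hex_to_bytes 00ff | od -An -tx1   # bytes 00 ff
```

Each four-digit code yields exactly two bytes, high byte first, which is the layout the next stage of the pipeline expects.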

The output of xxd is piped to iconv, which converts text from one encoding to another. UNICODEBIG is two-byte Unicode with the most significant byte first (big-endian). UTF-8 is the target encoding; substitute your terminal's encoding if it is not UTF-8. The result is the character in the specified encoding.
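The conversion step can be tried in isolation. The sketch below writes the two big-endian bytes directly with printf octal escapes and inspects the UTF-8 result with od. It assumes your iconv accepts the name UCS-2BE for the same encoding the answer calls UNICODEBIG; ucs2be_to_utf8 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# ucs2be_to_utf8 is a hypothetical helper name for this sketch.
# Assumption: UCS-2BE names the same encoding as UNICODEBIG in your iconv.
ucs2be_to_utf8() {
    iconv -f UCS-2BE -t UTF-8
}

# U+005E: bytes 00 5e (octal 000 136) become the single UTF-8 byte 5e.
printf '\000\136' | ucs2be_to_utf8 | od -An -tx1
# U+00FF: bytes 00 ff (octal 000 377) become the two UTF-8 bytes c3 bf.
printf '\000\377' | ucs2be_to_utf8 | od -An -tx1
```

Code points in the Basic Latin block come out as one byte, so the approach covers exactly the values that printf "\u..." rejected in the question.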

This trick gives you complete freedom to encode any Unicode character from 0000 to ffff in any encoding that supports it.
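As an illustration of choosing a different target, the following sketch passes the target encoding as a parameter. The helper encode_bytes is hypothetical, and the encoding names UCS-2BE and ISO-8859-1 are assumed to be available in your iconv build.

```shell
#!/usr/bin/env bash
# encode_bytes is a hypothetical helper: it reads big-endian two-byte
# Unicode on stdin and writes it in the encoding named by its argument.
# Assumption: UCS-2BE is accepted as the name for that input encoding.
encode_bytes() {
    iconv -f UCS-2BE -t "$1"
}

# U+00FF (octal bytes 000 377) fits in ISO-8859-1 as the single byte ff.
printf '\000\377' | encode_bytes ISO-8859-1 | od -An -tx1
# U+20AC (octal bytes 040 254) is the three bytes e2 82 ac in UTF-8.
printf '\040\254' | encode_bytes UTF-8 | od -An -tx1
```

If the target encoding cannot represent a character (U+20AC in ISO-8859-1, for example), iconv will typically report a conversion error rather than print anything useful, so pick a target that covers the characters in your list.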

EDIT: The simplest way is to use xxd. The new method is shown above; the old method was:

 echo -ne \\x"${c:0:2}"\\x"${c:2:2}" | iconv -f UNICODEBIG -t UTF-8 

Source: https://habr.com/ru/post/1437416/

