I have a problem using the match function in awk on a string containing special characters. Consider the test.awk file:
{ match($0,"(^.*)kon",a); print a[1]; }
and the corresponding test file "test.txt" containing the line "Testing Håkon" (note the Norwegian character "å"). The file is encoded in "iso-8859-1" and is 14 bytes long. The hexadecimal dump of the file, produced by xxd -p test.txt, is
54657374696e672048e56b6f6e0a
From this it can be seen that the Norwegian character "å" is stored as the single byte "e5", which is its iso-8859-1 code. That is, the file is encoded in iso-8859-1.
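For anyone who wants to reproduce the input, here is a minimal sketch that recreates the file byte for byte. The temporary directory and the variable names are only illustrative, and od is used in place of xxd on the assumption that xxd may not be installed everywhere.

```shell
# Recreate test.txt exactly; \345 is octal for 0xe5, the iso-8859-1 code of "å".
workdir=$(mktemp -d)
printf 'Testing H\345kon\n' > "$workdir/test.txt"

# Check the size and the raw bytes (od stands in for xxd here).
size=$(wc -c < "$workdir/test.txt" | tr -d ' ')
hex=$(od -An -tx1 "$workdir/test.txt" | tr -d ' \n')
echo "$size bytes: $hex"
# prints: 14 bytes: 54657374696e672048e56b6f6e0a
```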
Running
awk -f test.awk test.txt
prints nothing on the terminal, whereas the correct output should be "Testing Hå".
The output of the locale command:
LANG=en_DK.UTF-8
LANGUAGE=en_US:
LC_CTYPE="en_DK.UTF-8"
LC_NUMERIC="en_DK.UTF-8"
LC_TIME="en_DK.UTF-8"
LC_COLLATE="en_DK.UTF-8"
LC_MONETARY="en_DK.UTF-8"
LC_MESSAGES="en_DK.UTF-8"
LC_PAPER="en_DK.UTF-8"
LC_NAME="en_DK.UTF-8"
LC_ADDRESS="en_DK.UTF-8"
LC_TELEPHONE="en_DK.UTF-8"
LC_MEASUREMENT="en_DK.UTF-8"
LC_IDENTIFICATION="en_DK.UTF-8"
LC_ALL=
which shows that the variable "LANG" is set to a UTF-8 locale, while the file itself is not UTF-8.
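One way to check whether this locale mismatch is involved (an assumption at this point, not something the output proves) is to run awk with LC_ALL=C so that the input is treated as plain bytes. The sketch below uses a portable variant of test.awk based on RSTART and RLENGTH, since the three-argument form of match is a gawk extension; the temporary directory and variable names are only illustrative.

```shell
# Assumption being tested: the empty result comes from the UTF-8 locale,
# not from the regular expression itself.
workdir=$(mktemp -d)
printf 'Testing H\345kon\n' > "$workdir/test.txt"

# Portable variant of test.awk: take the matched text and drop the trailing "kon".
# LC_ALL=C makes awk treat every byte, including 0xe5, as one character.
prefix_hex=$(LC_ALL=C awk '{ if (match($0, /^.*kon/)) print substr($0, RSTART, RLENGTH - 3) }' \
    "$workdir/test.txt" | od -An -tx1 | tr -d ' \n')
echo "$prefix_hex"
# prints: 54657374696e672048e50a  (the bytes of "Testing Hå" plus a newline)
```

If this prints the expected bytes while the run under en_DK.UTF-8 stays empty, the locale setting is the likely culprit.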