macOS wc (word count) miscounts words containing the UTF-8 character Å

When I run wc on the string Ås (with a capital letter Å), I get a word count of 2 when I expect 1.

Counting the words in sÅ or Å gives 1, which seems right.

$ echo sÅ | wc
       1       1       4
$ echo Å | wc
       1       1       3

Counting the words in sÅs or Ås gives 2, which does not seem right.

$ echo sÅs | wc
       1       2       5
$ echo Ås | wc
       1       2       4

I can only reproduce this with the letter Å, not with any of åäöÄÖ.

$ echo "Ås" | wc
       1       2       4
$ echo "Äs" | wc
       1       1       4
$ echo "Ös" | wc
       1       1       4

I use the default locale settings that macOS applies when the terminal starts. They look like this:

$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
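
A quick way to probe whether these locale settings matter is to rerun the count with the locale overridden for a single command. This is only a sketch: c_locale_words is a variable name invented for this example, and I am not asserting what macOS prints for the first and last commands.

```shell
# Word count under whatever locale is currently in effect.
printf 'Ås\n' | wc -w

# Same input, forcing the plain C locale for this one command only.
# tr strips the padding that some wc implementations add around the number.
c_locale_words=$(printf 'Ås\n' | LC_ALL=C wc -w | tr -d ' ')
echo "C locale word count: $c_locale_words"

# Same input, also requesting the character count (-m).
# Assumption to verify: on some wc implementations -m changes how the
# input is read (characters instead of bytes), which may affect -w too.
printf 'Ås\n' | wc -m -w
```

If the count differs between these runs, the behaviour depends on LC_CTYPE rather than on the input itself.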

I get the same results on macOS Sierra and on Lion.

Just to check what the string Ås looks like as bytes:

$ echo "Ås" | hexdump
0000000 c3 85 73 0a                                    
0000004
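
To compare Å with the letters that do not trigger the problem, the same kind of byte dump can be repeated for each of them. This is only a sketch: bytes_of is a helper name made up for this example, and it assumes a UTF-8 terminal plus the standard od and tr utilities.

```shell
# bytes_of is a hypothetical helper: print the bytes of its argument as hex,
# with the spacing that od adds stripped out.
bytes_of() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# One line per letter: the letter, then its UTF-8 bytes.
for letter in Å Ä Ö å ä ö; do
    echo "$letter $(bytes_of "$letter")"
done
```

In UTF-8, Å encodes as c3 85, while Ä, Ö, å, ä and ö end in 84, 96, a5, a4 and b6 respectively, so 85 is the only byte unique to the failing case.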

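So, is this a bug in the Mac OS wc, or am I using wc incorrectly?

Why does the Mac OS wc miscount words containing the UTF-8 character Å?

Is it that wc looks at individual bytes (as wc -c does) and treats the byte 85, which is outside ASCII, specially? (Does wc -m affect the word count?)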


Source: https://habr.com/ru/post/1668392/

