I am desperately trying to replace some Unicode characters (graphemes) in a file using sed. However, I keep failing for some of them, namely those from the following Unicode blocks:
\p{InHigh_Surrogates}: U+D800–U+DB7F
\p{InHigh_Private_Use_Surrogates}: U+DB80–U+DBFF
\p{InLow_Surrogates}: U+DC00–U+DFFF
I tried (in the sed configuration file loaded via the -f switch):
s/\p{InHigh_Surrogates}/_D-NON-UTF8_/
Does anyone have a suggestion? Also, I am not necessarily fixated on using the blocks, but I did not manage to specify the character range in the form \xd800-\xdfff either.
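For concreteness, here is a rough sketch of the byte-level form such a range might take. It rests on several assumptions that are not confirmed above: that GNU sed is in use (its `\xHH` escapes are an extension), that the surrogates sit in the file as the three-byte sequences ED A0..BF 80..BF (how U+D800–U+DFFF come out when they are pushed through UTF-8 encoding), and that running in the C locale is acceptable so sed matches raw bytes. The function name `replace_surrogates` is made up for illustration. As far as I know, sed's POSIX regular expressions have no `\p{...}` classes at all, which may be why the block syntax never matches.

```shell
#!/bin/sh
# Hypothetical helper: replace every UTF-8-style encoded surrogate
# (bytes ED A0..BF 80..BF, i.e. U+D800..U+DFFF) read from stdin.
# Assumes GNU sed; LC_ALL=C makes sed treat the input as plain bytes.
replace_surrogates() {
  LC_ALL=C sed 's/\xED[\xA0-\xBF][\x80-\xBF]/_D-NON-UTF8_/g'
}

# Demo: the octal escapes \355\240\200 are the bytes ED A0 80 (U+D800).
printf 'a\355\240\200b\n' | replace_surrogates
# prints: a_D-NON-UTF8_b
```

The same substitution could go into the script file loaded with -f, provided sed is still started with LC_ALL=C. Other sequences beginning with ED, such as U+D7FF (ED 9F BF), are left alone because their second byte falls outside A0..BF.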
Thank you,
Thomas