I decided to post this question after spending a lot of time on the problem without figuring it out. I have also read a number of seemingly related posts, but none of them match my simple (?) problem.
So, I have a possibly large text file (> 1000 lines) containing Chinese (Mandarin) characters, with lines like this example:
"ref#2-5-1.jpg#2#一些 <variable> 内容#pic##" (the Chinese just means "some content").
All that needs to change is that a space must be inserted between adjacent Chinese characters, if there is not one already:
"ref#2-5-1.jpg#2#一 些 <variable> 内 容#pic##".
I started naively with simple things like the following, but there is no match:
sed -e 's/\([\u4E00-\u9fff]\)/\1 /g' <test_utf_sed.txt > test_out.txt
where 4E00-9FFF should be the code point range for Chinese characters. Not surprisingly this didn't work, so I also tried
sed -e 's/\([一-龻]\)/hello/g' <test_utf_sed.txt > test_out.txt
, bash (?) "一".
Next, I tried matching single characters by code point:
sed -e 's/\(\u4E00\)/hello/g' <test_utf_sed.txt > test_out.txt
sed -e 's/\(\u4E9B\)/hello/g' <test_utf_sed.txt > test_out.txt
and a UTF notation (found on Stack Overflow):
sed -e 's/\(\u'U+4E00\)/hello/g' <test_utf_sed.txt > test_out.txt
1) , ?
2) Does sed support Unicode at all?
3) A workaround I considered:
step1: insert a space after each character
// like 's/\(.\)/\1 /g'
step2: remove the space after each character which is not a Chinese character
// like 's/\([a-zA-Z0-9]\) /\1/g'
, , . , utf-8 regex sed.
4) I am using bash 3.2 on Mac OS X 10.6.8.
5) - regEx-onliners , , .
Thanks in advance!