Removing Unicode String Separator in Bash

Question

I have a text file that contains the Unicode line separator character (U+2028).

I want to remove it using bash (I have seen implementations for Python, but not for bash). Which command could I use to strip the Unicode line separator from a text file (output4.txt)?

See in vim below: [screenshot of the file open in vim]

+4

bash

programming_historian May 14, '13 at 20:31

3 answers

I noticed in your screenshot that you already have the file open in vim, so why not just do the replacement there?

In vim you could run:

 :%s/(seebelow)//g

For the (seebelow) part, type:

ctrl-v u 2 0 2 8

+3

Kent May 14, '13 at 20:49

Perhaps you can use sed (GNU sed, assuming the file is UTF-8, where U+2028 is encoded as the bytes E2 80 A8):

 sed 's/\xE2\x80\xA8//g' <file_in.txt >file_out.txt

To overwrite the original file:

 sed -i 's/\xE2\x80\xA8//g' file.txt

Edit: (see chepner's comment). The pattern has to match the actual bytes in the file, which depend on its encoding; \x20\x28 would match a space followed by "(", not U+2028. You can use, for example, od -t x1 to view a hex dump, determine the encoding, and then use sed to remove the right bytes.
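As a rough illustration of the hex-dump check mentioned above, the sketch below writes a small sample file containing U+2028 in UTF-8 and dumps its bytes. The sample content and the variable names (sample, hex) are made up for this example; in a UTF-8 file the separator should show up as the byte sequence e2 80 a8.

```shell
# Hypothetical sample file: "foo", U+2028 (UTF-8 bytes E2 80 A8 written as
# octal escapes so this works in any POSIX shell), then "bar".
sample=$(mktemp)
printf 'foo\342\200\250bar\n' > "$sample"

# Dump the file one byte at a time in hex; look for "e2 80 a8" in the output.
od -An -t x1 "$sample"

# Same dump with spaces and newlines stripped, convenient for scripted checks.
hex=$(od -An -t x1 "$sample" | tr -d ' \n')
printf '%s\n' "$hex"

rm -f "$sample"
```

If the dump shows a different sequence (for instance 20 28 in a UTF-16 file), the sed pattern would need to be adjusted to match it.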

+1

Sir athos May 14, '13 at 20:35

anubhava · Accepted Answer · 2013-05-14T20:53:10+0000

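This tr command might look like it should work as well, but many tr implementations (GNU tr, for one) translate byte by byte rather than treating the three bytes as one character, so it is not reliable here:

tr '\xE2\x80\xA8' ' ' < inFile > outFile

Working solution (thanks to the OP for finding it):

 sed -i.old $'s/\xE2\x80\xA8/ /g' inFile

This replaces each separator with a space and keeps a backup of the original as inFile.old. The $'...' quoting is a bash feature that turns the \xHH escapes into raw bytes before sed sees them.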
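For shells without $'...' quoting, a minimal sketch of the same sed approach might build the separator with printf instead. It assumes the input is UTF-8; the variable names (sep, infile, cleaned) and the sample text are made up for illustration, and this variant deletes the separator rather than replacing it with a space.

```shell
# Build the three UTF-8 bytes of U+2028 using printf octal escapes.
sep=$(printf '\342\200\250')

# Hypothetical sample input: two words joined by the separator.
infile=$(mktemp)
printf 'foo%sbar\n' "$sep" > "$infile"

# Delete every occurrence; LC_ALL=C makes sed treat the data as raw bytes.
cleaned=$(LC_ALL=C sed "s/$sep//g" "$infile")

printf '%s\n' "$cleaned"
rm -f "$infile"
```

To keep the words apart, put a space in the replacement part of the sed expression, as the answer above does.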
