Removing the Unicode Line Separator in Bash

I have a text file that contains the Unicode line separator (U+2028).

I want to remove it using bash (I have seen implementations for Python, but not for bash). Which command could I use to strip the Unicode line separator from a text file (output4.txt)?

The original post included a screenshot of the file open in vim.

3 answers

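A tr command like the one below looks tempting, but it is not reliable here. tr translates individual characters rather than multi-byte sequences, so each of the three bytes would be replaced wherever it occurs, and GNU tr only interprets octal escapes (\NNN), not \xHH:

 tr '\xE2\x80\xA8' ' ' < inFile > outFile 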

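Working solution (thanks to the OP for finding it). The $'...' quoting makes bash expand the \xHH escapes into the UTF-8 bytes of U+2028 before sed sees them, and -i.old keeps a backup copy as inFile.old: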

 sed -i.old $'s/\xE2\x80\xA8/ /g' inFile 
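
To try this on a throwaway file first, here is a minimal sketch. The file names sample.txt and cleaned.txt are made up for illustration, and instead of $'...' it builds the byte sequence with printf and octal escapes, which should behave the same way but also works in shells without ANSI-C quoting:

```shell
# sample.txt and cleaned.txt are hypothetical file names used for illustration.
# Create a test file: "first", then U+2028 (UTF-8 bytes E2 80 A8, octal 342 200 250), then "second".
printf 'first\342\200\250second\n' > sample.txt

# Build the three-byte sequence with printf so the script does not depend on
# bash-only $'...' quoting or on sed understanding \xHH escapes.
sep=$(printf '\342\200\250')

# LC_ALL=C makes sed treat the input as raw bytes; each match becomes one space.
LC_ALL=C sed "s/${sep}/ /g" sample.txt > cleaned.txt

cat cleaned.txt   # prints: first second
```

Writing to a second file rather than using -i makes it easy to compare the two before overwriting anything.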

I noticed from your screenshot that you already have the file open in vim, so why not do the replacement there?

In vim you could run:

 :%s/(seebelow)//g 

For the (seebelow) part, type the following key sequence, which inserts the character U+2028:

ctrl-v u 2 0 2 8


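Perhaps you can use sed. The bytes to match depend on the file's encoding: in UTF-8, U+2028 is encoded as E2 80 A8, whereas the pattern \x20\x28 would match a space followed by an opening parenthesis. Assuming a UTF-8 file and GNU sed (the \xHH escape is a GNU extension):

 LC_ALL=C sed 's/\xE2\x80\xA8//g' <file_in.txt >file_out.txt 

To overwrite the original file:

 LC_ALL=C sed -i 's/\xE2\x80\xA8//g' file.txt 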

Edit (see chepner's comment): You have to make sure that you are matching the correct bytes, which depend on the file's encoding, and then use sed to remove them. For example, you can use od -t x1 to view a hex dump and determine the encoding.
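
As a rough illustration of that check, the sketch below dumps a small test file in hex and counts the lines that contain the UTF-8 encoding of U+2028. The file name check.txt is hypothetical; on a real file you would skip the first printf and point the commands at your own file:

```shell
# check.txt is a hypothetical file name used for illustration.
# Write "a", U+2028 as UTF-8 (octal 342 200 250 = hex E2 80 A8), "b", newline.
printf 'a\342\200\250b\n' > check.txt

# Hex dump, one byte per column; in a UTF-8 file the separator shows up as e2 80 a8.
od -An -t x1 check.txt

# Count the lines that contain that byte sequence (LC_ALL=C: compare raw bytes).
sep=$(printf '\342\200\250')
count=$(LC_ALL=C grep -c "$sep" check.txt)
echo "lines containing U+2028: $count"
```

If the dump shows 20 28 or 28 20 as a pair instead, the file is probably UTF-16, and the byte pattern passed to sed would have to change accordingly.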


Source: https://habr.com/ru/post/1480798/

