How to remove this special character?

I tried to combine lines in my file when I noticed the following:

word1 word2
word1 word2

I did not understand why these lines were not combined, so I opened the file in vim and used :set list to see if there were any special characters, and I found this:

  word1 <feff>word2
  word1 word2

I am not sure how to remove this character in Python. Any suggestions on what this character might be and how it can be cleaned up?

2 answers

U+FEFF is the byte order mark (BOM) character, which should only appear at the very beginning of a document. Anywhere else in a document, it should be treated as a ZERO WIDTH NO-BREAK SPACE. If it causes problems, you can remove it like any other character:

 >>> s = u'word1 \ufeffword2'
 >>> s = s.replace(u'\ufeff', '')
 >>> s
 u'word1 word2'

(In Python 3.1 or 3.2, omit the u prefix before the string literals.)
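To apply the same replacement to a whole file, a minimal sketch in Python 3 might look like the following. It assumes that "combining lines" in the question means collapsing identical lines into one; the helper names strip_bom_chars and unique_lines are made up for illustration, and an in-memory io.StringIO stands in for the real file.

```python
import io

BOM = "\ufeff"  # U+FEFF, ZERO WIDTH NO-BREAK SPACE / byte order mark


def strip_bom_chars(text):
    """Return text with every U+FEFF removed, wherever it occurs."""
    return text.replace(BOM, "")


def unique_lines(lines):
    """Remove U+FEFF from each line, then keep the first copy of each line.

    Hypothetical helper: assumes "combining" means de-duplicating lines.
    """
    seen = set()
    result = []
    for line in lines:
        cleaned = strip_bom_chars(line.rstrip("\n"))
        if cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Stand-in for a real file; the sample content mirrors the question.
sample = io.StringIO("word1 \ufeffword2\nword1 word2\n")
print(unique_lines(sample))  # ['word1 word2']
```

Note that opening a file with encoding='utf-8-sig' only skips a BOM at the very start of the file, so a U+FEFF in the middle of a line, as in the question, still needs the explicit replace.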


Have you tried mytext.split(string.whitespace)?


Source: https://habr.com/ru/post/893334/

