How do I replace all "0xa0" characters with "" in a bunch of text files?

I tried to bulk-edit a bunch of text files to UTF-8 in Python, and this error keeps popping up. Is there any way to replace these characters with a Python script or a bash command? I used this code:

    writer = codecs.open(os.path.join(wrd, 'dict.en'), 'wtr', 'utf-8')
    for infile in glob.glob(os.path.join(wrd, '*.txt')):
        print infile
        for line in open(infile):
            writer.write(line.encode('utf-8'))

and got this error:

    Traceback (most recent call last):
      File "dicting.py", line 30, in <module>
        writer.write(line2.encode('utf-8'))
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 216: unexpected code byte
3 answers

OK, first point: your output file is configured to encode text written to it as UTF-8 automatically, so do not call encode('utf-8') yourself on the arguments you pass to write().

So, the first thing to try is to simply use the following in your inner loop:

    writer.write(line)
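The same point in a minimal Python 3 sketch (an assumption: the question's code is Python 2). A file opened in text mode with encoding="utf-8" encodes for you, so you write str objects and never call encode() yourself. The function name write_lines is hypothetical.

```python
import os
import tempfile


def write_lines(path, lines):
    """Write str lines to path; the file object encodes them as UTF-8."""
    with open(path, "w", encoding="utf-8") as writer:
        for line in lines:
            # Pass the str itself; calling line.encode('utf-8') here would
            # hand bytes to a text file and raise TypeError in Python 3.
            writer.write(line)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "dict.en")
    write_lines(path, ["caf\u00e9\n", "na\u00efve\n"])
    with open(path, "rb") as f:
        print(f.read())  # the raw UTF-8 bytes the file object produced
```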

If this does not work, then the problem will almost certainly be that, as others have noted, you are not properly decoding your input file.
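To see why decoding is the issue, here is a small Python 3 illustration, assuming the offending data is a lone 0xa0 byte like the one in the traceback. On its own, 0xa0 is not valid UTF-8 (it is a continuation byte and cannot start a character), but in cp1252 or Latin-1 it is the no-break space U+00A0. The sample bytes are made up.

```python
import unicodedata

raw = b"word\xa0word"  # hypothetical bytes as they might appear in an input file

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xa0 is a UTF-8 continuation byte, so it cannot start a character.
    print("utf-8 failed:", exc)

# In cp1252 (and Latin-1), byte 0xa0 decodes to U+00A0 NO-BREAK SPACE.
text = raw.decode("cp1252")
print(repr(text))                 # 'word\xa0word'
print(unicodedata.name(text[4]))  # NO-BREAK SPACE
```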

Making a wild guess that your input files are encoded in cp1252, you can try a quick test in the inner loop:

    for line in codecs.open(infile, 'r', 'cp1252'):
        writer.write(line)
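A Python 3 sketch of the whole loop under the same cp1252 guess: decode on input with the guessed encoding, encode on output as UTF-8. The function name concat_to_utf8 and its parameters are hypothetical; wrd and dict.en come from the question. If the guess is wrong you may get wrong characters rather than an error, and cp1252 leaves a few byte values (such as 0x81) undefined, so those still raise.

```python
import glob
import os


def concat_to_utf8(wrd, src_encoding="cp1252", out_name="dict.en"):
    """Decode every *.txt file in wrd with src_encoding and write all of
    their lines into one UTF-8 output file (mirrors the question's loop)."""
    out_path = os.path.join(wrd, out_name)
    with open(out_path, "w", encoding="utf-8") as writer:
        for infile in sorted(glob.glob(os.path.join(wrd, "*.txt"))):
            print(infile)
            # Decoding happens here, on input, using the guessed encoding.
            with open(infile, "r", encoding=src_encoding) as reader:
                for line in reader:
                    # The writer encodes each str as UTF-8 on output.
                    writer.write(line)
    return out_path
```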

Minor point: "wtr" is a meaningless mode string. The extra "r" does not give you read access (that would be "w+"), so simplify it to either wt or just w.
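For what it is worth, Python 3's built-in open() rejects "wtr" outright, whereas Python 2 evidently let it through (the question's script got past opening the file). A small sketch, using a throwaway temporary path as an assumption:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "dict.en")

try:
    # Python 3 refuses a mode string that asks for both 'r' and 'w'.
    open(path, "wtr", encoding="utf-8")
except ValueError as exc:
    print("rejected:", exc)

# Plain 'w' (text mode is the default) is all that writing needs.
with open(path, "w", encoding="utf-8") as writer:
    writer.write("ok\n")
```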


Did you leave out some code? You are reading into line, but trying to transcode line2.

In any case, you will need to tell Python the encoding of the input file; if you do not know it, you will have to open the file in raw (binary) mode and perform the replacements without the help of a codec.
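A minimal Python 3 sketch of the raw approach: read each file as bytes, delete every 0xa0 byte, and write it back in place. The function name strip_a0_bytes is hypothetical. This is only safe if the files use a single-byte encoding such as cp1252; in genuine UTF-8, 0xa0 also appears inside multi-byte characters (U+00A0 is C2 A0, "à" is C3 A0), and deleting it would corrupt them. Keep a backup, since the files are overwritten.

```python
import glob
import os


def strip_a0_bytes(wrd):
    """Remove every raw 0xa0 byte from each *.txt file in wrd, in place,
    working on bytes only (no codec involved). Returns files changed."""
    changed = 0
    for infile in glob.glob(os.path.join(wrd, "*.txt")):
        with open(infile, "rb") as f:
            data = f.read()
        cleaned = data.replace(b"\xa0", b"")
        if cleaned != data:
            # Only rewrite files that actually contained 0xa0.
            with open(infile, "wb") as f:
                f.write(cleaned)
            changed += 1
    return changed
```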


Honestly, a simple replace() call will do the job:

    line = line.replace(chr(0xa0), '')
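In Python 3 the same call works on decoded text: chr(0xa0) is the one-character str U+00A0, so the replace has to run after the file has been decoded with the right codec (in the question's Python 2 code, chr(0xa0) is a one-byte str instead). The helper name remove_nbsp is hypothetical.

```python
def remove_nbsp(line):
    """Drop every U+00A0 NO-BREAK SPACE from an already-decoded str."""
    # In Python 3, chr(0xa0) is the one-character str '\xa0'.
    return line.replace(chr(0xa0), "")


print(remove_nbsp("price:\u00a0100\u00a0EUR"))  # price:100EUR
```

Replacing with " " instead of "" would keep the neighbouring words apart, if that matters for your text.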

In addition, codecs.open() accepts an "errors" parameter that controls how conversion errors are handled. Look it up in the documentation.
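A short Python 3 illustration of the error handlers, using made-up bytes with a stray 0xa0. The same handler names are accepted by bytes.decode(), by codecs.open(), and by the built-in open() used here. Note that "ignore" silently drops data, so it can hide a wrong encoding guess.

```python
import os
import tempfile

raw = b"word\xa0word"  # hypothetical file content with a stray 0xa0 byte

# errors="strict" (the default) raises UnicodeDecodeError instead.
print(raw.decode("utf-8", errors="ignore"))          # wordword
print(ascii(raw.decode("utf-8", errors="replace")))  # 'word\ufffdword'

# The same handler names work when opening a file for reading.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(raw)
with open(path, "r", encoding="utf-8", errors="ignore") as reader:
    print(reader.read())  # wordword
```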


Source: https://habr.com/ru/post/1345191/

