How to write characters such as ● to a file in Python

I am trying to write a ● character to a text file in Python. I think this has something to do with encoding (UTF-8). Here is the code:

    # -*- coding: utf-8 -*-
    outFile = open('./myFile.txt', 'wb')
    outFile.write("●")
    outFile.close()

Instead of the black "●" I get "â—". How can I fix this?

+6
6 answers

Open the file with io.open, which works in both Python 2 and Python 3, and set the encoding to utf8. When writing, pass a Unicode string.

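    # -*- coding: utf-8 -*-
    # (the coding line is needed on Python 2 because of the non-ASCII literal below)
    import io

    outFile = io.open('./myFile.txt', 'w', encoding='utf8')
    outFile.write(u'●')
    outFile.close()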

Tested on Python 2.7.8 and Python 3.4.2
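As a rough check of the approach above, here is a minimal round-trip sketch. It assumes a scratch file in a temporary directory (the `path` variable is hypothetical, not part of the original answer) and uses a `with` block so the file is closed automatically.

```python
import io
import os
import tempfile

# Hypothetical scratch path; any writable location would do.
path = os.path.join(tempfile.mkdtemp(), "myFile.txt")

# Write the bullet as text; io.open encodes it to UTF-8 on the way out.
with io.open(path, "w", encoding="utf8") as out_file:
    out_file.write(u"\u25cf")  # the same character as u'●'

# Read it back as text, decoding with the same encoding.
with io.open(path, "r", encoding="utf8") as in_file:
    text = in_file.read()

# True when the character survived the round trip unchanged.
print(text == u"\u25cf")
```

If the file were read back with a different encoding, the comparison would fail or the read would raise an error, which is the symptom described in the question.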

+3

If you are using Python 2, use codecs.open instead of open, and write a unicode string instead of a str:

    # -*- coding: utf-8 -*-
    import codecs

    outFile = codecs.open('./myFile.txt', 'wb', 'utf-8')
    outFile.write(u"●")
    outFile.close()

In Python 3, pass the encoding keyword argument to open:

    # -*- coding: utf-8 -*-
    outFile = open('./myFile.txt', 'w', encoding='utf-8')
    outFile.write("●")
    outFile.close()
+1
    >>> ec = u'\u25cf'  # unicode("●", "UTF-8")
    >>> open("/tmp/file.txt", "w").write(ec.encode('UTF-8'))
0

Your program writes the output file in the same encoding that your editor used to save the source file (the coding declaration at the top does not matter unless your editor uses it when saving the file). Thus, if you open myFile.txt with a program that uses the same encoding as your editor, everything looks fine.

This does not mean that your program works for everyone.

To make it work for everyone, you must do two things. First, you must specify the encoding used for text files on the user's computer. This is a bit tricky to detect, but the following should often work:

    # coding=utf-8  # Put your editor encoding here
    import codecs
    import locale
    import sys

    # Selection of the first non-None, reasonable encoding:
    out_encoding = (locale.getlocale()[1]
                    or locale.getpreferredencoding()
                    or sys.stdin.encoding
                    or sys.stdout.encoding
                    # Default:
                    or "UTF8")

    outFile = codecs.open('./myFile.txt', 'w', out_encoding)

Please note that it is very important to put the correct coding declaration at the top of the file: it must be the encoding your editor uses to save the file.

If you know the encoding you want for your output file, you can pass it directly to open(). Otherwise, the more general and portable out_encoding expression above should work for most users on most computers (i.e., whatever their settings, they should be able to read "●" in the resulting file, provided their computer's encoding can represent it).
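The caveat about whether the computer's encoding can represent the character can be tested before writing. The sketch below is only one possible way to do it and is not the code from this answer: `pick_encoding` is a hypothetical helper that tries each candidate and falls back to UTF-8, which is an assumption of mine about a sensible default.

```python
import locale


def pick_encoding(text, candidates):
    """Return the first candidate encoding that can represent text, else 'utf-8'.

    Hypothetical helper for illustration; not part of any library.
    """
    for name in candidates:
        if not name:
            continue  # skip None or empty entries, as locale lookups may return them
        try:
            text.encode(name)
        except (UnicodeEncodeError, LookupError):
            continue  # the codec cannot represent the text, or the name is unknown
        return name
    return "utf-8"


bullet = u"\u25cf"

# Neither latin-1 nor cp1252 contains U+25CF, so the fallback is selected.
print(pick_encoding(bullet, ["latin-1", "cp1252", "utf-8"]))

# With the machine's own preference the result depends on the system.
print(pick_encoding(bullet, [locale.getpreferredencoding()]))
```

The trade-off is that a file written with the fallback may not match what other programs on that machine expect, so the reader still has to be told which encoding was used.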

Second, you should write a Unicode string, not bytes:

 outFile.write(u"●") 

(note the leading u, which means "Unicode string").

For a deeper understanding of the problems, one of my previous answers should be very helpful: UnicodeDecodeError when redirecting to a file.

0

That should do the trick:

    # -*- coding: utf-8 -*-
    outFile = open('./myFile.txt', 'wb')
    outFile.write(u"\u25CF".encode('utf-8'))
    outFile.close()

See this.

0

I am very sorry, but writing a character to a text file without saying which encoding the file should have simply makes no sense.

This may not be obvious at first glance, but text files are always encoded, and they can be encoded in different ways. If you have only letters (upper and lower case, but not accented), digits and simple symbols (everything that has an ASCII code below 128), everything should be fine, because 7-bit ASCII is now a standard and these characters have the same representation in all the main encodings.

But as soon as you use other symbols or accented characters, their representation varies from one encoding to another. For example, the ● character has the UTF-8 representation (in Python notation) \xe2\x97\x8f. Worse still, it cannot be represented at all in the latin1 (ISO-8859-1) encoding.

Another example is the French e with an acute accent, é: it is represented in UTF-8 as \xc3\xa9 (note: 2 bytes), but in Latin1 as \xe9 (one byte).
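These byte representations are easy to check from Python itself. A small sketch, using only the built-in str.encode method (the variable names are mine):

```python
bullet = u"\u25cf"    # the ● character
e_acute = u"\u00e9"   # the é character

# The same character gives different bytes in different encodings.
print(repr(bullet.encode("utf-8")))
print(repr(e_acute.encode("utf-8")))
print(repr(e_acute.encode("latin-1")))

# latin-1 has no code for the bullet, so encoding it raises an error.
try:
    bullet.encode("latin-1")
    bullet_fits_latin1 = True
except UnicodeEncodeError:
    bullet_fits_latin1 = False

print(bullet_fits_latin1)
```

The bullet takes three bytes in UTF-8, é takes two bytes in UTF-8 and one byte in Latin1, and the last line prints False.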

So I tested your code on my Ubuntu box, which uses UTF-8 encoding, and the command cat myFile.txt ... correctly showed the bullet!

    sba@sba-ubuntu:~/stackoverflow$ cat myFile.txt
    ●sba@sba-ubuntu:~/stackoverflow$

(since you did not write a newline after the bullet, the prompt immediately follows it)

Finally:

Your code correctly writes the bullet to the file in UTF-8 encoding. If your system natively uses a different encoding (ISO-8859-1 or its Windows variant, Windows-1252), you cannot convert it to the native encoding, because this character simply does not exist in those encodings.

But you can always see it in a text editor that supports various encodings, such as the excellent vim that exists on all major systems.


Proof of the above:

On a computer running Windows 7, I opened a vim window and instructed it to accept utf8 with :set encoding='utf8'. Then I pasted the source code from the OP and saved it in the file foo.py.

I opened a cmd.exe window and executed python foo.py (using Python 2.7): it created a myFile.txt file containing 3 bytes (in hex): e2 97 8f, which is the UTF-8 representation of the bullet ● (I could confirm this with vim Tools / Hexa convert).
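The same byte check can be done without an editor. This is a minimal sketch, assuming a scratch file in a temporary directory rather than the real myFile.txt; it writes the UTF-8 bytes and then reads the file back in binary mode so that no decoding takes place.

```python
import os
import tempfile

# Hypothetical scratch path standing in for myFile.txt.
path = os.path.join(tempfile.mkdtemp(), "myFile.txt")

# Write the bullet as explicit UTF-8 bytes.
with open(path, "wb") as out_file:
    out_file.write(u"\u25cf".encode("utf-8"))

# Read the raw bytes back; bytearray yields integers on Python 2 and 3.
with open(path, "rb") as in_file:
    data = bytearray(in_file.read())

# Format each byte as two hex digits, like a hex viewer would.
hex_dump = " ".join("%02x" % byte for byte in data)
print(len(data), hex_dump)
```

Opening the file in binary mode is what makes this a reliable check: text mode would decode the bytes with some encoding and could hide the problem.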

I could even open myFile.txt in IDLE and actually saw the bullet. Even notepad.exe can show the bullet!

Thus, even on a computer running Windows 7, which does not natively use UTF-8, the code from the OP correctly generates a text file that, when opened with a text editor that accepts UTF-8, contains the ● bullet.

Of course, if I try to open myFile.txt with vim in latin1 mode, I get: â—; in a cmd window with code page 850, type myFile.txt shows ÔùÅ; and with code page 1252 (a latin1 variant): â—.
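The garbled output can be reproduced by decoding the same three bytes with the wrong encoding. A small sketch using the standard codecs that ship with Python; the output is printed with escapes so that it does not depend on the console encoding, and errors="replace" is used for code page 1252 because byte 8f is not assigned there in Python's codec.

```python
raw = b"\xe2\x97\x8f"  # the UTF-8 bytes of the bullet

decoded = {
    "utf-8": raw.decode("utf-8"),
    "cp850": raw.decode("cp850"),
    # 0x8f is undefined in Python's cp1252 codec, so replace it instead of failing.
    "cp1252": raw.decode("cp1252", errors="replace"),
    "latin-1": raw.decode("latin-1"),
}

for name in sorted(decoded):
    # unicode_escape keeps the printed text pure ASCII on any console.
    print(name, decoded[name].encode("unicode_escape").decode("ascii"))
```

Only the utf-8 entry yields the single character U+25CF; the others each turn the three bytes into three unrelated characters.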

In conclusion, the OP's original code creates a correct UTF-8 encoded file; it is up to the program reading the file to interpret the UTF-8 correctly.

0

Source: https://habr.com/ru/post/985457/

