This is what I found on Wikipedia about the byte 0xff, which is part of the byte order mark (BOM) used in UTF-16.
UTF-16: In UTF-16, a BOM (U+FEFF) may be placed as the first character of a file or character stream to indicate the endianness (byte order) of all the 16-bit code units of the file or stream. If the 16-bit units are represented in big-endian byte order, this BOM character will appear in the sequence of bytes as 0xFE followed by 0xFF. This sequence appears as the ISO-8859-1 characters þÿ in a text display that expects the text to be ISO-8859-1. If the 16-bit units use little-endian order, the sequence of bytes will have 0xFF followed by 0xFE. This sequence appears as the ISO-8859-1 characters ÿþ in a text display that expects the text to be ISO-8859-1. Programs expecting UTF-8 may show these or error indicators, depending on how they handle UTF-8 encoding errors. In all cases they will probably display the rest of the file as garbage (a UTF-16 text containing ASCII only will be fairly readable).
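The behaviour described in that quote can be reproduced with nothing but the standard library. This is only a small sketch to illustrate the byte sequences; the variable names are mine and are not taken from your code.

```python
import codecs

# The two possible UTF-16 byte order marks, as defined in the standard library.
bom_be = codecs.BOM_UTF16_BE  # bytes 0xFE 0xFF (big-endian)
bom_le = codecs.BOM_UTF16_LE  # bytes 0xFF 0xFE (little-endian)

# Read as ISO-8859-1 (Latin-1), every byte maps to exactly one character.
latin1_be = bom_be.decode("latin-1")  # the two characters U+00FE, U+00FF
latin1_le = bom_le.decode("latin-1")  # the two characters U+00FF, U+00FE

# The byte 0xFF never occurs in valid UTF-8, so a strict UTF-8 decode fails.
try:
    bom_le.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

If `utf8_failed` ends up true for the first bytes of your data, that would be consistent with a UTF-16 BOM being read as UTF-8.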
So, I have two thoughts:
(1) The content may need to be treated as UTF-16 instead of UTF-8.
(2) The error occurs because you are trying to print the entire soup to the screen. Whether that works then depends on whether your IDE console (Eclipse / PyCharm) is smart enough to display those Unicode characters.
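The first thought can be checked before the data ever reaches the parser. Below is a rough sketch, assuming you have the raw bytes of the page in hand; `decode_with_bom` is a hypothetical helper name I made up, not part of BeautifulSoup or the standard library.

```python
import codecs

def decode_with_bom(raw, default="utf-8"):
    """Decode bytes as UTF-16 if they start with a UTF-16 BOM, else use `default`."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM, picks the byte order, and strips it.
        return raw.decode("utf-16")
    return raw.decode(default)

# Demo data: the "utf-16" encoder prepends a BOM in the machine's byte order.
sample = u"hello".encode("utf-16")
text = decode_with_bom(sample)

# Bytes without a BOM fall back to the default encoding.
plain = decode_with_bom(b"hello")
```

Note that this only looks at the first two bytes, so it is a heuristic rather than a full encoding detector.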
If I were you, I would try to move on without printing the whole soup and extract only the part that you want. See whether you have any problem reaching that step. If there is no problem, then why worry that you cannot print the entire soup on the screen?
If you really want to print the soup on screen, try:
print soup.prettify(encoding='utf-16')