I use BeautifulSoup to scrape data from a web page. I want to compare the scraped text with text stored in a .txt file. However, it looks like I'm running into encoding problems.
The website shows the text "heated oven to 400 °". The page source (view source) looks the same: the degree sign is a literal character, not an HTML entity.
The page is read and parsed with BeautifulSoup:

```python
source = urllib2.urlopen("my url").read()
# ...
soup = BeautifulSoup(source)
```
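One way to keep both sides comparable is to turn the downloaded bytes into text before doing anything else. The sketch below assumes Python 3 (where `str` is text and `bytes` is raw data; in Python 2 the equivalents are `unicode` and `str`) and assumes UTF-8 whenever the server does not declare a charset. `decode_page` and `fetch_text` are hypothetical helper names, not part of BeautifulSoup.

```python
import urllib.request


def decode_page(raw, charset=None):
    """Turn raw page bytes into text (str).

    Assumption: if no charset is given, the page is treated as UTF-8.
    """
    return raw.decode(charset or "utf-8")


def fetch_text(url):
    """Download a page and return its decoded text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, e.g. b'... 400 \xc2\xb0'
        charset = response.headers.get_content_charset()  # None if not declared
    return decode_page(raw, charset)


# Offline example: the UTF-8 bytes for "400 °" decode to a single degree-sign character.
sample = decode_page(b"heated oven to 400 \xc2\xb0")
print(repr(sample))
```

The decoded string can then be passed to BeautifulSoup; bs4's `BeautifulSoup` accepts either bytes (and guesses the encoding itself) or an already-decoded `str`, and `get_text()` returns text in both cases.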
I created the text file as a new document with the encoding set to "Encode in UTF-8 without BOM". Then I copied "heated oven to 400 °" from the website into it and saved it.
The text file is read with:

```python
import codecs

f = codecs.open('myfilename', encoding='utf-8')
```
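Because `codecs.open(..., encoding='utf-8')` decodes while reading, the file side already comes back as text. The sketch below (Python 3 assumed) writes a throwaway file to a temporary directory instead of the question's `myfilename`, then reads it back both with `codecs.open` and with the built-in `open`; the `utf-8-sig` codec is shown because it also strips a BOM if an editor happened to add one.

```python
import codecs
import os
import tempfile

# Hypothetical test file, saved as plain UTF-8 without a BOM.
path = os.path.join(tempfile.mkdtemp(), "myfilename.txt")
with open(path, "wb") as fh:
    fh.write("heated oven to 400 \u00b0".encode("utf-8"))

# codecs.open decodes the bytes on read, so file_text is text, not bytes.
with codecs.open(path, encoding="utf-8") as f:
    file_text = f.read()

# Python 3's built-in open() does the same; 'utf-8-sig' also removes a leading BOM.
with open(path, encoding="utf-8-sig") as f:
    same_text = f.read()

print(repr(file_text[-1]))
```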
When I compare the two strings they are not equal, but I want them to be.
To find out what is going on, I split both texts in Eclipse and inspected the variables in debug mode. The degree sign from BeautifulSoup is displayed as `\xc2\xb0`, while the degree sign from the text file is displayed simply as `\xb0`.
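The two debugger values are consistent with one side being UTF-8 encoded bytes and the other being decoded text: U+00B0 DEGREE SIGN is the single character `\xb0`, and its UTF-8 encoding is the two bytes `\xc2\xb0`. A small illustration, assuming Python 3 (in Python 2 the same mismatch appears between a byte `str` and a `unicode` object):

```python
degree = "\u00b0"                  # the text character: one code point
utf8_bytes = degree.encode("utf-8")

print(utf8_bytes)                  # two bytes, like the BeautifulSoup value
print(ascii(degree))               # one escaped character, like the file value

# Bytes and text never compare equal, even when they represent the same character.
print(utf8_bytes == degree)

# Decoding the bytes puts both sides in the same form, so the comparison works.
print(utf8_bytes.decode("utf-8") == degree)
```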
Why does this happen, and how can I fix it? I have the same problem with many special characters, so I need a general solution. I will also be copying data from several sites into the text file.
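A general approach is to convert everything to text at the boundaries and compare only text with text. The sketch below assumes Python 3 and UTF-8 input bytes (pages from other sites may declare a different charset, which should be passed in instead). It also applies Unicode NFC normalization, on the assumption that text copied from different sites may mix composed and decomposed characters, and strips surrounding whitespace; `to_text`, `normalize`, and `same_text` are hypothetical names.

```python
import unicodedata


def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes with the given encoding (UTF-8 assumed)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def normalize(value, encoding="utf-8"):
    """Decode if needed, apply NFC normalization, and trim surrounding whitespace."""
    return unicodedata.normalize("NFC", to_text(value, encoding)).strip()


def same_text(a, b):
    """Compare two values as normalized text."""
    return normalize(a) == normalize(b)


web_value = b"heated oven to 400 \xc2\xb0"   # UTF-8 bytes, as scraped
file_value = "heated oven to 400 \u00b0\n"   # text read from the .txt file
print(same_text(web_value, file_value))

# NFC also unifies "é" as one code point with "e" followed by a combining accent.
print(same_text("caf\u00e9", "cafe\u0301"))
```

Whether stripping whitespace is appropriate depends on the data; it is included here only because lines read from a file often keep their trailing newline.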