Black diamond question marks versus loss of certain characters.

Question

I have read about this problem and run into it before, but I have yet to find a solution that fixes both sides of it. I have a news aggregator that retrieves news from RSS feeds. When displaying the content, I got black diamond question marks, so after some research I added the following line of code (PHP):

$content = mb_convert_encoding($content, 'UTF-8', 'HTML-ENTITIES');

That solved the problem, but when I looked at another article that contained accented words, I noticed that the accented characters had been turned into garbage. They used to look fine. For example, now I see things like:

GenÃ©ticas

For now I would prefer the second issue, as it does not stick out as much, but ideally I want to fix both. My MySQL tables are UTF-8, and the page itself is declared as UTF-8. Any ideas?

html php encoding unicode

user387049 Feb 15 '12 at 20:48

1 answer

Gordonm · Accepted Answer · 2012-02-15T21:18:35+0000

You are pulling data from multiple feeds, so you cannot rely on all of them using the same character encoding.

XML feeds are supposed to declare the encoding they use in the XML preamble, and the server should send headers stating the file's character encoding, but neither is mandatory, and when they are present they are not guaranteed to be accurate.
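As a rough illustration of reading those two declarations, here is a minimal sketch. The function names `declared_xml_encoding` and `header_charset` are made up for this example, and it assumes you already have the raw feed body and the value of the `Content-Type` header as strings. Whatever these return is only what the feed claims, not proof of what the bytes really are.

```php
<?php
// Hypothetical helper: read the encoding named in the XML declaration, if any.
function declared_xml_encoding(string $xml): ?string
{
    // The declaration has to sit at the very start (after an optional UTF-8 BOM).
    $pattern = '/^(?:\xEF\xBB\xBF)?<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']/';
    if (preg_match($pattern, $xml, $m)) {
        return strtoupper($m[1]);
    }
    return null; // no declaration found
}

// Hypothetical helper: read the charset parameter from a Content-Type header value.
function header_charset(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null; // header carries no charset parameter
}
```

Both helpers return `null` when nothing is declared, so the caller can tell "not stated" apart from a stated value.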

You can use mb_detect_encoding to try to work out which encoding the feed you are parsing uses, but that is not 100% accurate either.
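A minimal sketch of such a guess, assuming the feeds are only ever UTF-8 or ISO-8859-1 (that candidate list is an assumption for this example, not something known about your feeds). The wrapper name `guess_encoding` is made up; `mb_detect_encoding` and its strict flag are the real mbstring API.

```php
<?php
// Hypothetical wrapper: guess an encoding from a short candidate list.
function guess_encoding(string $bytes): ?string
{
    // Strict mode (third argument) rejects candidates for which the bytes are invalid.
    // ISO-8859-1 accepts any byte sequence, so it only acts as a fallback here.
    $guess = mb_detect_encoding($bytes, ['UTF-8', 'ISO-8859-1'], true);
    return $guess === false ? null : $guess;
}
```

Note that a single-byte encoding such as ISO-8859-1 can never be ruled out by validity alone, which is one reason the result stays a guess.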

So, if your goal is to normalize all the feeds you process into the same encoding (I assume UTF-8), your options are to look at the XML preamble, the headers (if a relevant one is sent), and the result of mb_detect_encoding to determine the encoding. If all of these agree, you probably (but not necessarily) have a good idea of what the file's encoding is and what you need to do to convert it. If any of these methods disagree, you will have to decide for yourself what action to take.
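The agreement check described above could be sketched roughly like this. The function name `agreed_encoding` and its parameters are invented for the example; it assumes the header charset and XML encoding have already been extracted (or are `null` when absent), and it again assumes a UTF-8 / ISO-8859-1 candidate list for detection.

```php
<?php
// Hypothetical helper: return an encoding name only if every available source agrees.
function agreed_encoding(string $bytes, ?string $headerCharset, ?string $xmlEncoding): ?string
{
    // Assumption: the feeds are limited to these two candidates.
    $detected = mb_detect_encoding($bytes, ['UTF-8', 'ISO-8859-1'], true);

    // Drop the sources that gave no answer, then compare names case-insensitively.
    $votes = array_filter([$headerCharset, $xmlEncoding, $detected === false ? null : $detected]);
    $unique = array_unique(array_map('strtoupper', $votes));

    // Exactly one distinct name means agreement; anything else is left to the caller.
    return count($unique) === 1 ? reset($unique) : null;
}

// Usage: convert only when the sources agree.
$body = "<title>Gen\xE9ticas</title>"; // ISO-8859-1 bytes for the accented e
$encoding = agreed_encoding($body, 'iso-8859-1', 'ISO-8859-1');
$utf8 = $encoding !== null ? mb_convert_encoding($body, 'UTF-8', $encoding) : null;
```

When the function returns `null`, the sources disagree (or none gave an answer) and you are back to deciding for yourself, for example by logging the feed and skipping it.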

In short, welcome to Charset Hell. How do you like it?
