I download HTML from an external server. The HTML markup is UTF-8 encoded and contains characters such as ľ, š, č, ť, ž, etc. When I load the HTML with file_get_contents() as follows:
$html = file_get_contents('http://example.com/foreign.html');
it mangles the UTF-8 characters, and I get Å, ¾, ¤ and similar nonsense instead of the correct characters.
How can I solve this?
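The Å and ¾ in that output are what UTF-8 bytes look like when something decodes them as ISO-8859-1 (Latin-1). For example, ž is stored in UTF-8 as the two bytes C5 BE, and in Latin-1 those two bytes are the characters Å and ¾. The sketch below reproduces the effect in plain PHP, assuming PHP 7 or later and no extensions. misread_as_latin1() is a hypothetical helper written only for illustration, not a built-in function.

```php
<?php
// misread_as_latin1() is a hypothetical helper for illustration only.
// It treats every byte of a UTF-8 string as an ISO-8859-1 character and
// re-encodes that character as UTF-8, reproducing the mojibake you see
// when UTF-8 bytes are decoded with the wrong charset.
function misread_as_latin1(string $utf8): string
{
    $out = '';
    $len = strlen($utf8); // byte length, not character length
    for ($i = 0; $i < $len; $i++) {
        $b = ord($utf8[$i]);
        if ($b < 0x80) {
            // ASCII bytes mean the same thing in both encodings.
            $out .= chr($b);
        } else {
            // In ISO-8859-1, byte 0x80-0xFF is code point U+0080-U+00FF,
            // which UTF-8 stores as 0xC2 or 0xC3 followed by a continuation byte.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

echo misread_as_latin1('ž'), "\n"; // prints "Å¾" (ž is stored as bytes C5 BE)
echo misread_as_latin1('ľ'), "\n"; // prints "Ä¾" (ľ is stored as bytes C4 BE)
```

So the pattern in the output suggests that correct UTF-8 bytes are being decoded as Latin-1 somewhere along the way; it does not by itself say at which step.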
UPDATE:
I tried saving the HTML to a file and also outputting it with UTF-8 encoding. Neither works, so it seems that file_get_contents() is already returning broken HTML.
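Whether the saved file or echoed page looks right also depends on the encoding the viewer assumes, so it may be more reliable to check the fetched bytes directly. A minimal sketch, assuming PHP 7.1 or later; is_valid_utf8() and charset_from_headers() are hypothetical helpers written for illustration. It relies on two things PHP does provide: preg_match() with the /u modifier fails on malformed UTF-8, and file_get_contents() on an http:// URL fills the local $http_response_header array with the raw response header lines.

```php
<?php
// Hypothetical helper: true if $bytes is well-formed UTF-8.
// preg_match() with the /u modifier returns false on malformed UTF-8 input,
// so this works without the mbstring extension.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Hypothetical helper: returns the charset declared in a Content-Type header,
// lowercased, or null if none is declared. $headers is a list of raw header
// lines such as the $http_response_header array PHP fills in after
// file_get_contents() on an http:// URL. After redirects that array holds the
// headers of every response in order, so the last match wins.
function charset_from_headers(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:.*charset=([^\s;]+)/i', $line, $m)) {
            $charset = strtolower(trim($m[1], '"\''));
        }
    }
    return $charset;
}
```

For example, right after $html = file_get_contents('http://example.com/foreign.html'); a true result from is_valid_utf8($html) would suggest the bytes arrived intact and the problem lies in how they are displayed, while charset_from_headers($http_response_header) shows which charset, if any, the server declared.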
UPDATE2:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Style-Type" content="text/css" />
    <meta http-equiv="Content-Language" content="sk" />
    <title>Test</title>
</head>
<body>
<?php
$html = file_get_contents('http://example.com');
echo htmlentities($html);
?>
</body>
</html>
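One thing worth ruling out in the test page above: on PHP versions before 5.4, htmlentities() called without a charset argument treats its input as ISO-8859-1, so each byte of a multi-byte UTF-8 character is escaped separately (ž becomes &Aring;&frac34;, which the browser displays as Å¾). That alone would produce the symptom described even if file_get_contents() returned correct bytes. A minimal sketch, assuming PHP 7.1 or later, that passes the charset explicitly; escape_utf8() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: HTML-escape a string that is known to be UTF-8.
// Naming the charset explicitly avoids the pre-5.4 default of ISO-8859-1,
// under which each byte of a multi-byte character is escaped separately.
function escape_utf8(string $html): string
{
    return htmlentities($html, ENT_QUOTES, 'UTF-8');
}

// In the test page, this would replace: echo htmlentities($html);
echo escape_utf8("<p>ľ š č ť ž</p>"), "\n";
```

If the characters display correctly with the explicit charset, the download was fine and only the escaping step was misreading the bytes.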
Richard Knop Feb 10 2018-10-10 12:19