Reading the file with the correct encoding

I have a txt file. If I open it in a standard text editor like Notepad or SciTE, I can read lines like this:

 Artist1 – Title 1
 Artist2 – Title 2

Then I open it with a PHP script and read the lines:

 // Build a unique name for the upload, keeping the original extension
 $tracklistFile_name = time().rand(1, 1000).".".pathinfo($_FILES['tracklistFile']['name'], PATHINFO_EXTENSION);

 if ((pathinfo($tracklistFile_name, PATHINFO_EXTENSION) == 'txt')
     && move_uploaded_file($_FILES['tracklistFile']['tmp_name'], 'import/'.$tracklistFile_name)) {
     // Read the file into an array of lines, dropping empty ones
     $fileArray = file('import/'.$tracklistFile_name, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
     $fileArray = array_values(array_filter($fileArray, "trim"));

     for ($i = 0; $i < sizeof($fileArray); $i++) {
         echo $fileArray[$i]."<br />";
     }
 }

and ... WOW ... I get this result:

 Artist1   Title1 Artist2   Title2 

??? What is this symbol? I think the encoding is failing. The symbol is so broken that I cannot insert the lines into the database, not even when using mysql_real_escape_string(). In fact, I get this error when I try to insert them:

 Incorrect string value: '\x96 Titl...' for column 'atl' at row 1 

How can I solve this problem? Suggestions?

EDIT

I tried calling utf8_encode() on the lines before inserting them. The insert no longer fails, but the result is:

 Artist1 Title1
 Artist2 Title2

So, I lost the information. Why?

+6
2 answers

You should read Joel Spolsky's article on Unicode and character sets.

Your problem is almost certainly an encoding mismatch. Your first task is to find out where the mismatch occurs, because it could be in any of several places:

1) Your PHP code could be reading the input using the wrong encoding (for example, you treat it as ISO-8859-1, but the source file is encoded some other way).

2) Your PHP code could be writing the output using the wrong encoding.

3) Whatever you use to read the output (your browser) could be set to a different encoding than the bytes you write.

Once you know which of the three places is causing the problem, you can fix it by working out what the source encoding is and then reading and writing with that encoding, instead of whatever other encoding is being used now (probably the system default).
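One way to narrow down which of these places is at fault is to look at the raw bytes PHP actually read. The sketch below is only an illustration: the sample line is hypothetical, built around the \x96 byte from the error message in the question, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical sample line: the single 0x96 byte is taken from the
// MySQL error message quoted in the question.
$line = "Artist1 \x96 Title1";

// A lone 0x96 byte is not a valid UTF-8 sequence, so this is false.
$isValidUtf8 = mb_check_encoding($line, 'UTF-8');

// Hex dump of the raw bytes, to see what sits where the dash should be.
$hex = bin2hex($line);

var_dump($isValidUtf8);
echo $hex, PHP_EOL;
```

If the check reports invalid UTF-8 and the dump shows a single byte such as 96 where the dash should be, the file is most likely in a single-byte encoding (in Windows-1252, 0x96 is the en dash), which points at the input side.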

EDIT: I don't know PHP well, but it looks like you could use mb_detect_encoding and possibly also mb_convert_encoding.
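A minimal sketch of the conversion step, assuming the uploaded file is Windows-1252 (the question does not confirm this; the \x96 byte in the error message merely fits, since that byte is the en dash in Windows-1252). Detection between single-byte encodings is guesswork, so the sketch names the source encoding explicitly instead of relying on mb_detect_encoding. The helper name toUtf8 and the sample line are hypothetical, and the mbstring extension is assumed.

```php
<?php
// Assumption: the source file is Windows-1252. Adjust if the real file differs.
const SOURCE_ENCODING = 'Windows-1252';

/**
 * Hypothetical helper: return the line as UTF-8.
 * Lines that are already valid UTF-8 (pure ASCII included) are left untouched.
 */
function toUtf8(string $line, string $sourceEncoding = SOURCE_ENCODING): string
{
    if (mb_check_encoding($line, 'UTF-8')) {
        return $line;
    }
    return mb_convert_encoding($line, 'UTF-8', $sourceEncoding);
}

// Hypothetical sample line containing the 0x96 byte from the error message.
$line = "Artist1 \x96 Title1";

// Windows-1252 0x96 becomes U+2013 (en dash), encoded in UTF-8 as E2 80 93.
$fixed = toUtf8($line);
echo bin2hex($fixed), PHP_EOL;

// For comparison: utf8_encode() treats its input as ISO-8859-1, where 0x96
// is the invisible control character U+0096 (UTF-8 bytes C2 96).
$asLatin1 = mb_convert_encoding($line, 'UTF-8', 'ISO-8859-1');
echo bin2hex($asLatin1), PHP_EOL;
```

The second conversion would explain the result in the question's edit: the insert succeeds because the string is valid UTF-8, but the dash has turned into a control character that is not displayed. In the question's loop, each element of $fileArray could be passed through such a helper before it is echoed or inserted.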

+9

Try the following: $str = str_replace('\\x', '&#', $str);

-2

Source: https://habr.com/ru/post/886737/
