Reading a CSV file exported from Excel in PHP using fgetcsv

I am using Excel 2010 Professional Plus to create an Excel file. Later I try to export it as a UTF-8 .csv file. I do this by saving it as CSV (delimiter-separated; sorry, I do not know the exact wording, as I do not have the English version and I fear it is not translated 1:1). There I click Tools -> Web Options and select Unicode (UTF-8) as the encoding. An example .csv looks as follows:

  ID;englishName;germanName
  1;Austria;Österreich

So far so good, but if I now open the file with my PHP code:

  header('Content-Type: text/html; charset=UTF-8');
  iconv_set_encoding("internal_encoding", "UTF-8");
  iconv_set_encoding("output_encoding", "UTF-8");
  setlocale(LC_ALL, 'de_DE.utf8');
  $fp = fopen($filePathName, 'r');
  while (($dataRow = fgetcsv($fp, 0, ";", '"')) !== FALSE) {
      print_r($dataRow);
  }

  • I get: sterreich as the result on the screen (since this is the “error”, I cut the rest of the output).
  • If I open the file with Notepad++ and look at the encoding, I see "ANSI" instead of UTF-8.
  • If I change the encoding in Notepad++ to UTF-8, then ö, ä, ... are replaced with special characters, which I have to fix manually.

If I take a different route, create a new UTF-8 file with Notepad++, and put in the same data as in the Excel file, "Österreich" appears on the screen when I open it with the PHP file.

Now my question is: why does it not work with Excel? Am I doing something wrong here, or am I overlooking something?

Edit: Since the program will ultimately be installed on Windows servers provided by the clients, I need a solution that does not require installing additional tools (PHP libraries are okay, but installing VMware or Cygwin is not). In addition, the servers will not have Excel (or Office) installed locally, since the client will be able to download the CSV file through the file download dialog. The dialog itself is not part of the problem, since I know how to handle it; I ran into the problem itself when I created an Excel file and converted it to .csv on a test machine where Excel was installed locally.

Thx

+4
5 answers

From the PHP documentation:

Locale setting is taken into account by this function. If LANG is, for example, en_US.UTF-8, files in a single-byte encoding are read incorrectly by this function.

You can try

 header('Content-Type: text/html; charset=UTF-8');
 $fp = fopen("log.txt", "r");
 echo "<pre>";
 while (($dataRow = fgetcsv($fp, 1000, ";")) !== FALSE) {
     // utf8_encode() converts each ISO-8859-1 field to UTF-8
     $dataRow = array_map("utf8_encode", $dataRow);
     print_r($dataRow);
 }

Output

 Array
 (
     [0] => ID
     [1] => englishName
     [2] => germanName
 )
 Array
 (
     [0] => 1
     [1] => Austria
     [2] => Österreich
 )
+11

I don’t know why Excel generates an ANSI file instead of UTF-8 (as you can see in Notepad++), but if that is the case, you can convert the file using iconv:

iconv --from-code=ISO-8859-1 --to-code=UTF-8 my_csv_file.csv > my_csv_file_utf8.csv
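Since the question rules out installing extra command-line tools, the same conversion can be sketched inside PHP with the iconv() function. This is only a minimal sketch: the helper name csv_to_utf8 is hypothetical, and the default source encoding CP1252 is an assumption about what Excel's "ANSI" means on a Western European Windows machine.

```php
<?php
// Hypothetical helper (not from the thread): converts a whole CSV string
// from a single-byte encoding to UTF-8 using PHP's iconv() function.
// The default 'CP1252' is an assumption about the "ANSI" code page.
function csv_to_utf8(string $raw, string $from = 'CP1252'): string
{
    $converted = iconv($from, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $from to UTF-8");
    }
    return $converted;
}

// "\xD6" is the single byte for Ö in CP1252 (and in ISO-8859-1).
$ansi = "ID;englishName;germanName\n1;Austria;\xD6sterreich\n";
$utf8 = csv_to_utf8($ansi);
```

For a file on disk, the string could come from file_get_contents('my_csv_file.csv') and the result could be written back with file_put_contents('my_csv_file_utf8.csv', $utf8).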

+1

And for people from the Czech Republic:

 function convert( $str ) {
     return iconv( "CP1250", "UTF-8", $str );
 }
 ...
 while (($data = fgetcsv($this->fhandle, 1000, ";")) !== FALSE) {
     $data = array_map( "convert", $data );
     ...
+1

From what you are saying, I suspect that Excel writes a UTF-8 file without a BOM, which makes guessing that the encoding is UTF-8 slightly trickier. You can confirm this diagnosis if the characters are displayed correctly in Notepad++ when you choose Format->Encode in UTF-8 (without BOM) (rather than Format->Convert to UTF-8 (without BOM)).
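Whether a file starts with the UTF-8 byte order mark (BOM, the three bytes EF BB BF) can also be checked from PHP. The following is a minimal sketch; both helper names are hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: true if $bytes starts with the UTF-8 BOM (EF BB BF).
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: removes a leading UTF-8 BOM, if there is one,
// so that it does not end up inside the first CSV field.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}
```

A missing BOM does not prove that the file is not UTF-8; it only means that the encoding has to be determined by looking at the content.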

And are you sure that every user will use UTF-8? It sounds to me like you need something that is a bit smart about guessing what the real input encoding is. By smart, I mean guessing that recognizes BOM-less UTF-8.

To cut to the chase, I would do something like this:

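 $f = fopen('file.csv', 'r');
 // fgets() returns false at end of file
 while (($row = fgets($f)) !== false) {
     if (mb_detect_encoding($row, 'UTF-8', true) !== false) {
         // The line is valid UTF-8: parse it as-is
         var_dump(str_getcsv($row, ';'));
     } else {
         // Otherwise assume ISO-8859-1 and convert before parsing
         var_dump(str_getcsv(utf8_encode($row), ';'));
     }
 }
 fclose($f);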

This works because you read the actual characters to guess the encoding instead of lazily trusting the first three bytes; BOM-less UTF-8 will therefore still be recognized as UTF-8. Of course, if your CSV file is not too large, you can run this encoding detection on the entire contents of the file: something like mb_detect_encoding(file_get_contents(...), ...)
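The whole-file variant could look roughly like the sketch below. It assumes the mbstring extension is enabled; the function name parse_csv_text and the fallback encoding Windows-1252 are assumptions for illustration, not something the thread prescribes.

```php
<?php
// Hypothetical helper: parses semicolon-separated CSV text into rows,
// converting the text to UTF-8 first if it is not already valid UTF-8.
// The fallback encoding 'Windows-1252' is an assumption.
function parse_csv_text(string $raw, string $fallback = 'Windows-1252'): array
{
    // Strict detection against UTF-8 only: returns false when the text
    // contains byte sequences that are not valid UTF-8.
    if (mb_detect_encoding($raw, 'UTF-8', true) === false) {
        $raw = mb_convert_encoding($raw, 'UTF-8', $fallback);
    }

    $rows = [];
    // Simple line split; quoted fields containing line breaks are not handled.
    foreach (preg_split('/\r\n|\r|\n/', $raw, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        $rows[] = str_getcsv($line, ';', '"', '\\');
    }
    return $rows;
}

// "\xD6" is Ö as a single byte, as in an "ANSI" export.
$rows = parse_csv_text("ID;englishName;germanName\n1;Austria;\xD6sterreich\n");
```

With a real file, the input would be something like parse_csv_text(file_get_contents('file.csv')).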

0

The problem must be your file encoding; it does not look like UTF-8.

When I tried your example and double-checked that the file really is UTF-8, it worked for me. I get:

Array ([0] => 1 [1] => Austria [2] => Österreich)

Use LibreOffice (OpenOffice); it is more reliable for such things.

0

Source: https://habr.com/ru/post/1444535/

