Reading an ANSI file and converting it to a UTF-8 string

Question

Is there any way to do this with PHP?

The data looks fine when I print it.

But when I insert it into the database, the field ends up empty.

file php ansi

user192344 Jan 4 '11 at 15:46

3 answers

"ANSI" is not really an encoding. It is a loose way of saying "whatever encoding is the default on the computer that created the data." So your task has two parts:

1. Find out which encoding the data uses.
2. Use the appropriate function to convert it to UTF-8.

For #2, I usually prefer iconv(), but utf8_encode() can also do the job if the source data happens to be ISO-8859-1.
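A minimal sketch of step 2, assuming the "ANSI" data is Windows-1252 (the usual Windows default for Western European languages). It also shows why utf8_encode(), which assumes ISO-8859-1, is not quite the same thing: bytes 0x80 to 0x9F are printable characters in Windows-1252 (0x80 is the euro sign) but control codes in ISO-8859-1. Since utf8_encode() is deprecated as of PHP 8.2, the sketch reproduces its behaviour with iconv() from ISO-8859-1 instead of calling it.

```php
<?php
// Sample input: "café €" as a Windows-1252 ("ANSI") byte string.
$ansi = "caf\xE9 \x80";

// Step 2 with iconv(), assuming the source really is Windows-1252.
$asCp1252 = iconv('WINDOWS-1252', 'UTF-8', $ansi);

// What utf8_encode() does: treat every byte as ISO-8859-1.
// 0xE9 still becomes "é", but 0x80 becomes the invisible control character U+0080, not "€".
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $ansi);

echo bin2hex($asCp1252), PHP_EOL;
echo bin2hex($asLatin1), PHP_EOL;
```

If the two results differ only in characters like € or curly quotes, the source was most likely Windows-1252 rather than ISO-8859-1.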

Update

It seems you have no idea which encoding your data uses. In some cases you can work it out if you know the user's country and language (for example, Spain/Spanish): the data is then likely in the default encoding Microsoft Windows uses in that region.
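A rough sketch of that idea: map the user's language to the Windows code page that is typically the "ANSI" default for it, then convert with iconv(). The function name guessAnsiCodePage() and its short table are illustrative assumptions, not a complete list, and the result is only a best guess; the file may still be in some other encoding.

```php
<?php
// Hypothetical lookup: guess the Windows "ANSI" code page from a language code.
// Heuristic only; it says nothing about how a particular file was really saved.
function guessAnsiCodePage(string $language): string
{
    $map = [
        // Western European
        'en' => 'WINDOWS-1252', 'es' => 'WINDOWS-1252', 'fr' => 'WINDOWS-1252',
        'de' => 'WINDOWS-1252', 'it' => 'WINDOWS-1252', 'pt' => 'WINDOWS-1252',
        // Central European
        'pl' => 'WINDOWS-1250', 'cs' => 'WINDOWS-1250', 'hu' => 'WINDOWS-1250',
        // Cyrillic
        'ru' => 'WINDOWS-1251', 'uk' => 'WINDOWS-1251', 'bg' => 'WINDOWS-1251',
        'el' => 'WINDOWS-1253', // Greek
        'tr' => 'WINDOWS-1254', // Turkish
    ];
    // Accept both "es" and "es-ES" style codes.
    $key = strtolower(substr($language, 0, 2));
    return $map[$key] ?? 'WINDOWS-1252'; // fallback is an assumption
}

// Usage: a Spanish user's file containing "España" as ANSI bytes.
$codePage = guessAnsiCodePage('es-ES');
$text = iconv($codePage, 'UTF-8', "Espa\xF1a");
```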

Álvaro González Jan 4 '11 at 15:52

Be careful: iconv() returns false if the conversion fails.
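A minimal sketch of that check; convertOrFail() is a hypothetical name. It may also matter for the question: if the false return value is passed on unchecked, it turns into an empty string as soon as it is used as a string ((string) false === ''), which is one possible, unconfirmed explanation for the empty database field.

```php
<?php
// Hypothetical wrapper: fail loudly instead of silently passing false along.
function convertOrFail(string $input, string $from, string $to = 'UTF-8'): string
{
    // @ silences the notice iconv() raises on bad input; the failure is reported below instead.
    $result = @iconv($from, $to, $input);
    if ($result === false) {
        throw new RuntimeException("iconv() could not convert from $from to $to");
    }
    return $result;
}

try {
    // Windows-1252 bytes wrongly declared as UTF-8: 0xE9 followed by a space is invalid UTF-8.
    convertOrFail("caf\xE9 au lait", 'UTF-8', 'ISO-8859-1');
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```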

I have a somewhat similar problem: some Chinese characters are mistaken for \n when the file is encoded as "Unicode" (what Windows tools call UTF-16), but not when it is UTF-8.
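A short sketch of why that can happen, assuming "Unicode" here means UTF-16LE (what Windows Notepad saves under that name). In UTF-16LE the character 上 (U+4E0A) is stored as the bytes 0A 4E, and 0x0A is the newline byte, so code that splits raw bytes on "\n" cuts the character in half. Converting the whole file to UTF-8 before splitting avoids this, because UTF-8 never uses bytes below 0x80 inside multi-byte characters. The helper splitUtf16leLines() is hypothetical.

```php
<?php
// In UTF-16LE, U+4E0A is stored low byte first: 0x0A 0x4E.
$bytes = iconv('UTF-8', 'UTF-16LE', "\u{4E0A}");
// Byte-oriented code (explode("\n", ...), fgets()) sees a newline here.
$looksLikeNewline = strpos($bytes, "\n") !== false;

// Hypothetical helper: decode the whole UTF-16LE file first, then split into lines.
function splitUtf16leLines(string $raw): array
{
    if (substr($raw, 0, 2) === "\xFF\xFE") {
        $raw = substr($raw, 2); // drop the UTF-16LE byte order mark
    }
    $text = iconv('UTF-16LE', 'UTF-8', $raw);
    if ($text === false) {
        throw new RuntimeException('Input is not valid UTF-16LE');
    }
    // Safe now: in UTF-8 the byte 0x0A only ever means a real newline.
    return preg_split('/\r\n|\n/', $text);
}

// Usage (hypothetical file name):
// $lines = splitUtf16leLines(file_get_contents('data.txt'));
```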

To get back to your problem: make sure the encoding of your file matches the encoding of your database. Calling utf8_encode() on text that is already UTF-8 double-encodes it (you get "Ã©" instead of "é"). You can try mb_detect_encoding() to guess the file's encoding, but unfortunately it does not always work. From what I can see, there is no simple fix for character encoding problems :(
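Since mb_detect_encoding() is only a guess, a more predictable guard against double encoding is mb_check_encoding(), which simply tells you whether a string is valid UTF-8. A minimal sketch, assuming anything that is not valid UTF-8 is Windows-1252; ensureUtf8() is a hypothetical name. One caveat: some Windows-1252 byte sequences happen to be valid UTF-8 as well (for example the two characters "Ã©"), and those would be left untouched.

```php
<?php
// Hypothetical helper: only convert when the input is not already valid UTF-8,
// so text that is already UTF-8 is never encoded twice.
function ensureUtf8(string $s, string $assumedSource = 'Windows-1252'): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8 (this includes plain ASCII)
    }
    return mb_convert_encoding($s, 'UTF-8', $assumedSource);
}

// What double encoding looks like: UTF-8 "é" (C3 A9) read as ISO-8859-1 becomes "Ã©".
$doubled = mb_convert_encoding("caf\u{E9}", 'UTF-8', 'ISO-8859-1');
```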

Victor priceputu Dec 04 '13 at 12:03

Mark beckers · Accepted Answer · 2011-01-04T15:50:07+0000

$tmp = iconv('YOUR CURRENT CHARSET', 'UTF-8', $string);

or

 $tmp = utf8_encode($string);

It's strange that you end up with an empty field in your database. I could understand ending up with some garbage, but nothing at all (an empty string) is weird.

I just typed this in the console:

 iconv -l | grep -i ansi

It showed me:

 ANSI_X3.4-1968 ANSI_X3.4-1986 ANSI_X3.4 ANSI_X3.110-1983 ANSI_X3.110 MS-ANSI

These are possible values for YOUR CURRENT CHARSET (on glibc, MS-ANSI is an alias for Windows-1252, while ANSI_X3.4-1968 is plain ASCII). As said before, if your input string is already valid UTF-8, you don't need to convert anything.

Change 'UTF-8' to 'UTF-8//TRANSLIT' if characters that don't exist in the target charset should be replaced with a similar-looking one instead of making the conversion fail (append //IGNORE instead if you want them silently skipped).
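A sketch of the difference, assuming glibc's iconv (other C libraries behave differently, and transliteration results also depend on the LC_CTYPE locale). Because UTF-8 can represent every character, the //TRANSLIT suffix only changes anything when the target charset is narrower, so the example converts to ASCII.

```php
<?php
// Transliteration depends on the C library and the LC_CTYPE locale (glibc assumed).
setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');

$text = "caf\u{E9}";

// Strict: "é" has no ASCII equivalent, so glibc's iconv fails and PHP returns false.
$strict = @iconv('UTF-8', 'ASCII', $text);

// With //TRANSLIT the unconvertible character is replaced; the exact
// replacement (e.g. "e" or "?") varies with platform and locale.
$translit = iconv('UTF-8', 'ASCII//TRANSLIT', $text);

var_dump($strict, $translit);
```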
