Perl UTF8 CGI and DBI ... what is the right workflow?

Question

I am about to rework a Perl-based web framework to support UTF8. I took the following steps:

for the main script:
use open IO => ":utf8",":std";
use utf8;

for DBI adapter:
$self->{dbh}->{'mysql_enable_utf8'} = 1;

and in my request parser for CGI-based POST and GET:
foreach (@val) { $_ = decode("UTF-8",$_); }

As far as I can tell, this works fine on my local Ubuntu with Perl 5.10.1, but on the web server, which runs 5.10, decoding the POST or GET data ruins the text.

I have to admit, the whole UTF8 thing really confuses me. I need to:

read templates
get data from MySQL
process POST and GET data to be inserted into MySQL
write templates

Is there anything I forgot here? What could cause the erratic behavior? Does every module I use within the main script need use utf8 specifically, or is it enough if the main script does this?

Thanks for any tips,
Thomas

+4

mysql perl utf-8

thomas Jan 13 '11 at 13:49

6 answers

Penfold · Answer 1 · 2011-01-13T15:59:57+0000

use utf8;, as several people have said, has nothing to do with your input/output problems: all it says is "treat my source code as UTF-8 encoded".
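A minimal sketch of that point, assuming only core Perl: encoding happens on the filehandle layer, whether or not use utf8 is present. The string is written with an escape so the source stays plain ASCII, and the in-memory buffer stands in for a real file or socket.

```perl
use strict;
use warnings;

# Four characters; the escape keeps this source file pure ASCII,
# so `use utf8` would make no difference here.
my $text = "caf\x{E9}";

# Write through an explicit encoding layer to an in-memory file.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "cannot open buffer: $!";
print {$out} $text;
close($out);

# The layer turned U+00E9 into the two bytes C3 A9.
printf "characters: %d, bytes written: %d\n", length($text), length($buffer);
```

The same idea applies to STDOUT and to template files: each handle needs its own layer (via binmode, a three-argument open, or the open pragma).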

Your MySQL/DBI approach looks right on the money.

For CGI, upgrade to the latest CGI.pm and set $CGI::PARAM_UTF8 = 1, and it will do the decode() for you. (As a general tip, BTW, decode_utf8() is much faster!)
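If upgrading CGI.pm is not an option, a rough sketch of doing the decoding by hand with decode_utf8() might look like this. The helper name decode_params and the sample byte strings are made up for illustration; the important part is that each value is decoded exactly once, so do not combine this with $CGI::PARAM_UTF8.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: turn raw UTF-8 byte strings into character strings.
# Call it once per request, on values that have not been decoded yet.
sub decode_params {
    my @raw = @_;
    return map { decode_utf8($_) } @raw;
}

# Simulated raw form values as they would arrive from the browser.
my @val = decode_params("na\xC3\xAFve", "plain");

printf "first value has %d characters\n", length $val[0];
```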

As for the difference between the two machines, compare the Apache configurations of both servers to see whether AddDefaultCharset is set to some unhelpful value.

Also, see my talk last year at the London Perl Workshop for a more detailed look at Perl and Unicode.

Andrew · Answer 2 · 2012-05-24T22:14:45+0000

The solution here is straightforward.

$dbh->{mysql_enable_utf8} = 1;
$dbh->connect ...
$dbh->do('SET NAMES \'utf8\';') || die;

Enjoy :)

Rob boerman · Answer 3 · 2011-01-14T08:45:37+0000

Thomas,

At the risk of collecting more downvotes: I do not know if this is still necessary, but in the past I had to do the following to make sure DBI behaved correctly with utf8:

my $dbh = DBI->connect(...);
$dbh->{mysql_enable_utf8} = 1;
$dbh->do("set names 'utf8';");

Maybe that helps.

Rob boerman · Answer 4 · 2011-01-13T14:15:55+0000

First of all, my condolences on your Latin-to-UTF-8 work. I did this for a large application several years ago, and I still have the wrinkles to show for it.

What I recommend is to convert everything to UTF8, rather than trying to decode things piecemeal; that will definitely go wrong somewhere. Storing utf8 data in a Latin table is a recipe for disaster. At one point my database held double- and triple-encoded utf8 strings, and I could no longer tell how to get back to the original string.
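To illustrate how double encoding comes about, here is a small sketch using only the core Encode module. It misreads UTF-8 bytes as Latin-1 and encodes them again, then peels one layer back off. The repair step only works because the number of layers is known here, which is exactly what you lose in a real database.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $original = "\x{E9}";                      # one character: e with acute accent
my $utf8     = encode('UTF-8', $original);    # correct encoding: bytes C3 A9

# The mistake: treat the UTF-8 bytes as Latin-1 characters and encode again.
my $misread = decode('ISO-8859-1', $utf8);    # two characters: U+00C3 U+00A9
my $double  = encode('UTF-8', $misread);      # four bytes: C3 83 C2 A9

printf "encoded once: %d bytes, encoded twice: %d bytes\n",
    length($utf8), length($double);

# Undo exactly one extra layer: decode, map back to Latin-1 bytes, decode again.
my $repaired = decode('UTF-8', encode('ISO-8859-1', decode('UTF-8', $double)));
print "repaired string matches the original\n" if $repaired eq $original;
```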

Actions you must take:

Create a second database with the same structure, using UTF8 tables instead of Latin tables.
Extract everything from your main database and insert it into the new database (hoping you have not stored a single utf8 row yet).
Make sure that the MIME headers sent by your application state that the encoding is utf8; all the data posted back from these pages will then automatically take on the encoding of the page itself.
Cross your fingers and take a vacation ...
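For the MIME header step, a minimal sketch of a response whose header and body agree on UTF-8 could look like the following. The helper name utf8_response is invented for this example, and it assumes the page content is already a decoded character string.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a CGI response that declares and uses UTF-8.
sub utf8_response {
    my ($html) = @_;                     # decoded character string
    my $body = encode('UTF-8', $html);   # bytes that go on the wire
    return "Content-Type: text/html; charset=utf-8\r\n"
         . "Content-Length: " . length($body) . "\r\n\r\n"
         . $body;
}

binmode(STDOUT);                         # raw bytes; encoding was done by hand
print utf8_response("<p>caf\x{E9}</p>\n");
```

Content-Length is computed from the encoded bytes, not from the character count, since the two differ as soon as the page contains non-ASCII text.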

You should not need to change much in your application, since DBI's utf8 handling is pretty good these days.

Good luck

Rob

Alien life form · Answer 5 · 2011-01-13T14:24:36+0000

Check this out. It is quite general, but it will get your vocabulary straight, and although many of the examples are in Python, it still applies. BTW, if you try to feed in Latin-1 (or otherwise) encoded material without decoding/transcoding it, disaster will ensue.

See the article for more information.

Greetings

Andrew C · Answer 6 · 2012-07-17T10:58:16+0000

You will find a complete (and tested) guide here. It does not leave anything out: Perl, DBI and MySQL, all utf8'd.
I went through the same pain, but in the end everything worked.
