Check glyph-based string language in PHP

Question


I have a MySQL database with book titles in English and Arabic, and I use a PHP class that can automatically transliterate Arabic text into a Latin script.

I want my HTML output to look something like this:

    <h3>A book</h3>
    <h3>كتاب <em>(kitaab)</em></h3>
    <h3>Another book</h3>

Is there a way for PHP to determine the language of a string based on the Unicode characters and glyphs used in it? I am trying to get something like this:

    $Ar = new Arabic('EnTransliteration');

    while ($item = mysql_fetch_array($results)) {
        ...
        if (some test to see if $item['item_title'] has Arabic glyphs in it) {
            echo "<h3>$item[item_title] <em>(" . $Ar->ar2en($item['item_title']) . ")</em></h3>";
        } else {
            echo "<h3>$item[item_title]</h3>";
        }
        ...
    }

Fortunately, the class does not choke when it is given Latin characters, so in theory I could run every result through the transliteration, but that seems like a waste of processing.

Thanks!

Edit: I still haven't found a way to test glyphs or characters. I suppose I could put all the Arabic characters in an array and check if anything in the array matches part of the string ...

However, I figured out a workaround that may work just fine in the end. It runs every title through the transliteration, regardless of language, but appends the lowercase transliteration only if the string was actually changed:

    while ($item = mysql_fetch_array($mysql_results)) {
        $transliterate = trim(strtolower($Ar->ar2en($item['item_title'])));
        $item_title = (strtolower($item['item_title']) == $transliterate)
            ? $item['item_title']
            : $item['item_title'] . " <em>($transliterate)</em>";
        echo "<h3>$item_title</h3>";
    }

+4

php mysql unicode arabic

Andrew Jun 18 '09 at 9:50


2 answers

Here's an open source PHP class for automatically detecting Arabic character sets:

http://www.ar-php.com/php/arabic/index.html#ArCharsetD

0

karim79 Jun 18 '09 at 9:57


mercator · Accepted Answer · 2009-06-20T15:02:02+0000

This should do it:

 preg_match("/\p{Arabic}/u", $item['item_title'])

You can make this regex a little more complicated if you want, but I don't think you really need to.

The \p escape sequence lets you match characters based on their Unicode properties (when the u pattern modifier is used).
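As a rough sketch of how that test could slot into the loop from the question: the helper names `containsArabic` and `renderTitle` are made up for illustration, and `$stub` is a hypothetical stand-in for `$Ar->ar2en()` so the snippet runs without the Arabic class. The `htmlspecialchars` calls are an extra precaution that the original code does not have.

```php
<?php
// True if $text contains at least one character from the Arabic script.
// The u modifier makes PCRE treat pattern and subject as UTF-8; if the
// subject is not valid UTF-8, preg_match returns false and we report false.
function containsArabic(string $text): bool
{
    return preg_match('/\p{Arabic}/u', $text) === 1;
}

// Builds the <h3> markup from the question. $transliterate is any callable
// that takes a string and returns its Latin transliteration (an assumption
// standing in for $Ar->ar2en()). It is only called for Arabic titles.
function renderTitle(string $title, callable $transliterate): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    if (!containsArabic($title)) {
        return "<h3>$safeTitle</h3>";
    }
    $latin = htmlspecialchars($transliterate($title), ENT_QUOTES, 'UTF-8');
    return "<h3>$safeTitle <em>($latin)</em></h3>";
}

// Stub transliterator for demonstration only.
$stub = function (string $s): string {
    return 'kitaab';
};

echo renderTitle('A book', $stub), "\n";
echo renderTitle('كتاب', $stub), "\n";
```

With the real class, the callable would presumably be something like `[$Ar, 'ar2en']`, so Latin-only titles skip the transliteration step entirely.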

The PHP manual says: "Advanced properties such as Greek or InMusicalSymbols are not supported by PCRE." That is no longer true, though: PCRE release 6.5 added support for script names.
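To see script names at work beyond Arabic, here is a small sketch that reports which of a few scripts occur in a string. The function name `scriptsIn` and the choice of four scripts are only for illustration, and it assumes a PHP build whose PCRE has Unicode property support.

```php
<?php
// Returns the names of the scripts (from a small, arbitrary list) that have
// at least one character in $text. Spaces, digits and punctuation belong to
// the "Common" script, so they match none of the names below.
function scriptsIn(string $text): array
{
    $found = [];
    foreach (['Arabic', 'Greek', 'Latin', 'Cyrillic'] as $script) {
        if (preg_match('/\p{' . $script . '}/u', $text) === 1) {
            $found[] = $script;
        }
    }
    return $found;
}

print_r(scriptsIn('كتاب (kitaab)')); // Arabic and Latin
print_r(scriptsIn('βιβλίο'));        // Greek
```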
