Working with é (eacute) and other special characters using Oracle, PHP and oci8

Hi, I am trying to save names in an Oracle database and retrieve them using PHP and oci8.

However, if I insert é directly into the Oracle database and use oci8 to retrieve it, I just get e.

Do I have to encode all special characters (including é) as HTML entities (i.e. &eacute;) before inserting them into the database, or am I missing something?

thanks


UPDATE: March 1 at 18:40

found this function: http://www.php.net/manual/en/function.utf8-decode.php#85034

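 // Written with preg_* calls: ereg() and the /e modifier were removed in PHP 7.
 function charset_decode_utf_8($string) {
     // Nothing to convert without bytes in 0x80-0x9F or 0xA1-0xFF.
     if (!preg_match('/[\x80-\x9F\xA1-\xFF]/', $string)) {
         return $string;
     }
     // Three-byte UTF-8 sequences become numeric HTML entities.
     $string = preg_replace_callback('/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/', function ($m) {
         return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
     }, $string);
     // Two-byte UTF-8 sequences likewise.
     $string = preg_replace_callback('/([\xC0-\xDF])([\x80-\xBF])/', function ($m) {
         return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
     }, $string);
     return $string;
 }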

It seems to work, although I'm not sure it's the optimal solution.
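For comparison, PHP's mbstring extension can produce the same kind of numeric entities. The following is only a sketch: it assumes mbstring is installed and that the input is valid UTF-8, and utf8_to_numeric_entities() is a made-up helper name, not something from the linked manual comment.

```php
<?php
// Sketch: turn every non-ASCII character of a UTF-8 string into a decimal
// numeric HTML entity. utf8_to_numeric_entities() is a hypothetical name.
function utf8_to_numeric_entities(string $utf8): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// "\xC3\xA9" is the UTF-8 byte sequence for é (U+00E9, decimal 233).
echo utf8_to_numeric_entities("Ren\xC3\xA9e"), "\n"; // Ren&#233;e
```

The string is written with byte escapes so the result does not depend on the encoding the PHP file itself is saved in.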


UPDATE: March 8 at 3:45 pm

The Oracle character set is ISO-8859-1.
In PHP I added:

 putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1"); 

to force the oci8 connection to use this character set. Retrieving é from PHP through oci8 now works (for VARCHARs, but not for CLOBs, where I had to use utf8_encode to extract the value).
So I then tried saving data from PHP to Oracle, and that does not work. Somewhere along the way from PHP to Oracle, é becomes ?


UPDATE: March 9 at 2:47 p.m.

Looking closer: after adding the NLS_LANG variable, direct oci8 inserts containing é work.

The problem is actually on the PHP side. I am using the ExtJS framework, which encodes form values with encodeURIComponent when a form is submitted.
So é is sent as %C3%A9 and then decoded back to é.
However, its length is now 2 (strlen($my_sent_value) == 2), not 1, and in PHP the comparison $my_sent_value == 'é' evaluates to FALSE.

I think that if I can re-encode these characters in PHP back to a single byte each and then insert them into Oracle, it should work.

Still no luck though.
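The length of 2 is what UTF-8 predicts: %C3%A9 is the percent-encoded form of the two UTF-8 bytes for é, whereas ISO-8859-1 stores é as the single byte 0xE9. A small sketch of the observation and of the re-encoding step, assuming the mbstring extension is available ($encoded and $latin1 are illustrative variable names):

```php
<?php
// What arrives after the browser applies encodeURIComponent to é.
$encoded = '%C3%A9';
$my_sent_value = rawurldecode($encoded);  // two bytes: 0xC3 0xA9 (UTF-8)

echo strlen($my_sent_value), "\n";        // 2
echo bin2hex($my_sent_value), "\n";       // c3a9

// Re-encode from UTF-8 to ISO-8859-1 so é becomes one byte again.
$latin1 = mb_convert_encoding($my_sent_value, 'ISO-8859-1', 'UTF-8');

echo strlen($latin1), "\n";               // 1
echo bin2hex($latin1), "\n";              // e9
```

Comparing against a literal 'é' in the script only succeeds if the PHP file is saved in the same encoding as the value being compared, which is why bin2hex() is the more reliable check.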


UPDATE: March 10 at 11:05

I keep thinking I'm so close (yet still so far).

putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9"); works only sporadically.

I created a small PHP script for testing:

 header('Content-Type: text/plain; charset=ISO-8859-1');
 putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9");
 $conn = oci_connect("user", "pass", "DB");
 $stmt = oci_parse($conn, "UPDATE temp_tb SET string_field = '|é|'");
 oci_execute($stmt, OCI_COMMIT_ON_SUCCESS);

After running this once and logging in to the Oracle database directly, I see that STRING_FIELD is set to |¿|, which is obviously not what I expected given my earlier results.
However, if I refresh the PHP page twice, it works!
In Oracle I then see |é|.

It seems the environment variable is perhaps not set correctly, or not passed along in time, during the first execution of the script, but is available for the second execution.

My next experiment is to export the variable into PHP's environment; however, I need to restart Apache for that, so we'll see what happens. Hopefully it works.
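One alternative worth trying, as a sketch rather than a confirmed fix for this setup: oci_connect() accepts the client character set as an optional fourth argument, so the connection no longer depends on NLS_LANG having reached the Oracle client libraries in time. connect_latin1() is a hypothetical helper name, and the credentials and connection string are placeholders.

```php
<?php
// Sketch only: connect_latin1() is a hypothetical helper, not part of oci8.
function connect_latin1(string $user, string $pass, string $db)
{
    // The fourth argument names the character set the Oracle client should
    // use for this connection, instead of deriving it from NLS_LANG.
    return oci_connect($user, $pass, $db, 'WE8ISO8859P1');
}

// Usage (needs the oci8 extension and a reachable database):
// $conn = connect_latin1('user', 'pass', 'DB');
```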

+4
4 answers

Here is what I finally did to solve this problem:

I changed the profile of the daemon that runs PHP so that it sets:

 NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1 

This way, the oci8 connection uses ISO-8859-1.

Then, in my PHP configuration, I set the default charset to ISO-8859-1:

 default_charset = "iso-8859-1" 

When I insert into an Oracle table through oci8 from PHP, I do:

 utf8_decode($my_sent_value) 

When retrieving data from Oracle, simply printing the variable should just work:

 echo $my_received_value 

However, when sending this data via ajax, I had to use:

 utf8_encode($my_received_value) 
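The two conversion points above can be sketched as a pair of helpers. This version uses mb_convert_encoding() because utf8_encode() and utf8_decode() are deprecated as of PHP 8.2. It assumes the mbstring extension is available, and to_oracle() and to_ajax() are made-up names. The json_encode() line assumes the ajax response is JSON, which this answer does not state.

```php
<?php
// Hypothetical helpers marking the two conversion points.
function to_oracle(string $utf8): string
{
    // Browser input (UTF-8) -> ISO-8859-1 bytes for the oci8 connection.
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

function to_ajax(string $latin1): string
{
    // ISO-8859-1 bytes from Oracle -> UTF-8 for the ajax response.
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

$sent     = "Andr\xC3\xA9";     // UTF-8 as submitted by the form (6 bytes)
$stored   = to_oracle($sent);   // "Andr\xE9" (5 bytes)
$returned = to_ajax($stored);   // UTF-8 again

var_dump($returned === $sent);  // bool(true)
var_dump(json_encode($stored)); // bool(false): json_encode() expects UTF-8
```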
+1

I assume you know about these facts:

  • There are many different character sets: you need to choose one and, of course, find out which one you are using.
  • Oracle is perfectly capable of storing text without HTML entities (&eacute;). HTML entities are used in, well, HTML. Oracle is not a web browser ;-)

You should also be aware that HTML entities are not bound to a specific encoding; on the contrary, they are used to represent characters in a way that is independent of the character set.

You mention ISO-8859-1 and UTF-8 interchangeably. Which encoding do you want to use? ISO-8859-1 is easy to use, but it can only store text in some Latin-script languages (for example, Spanish), and it lacks some common characters, such as the euro sign (€). UTF-8 is harder to use, but it can store all the characters defined by the Unicode consortium (including everything you will ever need).
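The difference in coverage is easy to check: é exists in ISO-8859-1, while the euro sign, for instance, does not. A minimal sketch, assuming the mbstring extension is available (survives_latin1() is a hypothetical helper name):

```php
<?php
$e_acute = "\xC3\xA9";      // é (U+00E9) encoded as UTF-8
$euro    = "\xE2\x82\xAC";  // € (U+20AC) encoded as UTF-8

// Hypothetical helper: does the text survive a round trip through ISO-8859-1?
function survives_latin1(string $utf8): bool
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1') === $utf8;
}

var_dump(survives_latin1($e_acute)); // bool(true)
var_dump(survives_latin1($euro));    // bool(false)
```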

Once you have made your decision, you must configure Oracle to store the data in that encoding and select an appropriate column type. For instance, VARCHAR2 is fine for plain ASCII, while NVARCHAR2 is a good fit for UTF-8.

+2

If you really can't change the character set that Oracle uses, then what about Base64-encoding your data before storing it in the database? That way, you can accept characters from any character set and store them as ISO-8859-1 (because Base64 output uses only a subset of ASCII, which maps identically onto ISO-8859-1). Base64 encoding increases the string length by roughly a third (about 33%, or closer to 37% when MIME-style line breaks are added).
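A short sketch of the Base64 idea using PHP's built-in functions. The sample strings and variable names are illustrative only.

```php
<?php
// UTF-8 text that includes a euro sign, which ISO-8859-1 cannot represent.
$name    = "Ren\xC3\xA9e \xE2\x82\xAC";
$encoded = base64_encode($name);

// The encoded form contains only ASCII letters, digits, '+', '/' and '='.
var_dump(preg_match('/^[A-Za-z0-9+\/=]*$/', $encoded) === 1); // bool(true)

// Decoding restores the original bytes exactly.
var_dump(base64_decode($encoded, true) === $name);             // bool(true)

// Size overhead: every 3 input bytes become 4 output characters.
$raw = str_repeat('x', 300);
echo strlen(base64_encode($raw)), "\n";                        // 400
```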

If your data will only ever be displayed as HTML, you could also store HTML entities as you suggested, but keep in mind that a single entity can take up to 10 characters per unencoded character, e.g. &thetasym; for ϑ.

0

I had to deal with this problem: Latin American special characters were stored as "?" or "¿" in my Oracle database, and I cannot change NLS_CHARACTER_SET because we are not the owners of the database.

So, I found a workaround:

1) In the ASP.NET code, create a function that converts a string to hexadecimal characters:

  public string ConvertirStringAHex(String input)
  {
      // Encode as ISO-8859-1 so that each character becomes exactly one byte.
      Encoding encoding = System.Text.Encoding.GetEncoding("ISO-8859-1");
      Byte[] stringBytes = encoding.GetBytes(input);
      // Each byte is written as two uppercase hex digits.
      StringBuilder sbBytes = new StringBuilder(stringBytes.Length * 2);
      foreach (byte b in stringBytes)
      {
          sbBytes.AppendFormat("{0:X2}", b);
      }
      return sbBytes.ToString();
  }

2) Apply the function above to the variable you want to encode, for example:

  myVariableHex = ConvertirStringAHex( myVariable );

In Oracle, use the following:

  PROCEDURE STORE_IN_TABLE( iTEXTO IN VARCHAR2 ) IS
  BEGIN
    -- iTEXTO holds hex digits: convert them back to raw bytes, then to VARCHAR2.
    INSERT INTO myTable( SPECIAL_TEXT )
    VALUES ( UTL_RAW.CAST_TO_VARCHAR2( HEXTORAW( iTEXTO ) ) );
    COMMIT;
  END;

Of course, iTEXTO is the Oracle parameter that receives the value of myVariableHex from the ASP.NET code.
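For readers following the PHP side of this thread, the same hex step could look roughly like this. It is a sketch that assumes the input is UTF-8 and that mbstring is available. string_to_latin1_hex() is a hypothetical counterpart of ConvertirStringAHex(), not code from this answer.

```php
<?php
// Hypothetical PHP counterpart of ConvertirStringAHex().
function string_to_latin1_hex(string $utf8): string
{
    // One byte per character in ISO-8859-1, then two uppercase hex digits per byte.
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return strtoupper(bin2hex($latin1));
}

echo string_to_latin1_hex("Jos\xC3\xA9"), "\n"; // 4A6F73E9
```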

Hope this helps. If there is anything to improve, please feel free to post your comments.

Sources: http://www.nullskull.com/faq/834/convert-string-to-hex-and-hex-to-string-in-net.aspx https://forums.oracle.com/thread/44799

0

Source: https://habr.com/ru/post/1302770/

