Oracle Text will not work with NVARCHAR2. What else might not be available?

Question


We are going to port our application so that it supports Unicode, and we have to choose between a Unicode character set for the entire database and Unicode columns stored in N[VAR]CHAR2.

We know that we will no longer be able to index the contents of a column using Oracle Text if we choose NVARCHAR2, because Oracle Text can only index columns based on the CHAR types.

Are other major differences likely to appear when we take advantage of Oracle's capabilities?

Also, is it possible that newer versions of Oracle add features that support only CHAR columns or only NCHAR columns, but not both?

Thank you for your responses.

Note after Justin's answer:

Thanks for your reply. I will go through your points as they apply to our case:

Our application is usually the only one on the Oracle database and takes care of the data itself. Other software that connects to the database is limited to Toad, Tora or SQL Developer.

We also use SQL*Loader and SQL*Plus to communicate with the database for basic tasks or for upgrades between product versions. We have not heard of any specific problem with any of this software regarding NVARCHAR2.

We are also not aware of database administrators among our customers who would want to use other tools on the database that cannot handle NVARCHAR2 data, and we are not really concerned about whether their tools might break; after all, they are skilled at their job and can find other tools if necessary.

Your last two points are more insightful for our case. We do not use many built-in packages from Oracle, but we do use some. We will look into this problem.

Can we also expect a performance penalty if our application (which is compiled with Visual C++ and uses wchar_t to store UTF-16) has to perform encoding conversions on all processed data?


oracle unicode nvarchar character-encoding

Benoit Dec 09 '10 at 17:07


1 answer

Justin Cave · Accepted Answer · 2010-12-09 18:34

If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.

There are many third-party utilities and libraries that simply do not support NCHAR/NVARCHAR2 columns or that do not make working with NCHAR/NVARCHAR2 columns pleasant. It is very annoying, for example, when your shiny new reporting tool cannot report on your NVARCHAR2 data.
For custom applications, working with NCHAR/NVARCHAR2 columns requires jumping through some hoops that working with Unicode-encoded CHAR/VARCHAR2 columns does not. For example, in JDBC code, you would constantly be calling the Oracle-specific setFormOfUse method on your statements. Other languages and frameworks will have other gotchas; some will be relatively well documented, while others will be relatively obscure.
Many built-in packages will only accept (or return) VARCHAR2, not NVARCHAR2. You may still be able to call them thanks to implicit conversion, but you may run into character set conversion problems.
In general, being able to avoid character set conversion problems inside the database, and pushing those problems out to the edge where the database actually sends or receives data from a client, makes developing the application much easier. It is enough work to debug character set conversion problems that result from network transmission; discovering that some data was corrupted when a stored procedure concatenated data from a VARCHAR2 and an NVARCHAR2 and stored the result in a VARCHAR2 before it was sent over the network can be excruciating.
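
The last point can be illustrated outside the database. The following is only a minimal Java simulation, not Oracle's actual conversion logic: it assumes ISO-8859-1 as a stand-in for a non-Unicode database character set, and the class and method names (MixedCharsetSketch, storeAndReadBack) are hypothetical. It shows how a value that mixes "VARCHAR2" and "NVARCHAR2" content loses data once it is stored in the narrower character set.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MixedCharsetSketch {
    // Simulates writing a value into a column stored in the given character set
    // and reading it back: encode to bytes, then decode those same bytes.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for every character the charset cannot represent.
    static String storeAndReadBack(String value, Charset columnCharset) {
        return new String(value.getBytes(columnCharset), columnCharset);
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-1 plays the role of a legacy database character set.
        Charset legacy = StandardCharsets.ISO_8859_1;

        String label = "Name: ";             // plays the VARCHAR2 value
        String name = "\u65E5\u672C\u8A9E";  // three Japanese characters, plays the NVARCHAR2 value
        String combined = label + name;      // the concatenation done in the stored procedure

        // Stored in the legacy character set, the Japanese characters are lost.
        System.out.println(storeAndReadBack(combined, legacy).equals(combined));
        // Stored in a Unicode character set, the value survives unchanged.
        System.out.println(storeAndReadBack(combined, StandardCharsets.UTF_8).equals(combined));
    }
}
```

With a single Unicode database character set there is only one encoding inside the database, so this class of silent loss can only occur at the client boundary.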

Oracle designed the NCHAR/NVARCHAR2 data types for cases where you are trying to support legacy applications that do not support Unicode in the same database as new applications that use Unicode, and for cases where it is beneficial to store some Unicode data with a different encoding (i.e. you have a large amount of Japanese data that you would prefer to store using the UTF-16 encoding in an NVARCHAR2 rather than the UTF-8 encoding). If you are not in one of those two situations, and it does not sound like you are, I would avoid NCHAR/NVARCHAR2 at all costs.
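
To make the storage argument concrete, here is a small Java sketch that compares encoded sizes. It only measures raw encoded bytes and assumes UTF-16 without a byte order mark; the class name EncodingSizeSketch and the sample strings are illustrative, and actual column storage in Oracle involves more than the raw encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizeSketch {
    // Number of bytes needed to encode the text in the given character set.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // three Japanese characters
        String ascii = "Oracle";                // six ASCII characters

        // UTF_16BE is used so that no byte order mark is counted.
        System.out.println("Japanese, UTF-8:  " + byteLength(japanese, StandardCharsets.UTF_8));
        System.out.println("Japanese, UTF-16: " + byteLength(japanese, StandardCharsets.UTF_16BE));
        System.out.println("ASCII, UTF-8:     " + byteLength(ascii, StandardCharsets.UTF_8));
        System.out.println("ASCII, UTF-16:    " + byteLength(ascii, StandardCharsets.UTF_16BE));
    }
}
```

The Japanese sample takes 9 bytes in UTF-8 and 6 in UTF-16, while the ASCII sample takes 6 bytes in UTF-8 and 12 in UTF-16, so the saving only matters when the data is dominated by characters such as Japanese.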

Responding to your follow-ups

Our application is usually the only one on the Oracle database and takes care of the data itself. Other software that connects to the database is limited to Toad, Tora or SQL Developer.

What does "takes care of the data itself" mean? I hope you are not saying that you have configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversions yourself.

I am also assuming that you are using some sort of API or library to access the database, even if that is OCI. Have you looked into what changes you would need to make to your application to support NCHAR/NVARCHAR2, and whether the API you are using supports NCHAR/NVARCHAR2? The fact that you are getting Unicode data in C++ does not imply that you will not need to make (potentially significant) changes to support NCHAR/NVARCHAR2 columns.

We also use SQL*Loader and SQL*Plus to communicate with the database for basic tasks or for upgrades between product versions. We have not heard of any specific problem with any of this software regarding NVARCHAR2.

All of those applications work with NCHAR/NVARCHAR2. NCHAR/NVARCHAR2 does introduce some additional complexity into scripts, particularly if you are trying to encode string constants that cannot be represented in the database character set. You can certainly work around the issues, though.
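
One common workaround for such string constants is Oracle's UNISTR function, which takes an ASCII literal with \XXXX escapes (UTF-16 code units in hexadecimal) and returns the value in the national character set. The Java helper below is a hypothetical sketch for generating such literals for a script; the class and method names (UnistrLiteral, toUnistr) are made up for illustration, and escaping every non-ASCII character, quote and backslash is a deliberately conservative assumption.

```java
public class UnistrLiteral {
    // Builds an all-ASCII UNISTR('...') expression for the given text.
    // Printable ASCII is copied as-is; everything else, plus the single quote
    // and the backslash, is written as a \XXXX escape of its UTF-16 code unit.
    static String toUnistr(String text) {
        StringBuilder sql = new StringBuilder("UNISTR('");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean plainAscii = c >= 0x20 && c < 0x7F && c != '\'' && c != '\\';
            if (plainAscii) {
                sql.append(c);
            } else {
                sql.append(String.format("\\%04X", (int) c));
            }
        }
        return sql.append("')").toString();
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute becomes an ASCII-only SQL expression.
        System.out.println(toUnistr("caf\u00E9"));
    }
}
```

Because the generated expression contains only ASCII, the script file itself does not depend on the database character set or on the encoding the client tool assumes.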

We are also not aware of database administrators among our customers who would want to use other tools on the database that cannot handle NVARCHAR2 data, and we are not really concerned about whether their tools might break; after all, they are skilled at their job and can find other tools if necessary.

While I am sure that your customers can find alternative ways to work with your data, if your application does not play nicely with their enterprise reporting tool, their enterprise ETL tool, or whatever desktop tools they happen to be experienced with, it is very likely that the customer will blame your application rather than their tools. It probably will not be a show stopper, but there is also no benefit in causing customers grief unnecessarily. It may not drive them to a competitor's product, but it will not make them eager to champion yours.

Can we also expect a performance penalty if our application (which is compiled with Visual C++ and uses wchar_t to store UTF-16) has to perform encoding conversions on all processed data?

I am not sure what "conversions" you are talking about. This may come back to my initial question about whether you are saying that you bypass Oracle's NLS layer to do the character set conversion on your own.

My bottom line, though, is that I do not see any advantage to using NCHAR/NVARCHAR2 given what you are describing. There are plenty of potential downsides to using them. Even if you can eliminate 99% of the downsides as not relevant to your particular needs, you are still facing a situation where, at best, it is a wash between the two approaches. Given that, I would much rather go with the approach that maximizes flexibility going forward, which is to convert the entire database to Unicode (presumably AL32UTF8) and just use that.
