Is there a way to tell whether a character uses 1 or 2 bytes in Delphi 2009?

Delphi 2009 changed its string type to use 2 bytes per character, which lets it support the Unicode character set. The number of bytes a string occupies is now Length(S) * SizeOf(Char), and SizeOf(Char) is currently 2.

Does anyone know of a way to tell, character by character, whether a character will take only one byte, for example whether a Char is ASCII or Unicode?

What I am most interested in is knowing how many bytes a string will use before it goes to the database (Oracle, Documentum).

We need to be able to enforce limits beforehand and, ideally (since we have a large installed base), without having to change the database. If a string field allows 12 bytes, a 7-character string in Delphi 2009 always occupies 14 bytes, yet once it reaches the database it will use only 7 bytes if it is ASCII, 14 if every character is double-byte, or something in between if it is a mixture.

7 answers

You can check the character's ordinal value:

if ord(c) < 128 then // is an ascii character 
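A minimal sketch of applying that test to a whole string, assuming Delphi 2009 or later (where string is UTF-16). IsAsciiString is a hypothetical helper name, not an RTL function; a string that passes will need exactly one byte per character in ASCII, Windows-1252 or UTF-8.

```pascal
program AsciiCheck;

{$APPTYPE CONSOLE}

// Hypothetical helper: True if every Char in S is in the ASCII range (0..127).
function IsAsciiString(const S: string): Boolean;
var
  I: Integer;
begin
  Result := True;
  for I := 1 to Length(S) do
    if Ord(S[I]) >= 128 then
    begin
      Result := False;  // first non-ASCII character found, stop looking
      Exit;
    end;
end;

begin
  Writeln(IsAsciiString('Hello'));         // TRUE
  Writeln(IsAsciiString('caf' + #$00E9));  // FALSE: #$00E9 is an accented e
end.
```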

First of all, keep in mind that the length limit of a database column may actually be measured in characters, not bytes; you will need to check the documentation for the data type. For the purpose of this question, I will assume it is measured in bytes.

The number of bytes your string will use depends entirely on the character encoding it is stored in. If that is UTF-16, the encoding of Delphi's default string type, it will always be 2 bytes per character, except for surrogate pairs, which take 4 bytes.
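To illustrate the UTF-16 case, here is a sketch assuming Delphi 2009 or later, where Length counts UTF-16 code units rather than characters. Utf16ByteCount is a hypothetical helper name; a character outside the Basic Multilingual Plane is stored as a surrogate pair, so it counts as 2 in Length and takes 4 bytes.

```pascal
program Utf16Size;

{$APPTYPE CONSOLE}

// Hypothetical helper: bytes occupied by a Delphi 2009+ (UTF-16) string.
// Length counts UTF-16 code units, so a surrogate pair contributes 2.
function Utf16ByteCount(const S: string): Integer;
begin
  Result := Length(S) * SizeOf(Char);
end;

const
  Smiley = #$D83D#$DE00;  // U+1F600 encoded as a surrogate pair

begin
  Writeln(Utf16ByteCount('abc'));   // 6
  Writeln(Utf16ByteCount(Smiley));  // 4
end.
```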

Assuming the database uses a Unicode encoding at all, the most likely one is UTF-8. This is a variable-length encoding: depending on the character, it needs 1 to 4 bytes. Wikipedia has a table showing how each range of code points maps to a byte sequence.
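As a sketch of those ranges, the following counts the UTF-8 bytes a Delphi 2009+ string would need, walking the UTF-16 code units and treating a valid surrogate pair as one 4-byte character. Utf8ByteCount is a hypothetical helper, not an RTL call, and counting a lone (unpaired) surrogate as 3 bytes is an assumption of this sketch.

```pascal
program Utf8Size;

{$APPTYPE CONSOLE}

// Hypothetical helper: number of bytes S would take when encoded as UTF-8.
function Utf8ByteCount(const S: string): Integer;
var
  I, C: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    C := Ord(S[I]);
    if C < $80 then
      Inc(Result)                       // U+0000..U+007F: ASCII, 1 byte
    else if C < $800 then
      Inc(Result, 2)                    // U+0080..U+07FF: 2 bytes
    else if (C >= $D800) and (C <= $DBFF) and (I < Length(S)) and
      (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
    begin
      Inc(Result, 4);                   // surrogate pair: U+10000..U+10FFFF
      Inc(I);                           // skip the low surrogate as well
    end
    else
      Inc(Result, 3);                   // rest of the BMP (lone surrogates too)
    Inc(I);
  end;
end;

begin
  Writeln(Utf8ByteCount('abc'));           // 3
  Writeln(Utf8ByteCount('caf' + #$00E9));  // 5
  Writeln(Utf8ByteCount(#$20AC));          // 3 (euro sign)
  Writeln(Utf8ByteCount(#$D83D#$DE00));    // 4 (U+1F600)
end.
```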

However, if you do not intend to change the database schema at all, one of three things must be true:

  • You currently store everything as binary rather than as text (usually not a good choice).
  • The database already stores Unicode and counts characters, not bytes (otherwise you would already have a problem today, for example with accented letters).
  • The database uses a single-byte code page, such as Windows-1252, which does not let you store Unicode data at all (which makes this a non-issue: characters will be stored exactly as before, you just can't use Unicode).

I am not familiar with Oracle, but MSSQL has two different data types, varchar and nvarchar: varchar is counted in bytes, while nvarchar is counted in characters and is therefore suitable for Unicode. MySQL, on the other hand, has only varchar, and since version 4.1 its length is always counted in characters. So you should check the Oracle documentation and your database schema to get a definitive answer on whether this is really a problem.


If you do not want to use Unicode in Delphi 2009, you can use the AnsiString type. But why would you want that?

A cumbersome but valid test could be:

 function IsAnsi(const AString: string): Boolean;
 var
   tempansi: AnsiString;
   temp: string;
 begin
   tempansi := AnsiString(AString);  // to the system ANSI code page
   temp := string(tempansi);         // and back to UTF-16
   Result := temp = AString;         // equal only if nothing was lost
 end;

You can use the StringElementSize function to find out whether a string is Unicode or ANSI. To check whether a character is ANSI, use the TCharacter.IsAnsi class function in the Character.pas unit.


You said that what you really want to know is how many bytes your string will occupy.

How about converting to UTF8String? ASCII characters will occupy 1 byte each. Keep in mind that in UTF-8, other Unicode characters occupy 2 to 4 bytes, so some take more than 2.
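A minimal sketch of that suggestion, assuming Delphi 2009 or later, where UTF8String is an AnsiString-based type whose Length counts bytes rather than characters. Utf8ByteLength is a hypothetical helper name.

```pascal
program Utf8StringLength;

{$APPTYPE CONSOLE}

// Hypothetical helper: convert to UTF8String and take its Length,
// which counts UTF-8 code units, i.e. bytes.
function Utf8ByteLength(const S: string): Integer;
var
  U: UTF8String;
begin
  U := UTF8String(S);  // explicit UTF-16 -> UTF-8 conversion
  Result := Length(U);
end;

begin
  Writeln(Length('caf' + #$00E9));          // 4 characters
  Writeln(Utf8ByteLength('caf' + #$00E9));  // 5 bytes
end.
```

The resulting byte count can then be compared with the column's byte limit before the value is sent to the database.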


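Since with AnsiString 1 char = 1 byte and with UnicodeString 1 char = 2 bytes, a simple test is to check whether one element takes a single byte: IsAnsiString := SizeOf(aString[1]) = 1; (SizeOf(aString) itself only returns the size of the string reference, not the number of bytes in its contents.)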


An ASCII character is always a single byte. You cannot say the same for a Unicode character, because that depends on how it is encoded. From a single byte you cannot tell whether it is an ASCII character, part of a Unicode character, or a character at all. So what is your question again, and why do you need to know? I suspect that either you have misunderstood Unicode or I have misunderstood your question.


Source: https://habr.com/ru/post/1277644/

