How to use the new string types in Delphi XE2?

Note: sarnold heavily edited this question; the original text is preserved in full as a comment inside the question. If I made something incomprehensible, the original post may help. (I am leaving this note so that future editors do not always need to consult the edit history.)

I work with Delphi XE2 and need help understanding how to use ANSI strings, Unicode strings and wide-character strings correctly, especially when writing a DLL intended for use from other languages (for example VB, C++ or C#).

I need to write a DLL in Delphi XE2 that performs simple string operations on Unicode strings. The DLL should work with either SimpleShareMem or ShareMem, or without a shared memory manager at all. It must be callable from other languages such as VB, C++ and C#.

By default, strings are now Unicode strings. What does Embarcadero expect us to use when working with these strings?

Strings come in two kinds: (a) single-byte-character strings, which do not support Unicode, or (b) wide strings, in which each character takes two bytes. (These do support Unicode, but they are not UTF-8 strings.)

Two pointer types are available: PAnsiChar and PWideChar (there is no PUnicodeChar pointer type). PChar is an alias for PWideChar; does this mean that we always need to allocate 2 * length bytes of memory for these strings? (And, similarly, do we need to divide the memory size by 2 to get the length of these strings?)

For string constants, do we need to mark the type of the string in the source code? For instance:

 Const MyCo = 'test'; 

or

 Const MyCo = WideString('test'); 

How about assignments between string variables?

 s := st; 

Or does it need to be rewritten as:

 s := WideString(st); 

Should we include a Unicode byte order mark (BOM) in our strings? If so, how should we include the BOM in our strings?

How should we work with ANSI strings in different Windows code pages? If we get an ANSI string with code page 1200, should we recode the string or work with it as is?

How should we use the TEncoding class to convert between the Unicode, UTF-8, WideString and AnsiString types?

Are there serious performance penalties when using wide strings or Unicode strings?

Do we have to write our interfaces so that they work only with WideString parameters when using the shared memory manager?

Do we have to write our interfaces to require length parameters for PChar , PAnsiChar and PWideChar parameter types?

How should we write our interfaces to determine whether a file is stored as Unicode, UTF-8, ANSI or wide characters? How do we determine which format to use when writing files back?

Should we use only procedures? Or can functions work?

Thank you and happy new year.

+4
3 answers

It seems to me that Gu is switching from Delphi 7 to a Unicode-enabled version (D2009+) and is looking for tips on how to work with the new string types.

Cary Jensen's white paper "Delphi Unicode Migration for Mere Mortals" addresses most, if not all, of the issues raised in the question.

I would normally put this in a comment, but the list of comments is so long that I felt the link (which may help more people than just Gu) would be easier to find in an answer.

+5
  • ShareMem and SimpleShareMem can be used with a DLL that takes string parameters only among Delphi applications that use the same units. They ensure that the application and the DLL use the same memory manager. You cannot use them outside of Delphi, because VB, C++ and C# use their own memory managers. The in-memory format of Delphi strings is compatible with other languages (and vice versa), but strings must be allocated and freed by the same memory manager.
  • Delphi Unicode strings are UTF-16 strings, because that is the native Windows string type. Delphi can also handle various 8-bit encodings, including UTF-8, through its AnsiString type.
  • UTF-16 does not always use 2 bytes per character. Some "characters" take 4 bytes, although they usually appear only if you are doing "exotic" text processing (musical symbols, dead languages, etc.).
  • Converting strings to Unicode is lossless. In the other direction, you must make sure that each AnsiString uses the correct code page to avoid conversion losses. Keep in mind that many routines convert through Unicode internally, so it is better to convert any non-Unicode string to Unicode first and, if necessary, convert back as the final operation (unless encoding-specific processing is required).
  • Since Unicode strings are the default type in both Delphi and Windows, performance should be better, because no back-and-forth conversion is required. User code that works with UTF-16 strings can be slower, though, because the processing is more complex (UTF-8 can be even more complex, as can some MBCS encodings).
  • The BOM is normally used in text files, not in strings in memory. Data read or received from outside should be converted to the native in-memory format. Otherwise, a conversion to the native format is required every time the string is passed to a function that expects it. The output format depends on your application; you need code that can determine it, or you need to ask the user.
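
The point above about UTF-16 not always using 2 bytes per character can be checked with a small console program. This is only a sketch, assuming a Unicode-enabled Delphi (2009 or later); the program name and the sample character (a musical symbol outside the Basic Multilingual Plane) are illustrative choices, not taken from the answer.

```pascal
program SurrogateSketch;
{$APPTYPE CONSOLE}

var
  S: string;
begin
  // U+1D11E MUSICAL SYMBOL G CLEF, written as its UTF-16 surrogate pair.
  S := #$D834#$DD1E;

  // Length counts UTF-16 code units, not user-visible characters:
  // one "character" here occupies two Char elements.
  Writeln('Code units: ', Length(S));
  Writeln('Bytes:      ', Length(S) * SizeOf(Char));

  // A high (leading) surrogate lies in the range $D800..$DBFF.
  Writeln('Starts with a high surrogate: ',
    (S[1] >= #$D800) and (S[1] <= #$DBFF));
end.
```

Code that indexes or truncates strings by Char position should therefore avoid cutting between the two halves of such a pair.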
+1

AS, you had better post this question on the rsdn.ru forum; I think they are more lenient with beginners than DelphiMasters.

Gu, basically, you should just read the manuals that come with Delphi XE2. It is all there; just read thoughtfully and attentively.

1) You must NOT create a DLL. You should build a BPL instead. A DLL is designed to work through a plain C interface, like the Win32 API, using the most primitive types without any side effects. C is called a "machine-independent assembler" for its primitiveness, and the DLL interface is of the same kind.

DLLs are used for compatibility with other languages, because those languages have their own side effects that are incompatible with Delphi. The DLL then provides the most primitive form of cooperation, with all side effects removed from the interface. By lowering the interface to the most simplified types, the DLL provides compatibility.

But when Delphi needs to interact with Delphi, you had better let Delphi take care of all the compatibility issues. This is what BPLs have been for since Delphi 3. There is no reason to use a DLL for this.

Of course, you can shoot yourself in the foot if you like. https://forums.embarcadero.com/thread.jspa?threadID=64114

2) In XE2 you should use UnicodeString rather than WideString. The manuals pay particular attention to strings. It sounds like you just haven't read the manuals.

3) String constants are Unicode (but char constants are ANSI or Unicode seemingly at random, which leads to bugs). There is no need to ask anyone or take anyone's word for it: just open your EXE in any viewer and find these constants; you will find them in UCS-2 encoding (aka WideChar), 2 bytes per character.

4) The Unicode BOM is used to indicate the byte order, Intel or Motorola style (little-endian or big-endian). When you develop for Windows, you only ever have the Intel byte order, so no BOM is needed.
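
For data that does arrive with a BOM (for example a text file produced by another program), the RTL can detect it. The following is a minimal sketch, assuming the TEncoding class from System.SysUtils as shipped with XE2; the program name and the sample bytes are made up for illustration.

```pascal
program BomSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Data: TBytes;
  Enc: TEncoding;
  BomLen: Integer;
  Text: string;
begin
  // Illustrative input: a UTF-8 BOM (EF BB BF) followed by the text 'hi'.
  Data := TBytes.Create($EF, $BB, $BF, Ord('h'), Ord('i'));

  // Passing nil asks GetBufferEncoding to pick the encoding from the BOM.
  // The function returns the number of BOM bytes to skip.
  Enc := nil;
  BomLen := TEncoding.GetBufferEncoding(Data, Enc);

  // Decode the bytes after the BOM into a native UTF-16 string.
  // Enc is one of the shared standard encodings, so it is not freed here.
  Text := Enc.GetString(Data, BomLen, Length(Data) - BomLen);

  Writeln('BOM bytes skipped: ', BomLen);
  Writeln('Decoded text:      ', Text);
end.
```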

5) The whole paragraph you wrote about lengths and memory sizes is very ambiguous. What do you mean by length, in what units is it measured, and where do you get it from?

I assume that by length you mean the number of characters, not including any service structures or "under the hood" characters. That is what the built-in System.Length function returns (for a string or an array). However, if this assumption is wrong, the answer below is wrong too.

And the question of whether you have to multiply by 2 is just a sign of bad code. You should always multiply by something, and you should have been doing so for many years already. Multiply by what? By SizeOf(char variable) or SizeOf(Char). Then it is Delphi that automatically determines how much memory is needed. And when dealing with a C string, you should use not length but length + 1: do not forget the #0 terminator.
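
The rule above (multiply by SizeOf(Char) and add one for the terminator) can be sketched as follows. The program and variable names are illustrative; the only assumption is a Delphi version in which string has at least one character here, so that PChar(S)^ is a valid source for Move.

```pascal
program BufSizeSketch;
{$APPTYPE CONSOLE}

var
  S: string;
  Buf: PChar;
  ByteCount: Integer;
begin
  S := 'test';

  // Characters plus the #0 terminator, times the size of one Char.
  // No hard-coded 2: the compiler supplies the right element size.
  ByteCount := (Length(S) + 1) * SizeOf(Char);

  GetMem(Buf, ByteCount);
  try
    // Delphi strings are #0-terminated in memory, so copying
    // Length + 1 elements also copies the terminator.
    Move(PChar(S)^, Buf^, ByteCount);
    Writeln('Bytes allocated: ', ByteCount);
    Writeln('C string copy:   ', Buf);
  finally
    FreeMem(Buf);
  end;
end.
```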

6) How should we work with ANSI strings in different Windows code pages? If we get an ANSI string with code page 1200, do we need to transcode the string or work with it as is?

RTFM!!! Just declare an AnsiString type with code page 1200, or use RawByteString and SetCodePage. See the code below.

Again, RTFM: ALL of this is described in the online help. It takes only about 2 hours to read it ALL in the built-in help of Delphi XE2.
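
Both approaches named above can be sketched in a few lines. This is an illustration under assumptions: it uses code page 1251 (the one that appears in the code at the end of this answer) rather than 1200, and the type name, program name and sample bytes are made up.

```pascal
program CodePageSketch;
{$APPTYPE CONSOLE}

type
  // An AnsiString type bound to a fixed code page (hypothetical name).
  Win1251String = type AnsiString(1251);

const
  // Six bytes of Cyrillic text encoded in Windows-1251 (illustrative data).
  Sample: array[0..5] of Byte = ($CF, $F0, $E8, $E2, $E5, $F2);

var
  Raw: RawByteString;
  U: string;
  W: Win1251String;
begin
  // Approach 1: take raw bytes and tag them with their code page.
  SetString(Raw, PAnsiChar(@Sample[0]), Length(Sample));
  SetCodePage(Raw, 1251, False);  // False: label the bytes, do not convert
  U := string(Raw);               // converts to UTF-16 using the tagged page
  Writeln('Code page of Raw: ', StringCodePage(Raw));
  Writeln('UTF-16 length:    ', Length(U));

  // Approach 2: a code-paged type converts on assignment from UTF-16.
  W := Win1251String(U);
  Writeln('Bytes in W:       ', Length(W));
end.
```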

7) How should we use the TEncoding class to convert between the Unicode, UTF-8, WideString and AnsiString types? TEncoding is meant for TStringList and the like. Why would you need it? There is a UTF8String type; just use it.

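 var
   ansiS: AnsiString;   // "as" is a reserved word, so the variable is renamed
   ucs2S: string;
   utf8S: UTF8String;
 ...
 ansiS := AnsiString(ucs2S);  // UTF-16 to the ANSI code page (may lose characters)
 utf8S := ansiS;              // ANSI to UTF-8, converted automatically
 ...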

8) Are there any serious performance penalties when using wide strings or Unicode strings? It depends on which Unicode you mean, UCS-2 or UTF-8, and on which operations you want to use. Just write a long loop and measure the time.

9) Do we have to write our interfaces to require length parameters for the PChar, PAnsiChar and PWideChar parameter types? It is your choice; do as you wish. Typically, a PChar is a C string terminated by #0. That is how the StrLen function works. If you ignore this convention and use it as an untyped Pointer, then pass the length separately.

ALL of these questions are already answered in the help!!! Just read it.
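
One common shape for such an interface is a caller-allocated buffer with an explicit length, which also avoids any need for a shared memory manager. The sketch below is hypothetical: the library name, the exported function UpperCaseW and its contract are invented for illustration and do not come from the question or from any real library.

```pascal
library StrOpsSketch;

uses
  System.SysUtils;

// Hypothetical export. Upper-cases the #0-terminated string Src into the
// caller's buffer Dest, which has room for DestLen characters including
// the terminator. Returns the number of characters required (excluding
// #0), so a caller can pass Dest = nil first to learn the size.
function UpperCaseW(Src, Dest: PWideChar; DestLen: Integer): Integer; stdcall;
var
  S: UnicodeString;
begin
  S := Src;                 // copies up to the #0 terminator
  S := AnsiUpperCase(S);    // Unicode-aware in Unicode Delphi versions
  Result := Length(S);
  // Copy only if the result and its terminator fit in the buffer.
  if (Dest <> nil) and (DestLen > Result) then
    Move(PWideChar(S)^, Dest^, (Result + 1) * SizeOf(WideChar));
end;

exports
  UpperCaseW;

begin
end.
```

Because the caller owns Dest, no Delphi string ever crosses the DLL boundary, so VB, C++ or C# callers never have to free memory allocated by Delphi.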


 // Returns len bytes starting at offset ofs as a string, without bounds checks.
 function CDF_File_Buffer.GetStringNoBounds(const ofs, len: integer): string;
 var
   cntDOS, cntWin, cntWeird, i: Cardinal;
   sBuf: RawByteString;
   cp: word;
   ptr, ptr_i: PAnsiChar;
 const
   rusDOS: set of AnsiChar = [#$80..#$AF, #$E0..#$F1];
   rusWin: set of AnsiChar = [#$C0..#$FF];
 begin
   ptr := Pointer(Header);
   Inc(ptr, ofs);

   case textCharset of
     tcsGuess:
       begin
         // Count bytes typical for each code page and pick the likelier one.
         ptr_i := ptr;
         cntDOS := 0;
         cntWin := 0;
         cntWeird := 0;
         for i := 1 to len do
         begin
           if ptr_i^ in rusDOS then Inc(cntDOS);
           if ptr_i^ in rusWin then Inc(cntWin);
           if (ptr_i^ < #32) or ((ptr_i^ >= #127) and not (ptr_i^ in rusWin)
             and not (ptr_i^ in rusDOS)) then
             Inc(cntWeird);
           Inc(ptr_i);
         end;
         if (cntWin > cntDOS) or (cntWeird > cntDOS) then
           cp := 1251
         else
           cp := 866;
       end;
     tcsWin: cp := 1251;
     tcsDOS: cp := 866;
   else
     cp := 0; // 0 = CP_ACP, the system default ANSI code page
   end;

   SetString(sBuf, ptr, len);

   // Replace embedded #0 with #7 (BEL), since #0 would terminate the text
   // in C-style (Windows, C++) string handling.
   for i := 1 to Length(sBuf) do
     if sBuf[i] = #0 then
       sBuf[i] := #7;

   SetCodePage(sBuf, cp, false); // tag the bytes with cp, no conversion
   Result := string(sBuf);       // convert to UTF-16 using that code page
 end;

 function CDF_File_Buffer.GetString(const ofs, len: integer; const min, max: integer): string;
 begin
   if (ofs <= 0) or (len <= 0) then
     Exit( '---   ---');
   // Cardinal cast, presumably to avoid a signed/unsigned comparison warning
   if Cardinal(ofs + (len-1)) >= TotalSize then
     Exit('---    ---');

   Result := '---    :    ';
   if ofs < min then Exit(Result + ' ---');
   if ofs + (len-1) > max then Exit(Result + ' ---');

   Result := GetStringNoBounds(ofs, len);
 end;

 function CDF_File_Buffer.GetString(const ofs, len: integer): string;
 begin
   Result := GetString(ofs, len, 0, TotalSize-1);
 end;
0

Source: https://habr.com/ru/post/1388770/

