How to determine if text in the clipboard is Windows ISO 8859 or UTF-8 in C++?

I would like to know if there is an easy way to determine whether the text on the clipboard is ISO 8859 or UTF-8.

Here is my current code:

    COleDataObject obj;
    if (obj.AttachClipboard()) {
        if (obj.IsDataAvailable(CF_TEXT)) {
            HGLOBAL hmem = obj.GetGlobalData(CF_TEXT);
            CMemFile sf((BYTE*) ::GlobalLock(hmem), (UINT) ::GlobalSize(hmem));
            CString buffer;
            LPSTR str = buffer.GetBufferSetLength((int) ::GlobalSize(hmem));
            sf.Read(str, (UINT) ::GlobalSize(hmem));
            ::GlobalUnlock(hmem);
            // this is my string class
            s->SetEncoding(ENCODING_8BIT);
            s->SetString(buffer);
        }
    }
4 answers

Check out the definition of CF_LOCALE on this Microsoft page. It tells you the locale (and thus the language) of the text on the clipboard. Even better, if you request CF_UNICODETEXT instead, Windows will convert the text to UTF-16 for you.
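As a rough sketch of the CF_UNICODETEXT route, the snippet below reads the clipboard as UTF-16 with plain Win32 calls rather than the asker's MFC `COleDataObject`. The function name `read_clipboard_utf16` is made up for illustration, and the `_WIN32` guard is only there so the file still compiles on other platforms, where it simply reports that nothing is available.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif
#include <string>

// Hypothetical helper: returns true and fills `out` with the clipboard text
// as UTF-16 if CF_UNICODETEXT is available; returns false otherwise.
bool read_clipboard_utf16(std::wstring& out) {
#ifdef _WIN32
    if (!::IsClipboardFormatAvailable(CF_UNICODETEXT)) return false;
    if (!::OpenClipboard(nullptr)) return false;

    bool ok = false;
    if (HANDLE h = ::GetClipboardData(CF_UNICODETEXT)) {
        if (const wchar_t* p = static_cast<const wchar_t*>(::GlobalLock(h))) {
            out.assign(p);      // the data is NUL-terminated UTF-16
            ::GlobalUnlock(h);
            ok = true;
        }
    }
    ::CloseClipboard();
    return ok;
#else
    (void)out;                  // no Windows clipboard on this platform
    return false;
#endif
}
```

With this approach the original 8-bit encoding no longer has to be guessed, since the caller only ever sees UTF-16.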


UTF-8 has a specific structure for non-ASCII bytes. You can scan for bytes >= 128 and, if any are found, check whether they form a valid UTF-8 string.

Valid UTF-8 byte formats can be found on Wikipedia:

    Unicode              Byte1     Byte2     Byte3     Byte4
    U+000000-U+00007F    0xxxxxxx
    U+000080-U+0007FF    110xxxxx  10xxxxxx
    U+000800-U+00FFFF    1110xxxx  10xxxxxx  10xxxxxx
    U+010000-U+10FFFF    11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
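The table above translates fairly directly into a checker. What follows is a minimal sketch, not a definitive implementation: the name `is_valid_utf8` is made up, and besides the bit patterns it also rejects overlong forms, surrogates, and code points above U+10FFFF, which the table alone does not spell out.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8 (pure ASCII counts as valid).
bool is_valid_utf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = p[i];
        if (b < 0x80) { ++i; continue; }            // 0xxxxxxx: ASCII

        std::size_t extra;      // continuation bytes expected after the lead
        unsigned long cp;       // code point being assembled
        unsigned long min_cp;   // smallest code point allowed for this length
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; min_cp = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min_cp = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min_cp = 0x10000; }
        else return false;      // stray continuation byte or invalid lead byte

        if (n - i <= extra) return false;           // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) return false;   // must be 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                    // overlong encoding
        if (cp > 0x10FFFF) return false;                  // beyond Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        i += extra + 1;
    }
    return true;
}
```

Keep in mind this is a heuristic: ISO 8859 text with accented characters almost never passes, but a successful check does not prove the sender intended UTF-8.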

Old answer:

You don't need to - all ASCII text is valid UTF-8, so you can simply decode it as UTF-8 and it will work as expected.

To check whether it contains non-ASCII characters, you can scan for bytes >= 128.
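The scan itself is short. A possible sketch, with `is_ascii_only` being a made-up name:

```cpp
#include <algorithm>
#include <string>

// Returns true if every byte is below 128, i.e. the text is plain ASCII
// and reads the same in ISO 8859 and UTF-8.
bool is_ascii_only(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}
```

If this returns true, the encoding question does not matter for that particular text.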


I may be wrong, but I think you can't: if I open a UTF-8 file without a BOM in my editor, it is displayed by default as ISO-8859-1 (my locale), and apart from some strange (to me) uses of accented characters, I have no strong visual hint that it is UTF-8 (unless the encoding is declared elsewhere, for example in a charset declaration in HTML or XML): it is perfectly valid ANSI text.

John wrote "all ASCII text is valid UTF-8", but the reverse direction matters here too: UTF-8 text is itself valid 8-bit ANSI text.

Windows XP and later use UTF-16 natively and have a clipboard format for it, but AFAIK they simply ignore UTF-8, with no special handling for it. (Well, there actually is an API for converting UTF-8 to UTF-16, or ANSI, etc.)
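To illustrate why UTF-8 bytes also pass as 8-bit text, here is a small sketch that treats every input byte as one ISO-8859-1 character and re-encodes the result as UTF-8. The name `latin1_to_utf8` is made up, and it assumes strict ISO-8859-1 (in Windows-1252 the bytes 0x80-0x9F map differently).

```cpp
#include <string>

// Interprets each byte as an ISO-8859-1 character (U+0000..U+00FF)
// and returns the same characters encoded as UTF-8.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);                   // ASCII stays as-is
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));     // 110xxxxx
            out += static_cast<char>(0x80 | (b & 0x3F));   // 10xxxxxx
        }
    }
    return out;
}
```

Feeding it the two UTF-8 bytes of "é" (0xC3 0xA9) never fails; it just yields the two characters "Ã©", which is the familiar mojibake an editor shows when it guesses ISO-8859-1.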


You can check obj.IsDataAvailable(CF_UNICODETEXT) to see whether a Unicode version of what is on the clipboard is available.

-Adam


Source: https://habr.com/ru/post/1277428/

