How to determine if text in the clipboard is Windows ISO 8859 or UTF-8 in C++?

Question

I would like to know whether there is an easy way to determine if the text on the clipboard is in ISO 8859 or UTF-8.

Here is my current code:

    COleDataObject obj;
    if (obj.AttachClipboard()) {
        if (obj.IsDataAvailable(CF_TEXT)) {
            HGLOBAL hmem = obj.GetGlobalData(CF_TEXT);
            CMemFile sf((BYTE*) ::GlobalLock(hmem), (UINT) ::GlobalSize(hmem));
            CString buffer;
            LPSTR str = buffer.GetBufferSetLength((int) ::GlobalSize(hmem));
            sf.Read(str, (UINT) ::GlobalSize(hmem));
            buffer.ReleaseBuffer();  // recompute the length up to the terminating NUL
            ::GlobalUnlock(hmem);
            // this is my string class
            s->SetEncoding(ENCODING_8BIT);
            s->SetString(buffer);
        }
    }

+4

c++ windows clipboard utf-8

KpexEA Oct 3 '08 at 3:12

4 answers

UTF-8 has a specific structure for non-ASCII bytes. You can scan for bytes >= 128 and, if any are found, check whether they form a valid UTF-8 sequence.

Valid UTF-8 byte formats can be found on Wikipedia:

 Unicode range        Byte 1    Byte 2    Byte 3    Byte 4
 U+000000-U+00007F    0xxxxxxx
 U+000080-U+0007FF    110xxxxx  10xxxxxx
 U+000800-U+00FFFF    1110xxxx  10xxxxxx  10xxxxxx
 U+010000-U+10FFFF    11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
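
The byte patterns in the table can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the function name `isValidUtf8` is made up for illustration, and the extra rejections (overlong forms, surrogates, values above U+10FFFF) go slightly beyond what the table alone states.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if every byte of `s` is part of a
// well-formed UTF-8 sequence. Pure ASCII input is trivially valid.
bool isValidUtf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const unsigned long minForLength[] = {0x0, 0x80, 0x800, 0x10000};

    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = p[i];
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // code point being decoded

        if (b < 0x80)                { extra = 0; cp = b; }         // 0xxxxxxx
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }  // 110xxxxx
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }  // 1110xxxx
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }  // 11110xxx
        else return false;       // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;  // sequence is cut off

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) return false;  // must be 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minForLength[extra]) return false;            // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;        // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                       // beyond Unicode

        i += extra + 1;
    }
    return true;
}
```

A failed check means the text is certainly not UTF-8. A passed check only makes UTF-8 likely, because the same bytes could still be an unusual run of ISO 8859 characters.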

Old answer:

You don't need to - all ASCII text is valid UTF-8, so you can simply decode it as UTF-8 and it will work as expected.

To check if it contains non-ASCII characters, you can scan for bytes >= 128.
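
As a rough sketch of that scan (the name `hasNonAscii` is hypothetical, and the text is assumed to have been copied into a `std::string` already):

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true if any byte has its high bit set (value >= 128).
bool hasNonAscii(const std::string& s) {
    return std::any_of(s.begin(), s.end(), [](char c) {
        // char may be signed, so compare as unsigned char.
        return static_cast<unsigned char>(c) >= 0x80;
    });
}
```

If this returns false, the text is plain ASCII and reads the same whether it is treated as ISO 8859 or as UTF-8.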

+1

John Millikin Oct 3 '08 at 3:21

I may be wrong, but I think you can't: if I open a UTF-8 file without a BOM in my editor, it is displayed by default as ISO-8859-1 (my locale), and apart from some strange-looking runs of accented characters that are foreign to me, I have no strong visual hint that it is UTF-8 (unless the encoding is declared elsewhere, for example in an HTML or XML charset declaration): it is perfectly valid ANSI text.

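John wrote "all ASCII text is valid UTF-8," but the converse is the problem here: a UTF-8 byte sequence is also valid 8-bit (ANSI) text, so validity alone does not tell the two apart.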

Windows XP and later use UTF-16 natively and have a clipboard format for it, but AFAIK Windows simply ignores UTF-8, with no special handling. (Well, there actually is an API for converting UTF-8 to UTF-16, or to ANSI, etc.)
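
The conversion API referred to here is presumably `MultiByteToWideChar` with `CP_UTF8`. A minimal sketch follows; the wrapper name `utf8ToUtf16` is made up, and the block is guarded by `_WIN32` because it only builds against the Win32 headers. The `MB_ERR_INVALID_CHARS` flag makes the call fail on malformed input; as far as I know, on Windows XP the flags must be 0 for `CP_UTF8`, so treat that flag as an assumption about the target system.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Hypothetical wrapper: converts UTF-8 to UTF-16.
// Returns false if the input is not valid UTF-8 or the conversion fails.
bool utf8ToUtf16(const std::string& utf8, std::wstring& out) {
    out.clear();
    if (utf8.empty()) return true;

    const int srcLen = static_cast<int>(utf8.size());

    // First call: ask how many wchar_t units the result needs.
    const int needed = ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                             utf8.data(), srcLen, nullptr, 0);
    if (needed <= 0) return false;

    // Second call: convert into the buffer.
    out.resize(static_cast<std::size_t>(needed));
    const int written = ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                              utf8.data(), srcLen,
                                              &out[0], needed);
    if (written != needed) {
        out.clear();
        return false;
    }
    return true;
}
#endif
```

Because the call rejects malformed sequences when the flag is honored, a failed conversion can double as a "this is not UTF-8" signal.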

+1

Philho Oct 3 '08 at 5:31

You can check obj.IsDataAvailable(CF_UNICODETEXT) to see whether a Unicode version of the clipboard contents is available.
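
For illustration, here is the same idea using the plain Win32 clipboard functions instead of MFC's `COleDataObject`. This is a sketch under assumptions: the function name `readClipboardUnicode` is hypothetical, the clipboard is opened without an owner window, and the block is guarded by `_WIN32` since it needs the Win32 headers.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Hypothetical helper: copies the clipboard text as UTF-16 into `out`.
// Returns false if no Unicode text is available or the clipboard is busy.
bool readClipboardUnicode(std::wstring& out) {
    out.clear();
    if (!::IsClipboardFormatAvailable(CF_UNICODETEXT)) return false;
    if (!::OpenClipboard(nullptr)) return false;

    bool ok = false;
    // The handle stays owned by the clipboard; it must not be freed here.
    HANDLE h = ::GetClipboardData(CF_UNICODETEXT);
    if (h != nullptr) {
        const wchar_t* text = static_cast<const wchar_t*>(::GlobalLock(h));
        if (text != nullptr) {
            out.assign(text);  // CF_UNICODETEXT data is null-terminated
            ::GlobalUnlock(h);
            ok = true;
        }
    }
    ::CloseClipboard();
    return ok;
}
#endif
```

Windows synthesizes `CF_UNICODETEXT` from `CF_TEXT` when only the latter was placed on the clipboard, so this path usually works even for 8-bit sources.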

-Adam

0

Adam Davis Oct 3 '08 at 3:20

Mark Ransom · Accepted Answer · 2008-10-03T14:05:59+0000

Check out the CF_LOCALE definition on this Microsoft page. It tells you the locale of the text on the clipboard. Even better, if you use CF_UNICODETEXT instead, Windows will convert to UTF-16 for you.
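
A rough sketch of reading CF_LOCALE and mapping it to an ANSI code page is shown below. The function name `clipboardTextCodePage` is hypothetical, and the block is guarded by `_WIN32` because it depends on the Win32 headers. Note the assumption built into this approach: the locale yields a code page such as 1252, which describes how the source application labelled its CF_TEXT data; it will not report UTF-8.

```cpp
#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: returns the default ANSI code page of the locale
// stored in CF_LOCALE, or 0 if that information is not available.
UINT clipboardTextCodePage() {
    if (!::IsClipboardFormatAvailable(CF_LOCALE)) return 0;
    if (!::OpenClipboard(nullptr)) return 0;

    UINT codePage = 0;
    // CF_LOCALE data is a global memory handle holding an LCID.
    HANDLE h = ::GetClipboardData(CF_LOCALE);
    if (h != nullptr) {
        const LCID* lcid = static_cast<const LCID*>(::GlobalLock(h));
        if (lcid != nullptr) {
            DWORD cp = 0;
            // LOCALE_RETURN_NUMBER makes the API write a DWORD, not a string.
            if (::GetLocaleInfoW(*lcid,
                                 LOCALE_IDEFAULTANSICODEPAGE | LOCALE_RETURN_NUMBER,
                                 reinterpret_cast<LPWSTR>(&cp),
                                 sizeof(cp) / sizeof(WCHAR)) > 0) {
                codePage = cp;
            }
            ::GlobalUnlock(h);
        }
    }
    ::CloseClipboard();
    return codePage;
}
#endif
```

The returned value can be passed to `MultiByteToWideChar` as the code page when converting CF_TEXT data manually, although requesting CF_UNICODETEXT lets Windows do that conversion itself.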
