atoi() with other languages

I am working on an internationalization project. Do other languages, such as Arabic or Chinese, use representations for numbers other than 0-9? If so, are there versions of atoi() that take these other representations into account?

I should add that I'm mostly interested in parsing user input. If users type a number in a different representation, I want to be sure that I recognize it as a number and treat it accordingly.

2 answers

You can use std::wistringstream and a locale to parse the integer.

    #include <sstream>
    #include <locale>
    using namespace std;

    int main()
    {
        locale mylocale("");        // Construct locale object with the user's default preferences
        wistringstream wss(L"1");   // your number string
        wss.imbue(mylocale);        // Imbue that locale
        int target_int = 0;
        wss >> target_int;          // Extract the integer
        return 0;
    }

For more information, see the documentation for the stream classes and the locale class.
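Since the question is about validating user input, it may help to check the stream state after extraction. The following is a minimal sketch, not part of the original answer: `parse_int` is a hypothetical helper name, and it defaults to the classic "C" locale so the behavior is predictable. Note that, as far as I know, the standard number-parsing facet only matches the widened forms of 0-9, so imbuing a locale by itself may not make native digits parse.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: returns true only if the whole string is one integer
// (optionally followed by whitespace). The parsed value is stored in `out`.
bool parse_int(const std::wstring& text, int& out,
               const std::locale& loc = std::locale::classic())
{
    std::wistringstream wss(text);
    wss.imbue(loc);              // number formatting rules come from `loc`

    int value = 0;
    if (!(wss >> value)) {
        return false;            // no number at the start of the input
    }

    wss >> std::ws;              // skip any trailing whitespace
    if (!wss.eof()) {
        return false;            // something non-numeric follows the number
    }

    out = value;
    return true;
}
```

With this helper, `L"137"` and `L"42  "` are accepted, while `L"12abc"` and `L"abc"` are rejected instead of being silently truncated or read as 0.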


If you are concerned about international characters, make sure that you use a Unicode-aware function such as _wtoi().

You can also check whether UNICODE is defined to make the code independent of the character type (from MSDN):

    TCHAR tstr[4] = TEXT("137");
    int num = 0;

    #ifdef UNICODE
        size_t cCharsConverted;
        CHAR strTmp[SIZE]; // SIZE equals (2*(sizeof(tstr)+1)). This ensures enough
                           // room for the multibyte characters if they are two
                           // bytes long and a terminating null character. See Security
                           // Alert below.
        wcstombs_s(&cCharsConverted, strTmp, sizeof(strTmp), (const wchar_t *)tstr, sizeof(strTmp));
        num = atoi(strTmp);
    #else
        num = atoi(tstr);
    #endif

In this example, the standard C library function wcstombs translates Unicode to ASCII. The example relies on the fact that the digits 0 through 9 can always be converted from Unicode to ASCII, even if some of the surrounding text cannot. The atoi function stops at the first character that is not a digit.
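Because atoi stops at the first non-digit without reporting where it stopped, input such as "abc" and "0" both yield 0. A small sketch using the standard `strtol` function shows how to find out how much of the input was consumed; `parse_prefix` is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: converts the leading number in `text` (base 10) and
// returns how many characters were consumed. A return value of 0 means that
// no number was found at the start of the input.
std::size_t parse_prefix(const char* text, long& value)
{
    char* end = nullptr;
    value = std::strtol(text, &end, 10);   // `end` points at the first unparsed char
    return static_cast<std::size_t>(end - text);
}
```

Comparing the returned count with the length of the input tells you whether the whole string was a number or only a prefix of it.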

Your application can use the National Language Support (NLS) LCMapString function to process text that includes the native digits provided for some of the Unicode scripts.
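For readers who are not on Windows, the idea of handling native digits can be sketched by hand. This is not the NLS API: the table and the names `kDigitZeros`, `digit_value`, and `parse_native_uint` are assumptions made for illustration, and the table lists only a few of the Unicode decimal-digit ranges. Chinese ideographic numerals such as 一, 二, 三 are not contiguous 0-9 runs and are not handled here.

```cpp
#include <string>

// Illustrative (not exhaustive) list of code points for "zero" in several
// Unicode decimal-digit ranges. Each range holds 0-9 contiguously.
static const char32_t kDigitZeros[] = {
    0x0030,  // ASCII digits
    0x0660,  // Arabic-Indic digits
    0x06F0,  // Extended Arabic-Indic digits (Persian, Urdu)
    0x0966,  // Devanagari digits
    0xFF10,  // Fullwidth digits
};

// Returns the value 0-9 of a known decimal digit, or -1 for anything else.
int digit_value(char32_t c)
{
    for (char32_t zero : kDigitZeros) {
        if (c >= zero && c <= zero + 9) {
            return static_cast<int>(c - zero);
        }
    }
    return -1;
}

// Parses a string made up entirely of known decimal digits.
// Sketch only: signs, whitespace, and overflow are not handled.
bool parse_native_uint(const std::u32string& text, unsigned long& out)
{
    if (text.empty()) {
        return false;
    }
    unsigned long value = 0;
    for (char32_t c : text) {
        const int d = digit_value(c);
        if (d < 0) {
            return false;        // not a digit from the table above
        }
        value = value * 10 + static_cast<unsigned long>(d);
    }
    out = value;
    return true;
}
```

With this table, the Arabic-Indic string ١٣٧ and the ASCII string 137 both parse to the same value.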

Caution: Using the wcstombs function incorrectly can compromise the security of your application. Make sure that the application buffer for the string of 8-bit characters is at least of size 2 * (char_length + 1), where char_length represents the length of the Unicode string. This restriction is necessary because, with double-byte character sets (DBCS), each Unicode character can be mapped to two consecutive 8-bit characters. If the buffer cannot hold the entire string, the resulting string is not null-terminated, which poses a security risk. For more information about application security, see Security Considerations: International Features.
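The buffer rule above can be sketched with the portable `wcstombs` function. This is an illustration under the caution's own DBCS assumption of at most two bytes per character; in other encodings (UTF-8, for example) a character may need more bytes, so the factor of 2 is not universal. The name `wide_to_int` is hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <vector>

// Hypothetical helper: converts a wide string to a narrow one in a buffer of
// 2 * (char_length + 1) bytes, terminates it explicitly, then calls atoi.
bool wide_to_int(const wchar_t* wide, int& out)
{
    const std::size_t char_length = std::wcslen(wide);
    std::vector<char> narrow(2 * (char_length + 1));

    // Leave one byte free so there is always room for the terminator.
    const std::size_t written =
        std::wcstombs(narrow.data(), wide, narrow.size() - 1);
    if (written == static_cast<std::size_t>(-1)) {
        return false;            // a character could not be converted
    }

    narrow[written] = '\0';      // never rely on wcstombs to terminate
    out = std::atoi(narrow.data());
    return true;
}
```

Terminating the buffer explicitly covers the case the caution describes, where the converted text fills the buffer and no null character is written.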


Source: https://habr.com/ru/post/891011/

