Strict string-to-byte encoding in C#

I just stumbled upon another question in which someone suggested using new ASCIIEncoding().GetBytes(someString) to convert a string to bytes. I assumed it obviously would not work for non-ASCII characters. As it turns out, though, ASCIIEncoding happily replaces invalid characters with "?". This confuses me, because the behavior violates the principle of least surprise. In Python, the equivalent would be u"some unicode string".encode("ascii"), and the conversion is strict by default, so non-ASCII characters raise an exception there.
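To make the surprising behavior concrete, here is a minimal sketch of it. The class and method names (LossyAsciiDemo, Encode) are made up for illustration; only ASCIIEncoding and GetBytes come from .NET.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of .NET.
public static class LossyAsciiDemo
{
    // Encodes with the default ASCIIEncoding, which uses a replacement
    // fallback: characters outside U+0000..U+007F become '?' (0x3F)
    // instead of raising an error.
    public static byte[] Encode(string text)
    {
        return new ASCIIEncoding().GetBytes(text);
    }
}
```

With the input "\u00abX\u00bb", the two angle quotation marks are silently turned into "?" and only the "X" survives unchanged.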

Two questions:

  • How can a string be strictly converted to another encoding (for example, ASCII or Windows-1252), so that an exception is thrown when an invalid character occurs? By the way, I don't want to write a foreach loop that converts every Unicode code point to a byte and then checks the 8th bit. A large framework such as .NET (or Python ^^) should do this for me.
  • Any ideas on the rationale for this default behavior? To me it would make more sense for conversions to be strict by default, or at least to offer a parameter for this (Python lets you choose "replace", "ignore", or "strict").
1 answer

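.NET lets you opt in to throwing an exception when an encoding conversion fails. Use the EncoderExceptionFallback class (it throws an EncoderFallbackException when an input character cannot be converted to an encoded byte sequence) to create the encoding: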

Encoding ae = Encoding.GetEncoding(
              "us-ascii",
              new EncoderExceptionFallback(), 
              new DecoderExceptionFallback());

The encoding can then be used as usual:

// The input string consists of the Unicode characters LEFT POINTING 
// DOUBLE ANGLE QUOTATION MARK (U+00AB), 'X' (U+0058), and RIGHT POINTING 
// DOUBLE ANGLE QUOTATION MARK (U+00BB). 
// The encoding can only encode characters in the US-ASCII range of U+0000 
// through U+007F. Consequently, the characters bracketing the 'X' character
// cause an exception.

string inputString = "\u00abX\u00bb";
byte[] encodedBytes = new byte[ae.GetMaxByteCount(inputString.Length)];
int numberOfEncodedBytes = 0;
try
{
    numberOfEncodedBytes = ae.GetBytes(inputString, 0, inputString.Length, 
                                       encodedBytes, 0);
}
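catch (EncoderFallbackException e)
{
    // Index and CharUnknown identify the first character that could not be encoded.
    Console.WriteLine("bad conversion at index {0}: U+{1:X4}",
                      e.Index, (int)e.CharUnknown);
}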
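For one-off conversions, the same idea can be wrapped in a small helper. This is only a sketch: StrictEncoding, Get, and GetBytes are hypothetical names, not part of .NET. It uses the static EncoderFallback.ExceptionFallback and DecoderFallback.ExceptionFallback properties in place of constructing the fallback objects by hand, and the simpler GetBytes(string) overload.

```csharp
using System;
using System.Text;

// Hypothetical helper class, not part of .NET.
public static class StrictEncoding
{
    // Returns an Encoding that throws instead of substituting "?"
    // for anything it cannot represent.
    public static Encoding Get(string name)
    {
        return Encoding.GetEncoding(
            name,
            EncoderFallback.ExceptionFallback,   // string -> bytes
            DecoderFallback.ExceptionFallback);  // bytes -> string
    }

    // Strictly converts text to bytes; throws EncoderFallbackException
    // if a character has no representation in the target encoding.
    public static byte[] GetBytes(string text, string encodingName)
    {
        return Get(encodingName).GetBytes(text);
    }
}
```

StrictEncoding.GetBytes("abc", "us-ascii") returns three bytes, while passing "\u00abX\u00bb" throws an EncoderFallbackException. The same helper should work for "windows-1252" on the .NET Framework. On .NET Core and later, that code page is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).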

The MSDN article "Character Encoding in the .NET Framework" discusses, to some extent, the rationale behind the default behavior.


Source: https://habr.com/ru/post/1767961/