Match.Value and international characters

UPDATE: This post may be useful for coders using RichTextBoxes. The match is correct for a normal string; what I did not see is that "ä" is converted to "\'e4" in richTextBox.Rtf! So Match.Value is right, and this was human error.

Regex finds the correct text, but Match.Value is wrong because it replaces the German "ä" with "\'e4"!

Let example_text = "<em>Primär-ABC</em>" and let us use the following code:

    String example_text = "<em>Primär-ABC</em>";
    Regex em = new Regex(@"<em>[^<]*</em>");
    Match emMatch = em.Match(example_text);       // Works!
    // Match emMatch = em.Match(richTextBox.Rtf); // Fails!
    while (emMatch.Success)
    {
        string matchValue = emMatch.Value;
        Foo(matchValue);
        ...
    }

Then emMatch.Value returns "Prim\'e4r-ABC" instead of "Primär-ABC".

The German ä turns into \'e4! Since I want to work with the exact string, I need emMatch.Value to be Primär-ABC. How do I achieve this?

1 answer

In what context are you doing this?

    string example_text = "<em>Ich bin ein Bärliner</em>";
    Regex em = new Regex(@"<em>[^<]*</em>");
    Match emMatch = em.Match(example_text);
    while (emMatch.Success)
    {
        Console.WriteLine(emMatch.Value);
        emMatch = emMatch.NextMatch();
    }

This displays <em>Ich bin ein Bärliner</em> in my console.

Probably the problem is not that you are returning the wrong value, but that you are getting a representation of the value that is not displayed correctly. It can depend on many things. Try writing the value to a text file using UTF8 encoding and see if it is fixed.
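The UTF-8 check suggested above could look roughly like the sketch below. The class name Utf8Check, the method RoundTrip, and the file name match-value.txt are made up for illustration; only the regex comes from the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class Utf8Check
{
    // Writes the first <em>...</em> match to a UTF-8 text file and reads it back.
    // If the returned string still contains "ä", the match value itself is fine
    // and any garbling comes from how the value is displayed.
    public static string RoundTrip(string input)
    {
        Match m = Regex.Match(input, @"<em>[^<]*</em>");

        // Hypothetical file name in the temp directory, used only for this check.
        string path = Path.Combine(Path.GetTempPath(), "match-value.txt");

        File.WriteAllText(path, m.Value, Encoding.UTF8);
        return File.ReadAllText(path, Encoding.UTF8);
    }
}
```

Calling Utf8Check.RoundTrip with the "Bärliner" string from the snippet above should hand back the same string, umlaut included; opening the file in an editor that understands UTF-8 is another way to check.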

Edit: Right. The point is that you are getting the text from a WinForms RichTextBox through its Rtf property. That does not return the text as is; it returns an RTF representation of the text. RTF is not plain text; it is a markup format for displaying rich text. If you open an RTF document in, for example, Notepad, you will see that it contains a lot of strange codes, including \'e4 for each "ä" in your document. If you used any formatting (bold text, color, etc.) in the RTF box, the .Rtf property will also return the codes for it, looking something like {\rtlch\fcs1 \af31507 \ltrch\fcs0 \cf6\insrsid15946317\charrsid15946317 test}

Therefore, use the .Text property. It will return the actual text.
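To see the difference without a form, here is a small sketch that runs the question's regex over two hand-written strings. SampleRtf and SampleText are assumed stand-ins for what richTextBox.Rtf and richTextBox.Text would return; the real control emits a much longer RTF header. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RtfVersusText
{
    // The regex from the question.
    private static readonly Regex Em = new Regex(@"<em>[^<]*</em>");

    // Simplified, hand-written stand-in for richTextBox.Rtf (assumption).
    public const string SampleRtf = @"{\rtf1\ansi\deff0 <em>Prim\'e4r-ABC</em>\par}";

    // Stand-in for richTextBox.Text (assumption).
    public const string SampleText = "<em>Primär-ABC</em>";

    // Returns the first <em>...</em> match, or an empty string if there is none.
    public static string FirstEm(string input)
    {
        Match m = Em.Match(input);
        return m.Success ? m.Value : string.Empty;
    }
}
```

FirstEm(RtfVersusText.SampleRtf) gives back the escaped form with \'e4 in it, because that is literally what the input contains, while FirstEm(RtfVersusText.SampleText) gives back the string with "ä". In both cases Match.Value faithfully reflects its input.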


Source: https://habr.com/ru/post/921465/

