Unicode regular expression in string

Question

I am working in C# on an OCR project and have extracted text that I now need to parse with regular expressions.

 string checkNum;
 string routingNum;
 string accountNum;

 Regex regEx = new Regex(@"\u9288\d+\u9288");
 Match match = regEx.Match(numbers);
 if (match.Success)
     checkNum = match.Value.Remove(0, 1).Remove(match.Value.Length - 1, 1);

 regEx = new Regex(@"\u9286\d{9}\u9286");
 match = regEx.Match(numbers);
 if (match.Success)
     routingNum = match.Value.Remove(0, 1).Remove(match.Value.Length - 1, 1);

 regEx = new Regex(@"\d{10}\u9288");
 match = regEx.Match(numbers);
 if (match.Success)
     accountNum = match.Value.Remove(match.Value.Length - 1, 1);

The problem is that the string does contain the expected Unicode characters when I call .ToCharArray() and inspect its contents, but the regular expressions never match those characters when I parse the string. I thought strings in C# were Unicode by default.

+4

c# regex unicode

Marcus king May 14, '10 at 14:56

3 answers

This line:

 match.Value.Remove(0, 1).Remove(match.Value.Length - 1, 1);

throws an ArgumentOutOfRangeException, because the string returned by the first Remove is one character shorter than match.Value, so the index match.Value.Length - 1 lies past its end.

I suggest you use groups to extract the value. Example:

 Regex regEx = new Regex(@"\u9288(\d+)\u9288");
 Match match = regEx.Match(numbers);
 if (match.Success)
     checkNum = match.Groups[1].Value;

With this, I can correctly extract the values.
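To make the Remove problem concrete, here is a minimal sketch. The string "#12345#" is only a stand-in for match.Value (the '#' delimiter is a placeholder, not one of the OCR characters), and TrimEnds is a hypothetical helper, not part of the original code. It assumes the input has at least two characters.

```csharp
using System;

public static class RemoveDemo
{
    // Hypothetical helper: strips the first and last character in one call,
    // so no stale length is involved. Assumes value.Length >= 2.
    public static string TrimEnds(string value) => value.Substring(1, value.Length - 2);

    public static void Main()
    {
        string value = "#12345#"; // stand-in for match.Value (7 characters)

        try
        {
            // Remove(0, 1) yields a 6-character string, but value.Length - 1 is
            // still 6, so the second Remove starts past the end of that string.
            string broken = value.Remove(0, 1).Remove(value.Length - 1, 1);
            Console.WriteLine(broken);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Remove threw ArgumentOutOfRangeException");
        }

        Console.WriteLine(TrimEnds(value)); // 12345
    }
}
```

A capture group, as suggested above, avoids the index arithmetic entirely, which is why it is usually the simpler choice.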

+1

bruno conde May 14, '10 at 15:21

Strings in .NET are encoded as UTF-16.

In addition, regex engines match Unicode characters, not Unicode codes. See this post.

0

Doug May 14, '10 at 15:08

Marcus king · Accepted Answer · 2010-05-14T16:23:48+0000

I figured it out. I was using the decimal values of the characters instead of their hex codes. In other words, instead of \u9288 and \u9286 I had to use \u2448 and \u2446 (see http://www.ssec.wisc.edu/~tomw/java/unicode.html#x2440).

Thanks guys for leading me in the right direction.
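As a rough illustration of the decimal/hex mix-up, the sketch below converts the decimal value seen when inspecting the char array into the four hex digits that \u expects, and then matches with a capture group. The sample input and the names EscapeDemo, ToEscapeHex and ExtractCheckNumber are assumptions made for the example; the real OCR string is not shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Formats a char's numeric value as the four hex digits used after \u.
    public static string ToEscapeHex(char c) => ((int)c).ToString("X4");

    // Returns the digits between two U+2448 characters, or null if none are found.
    public static string ExtractCheckNumber(string numbers)
    {
        Match m = Regex.Match(numbers, @"\u2448(\d+)\u2448");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Hypothetical OCR output, made up for this example.
        string numbers = "\u2448001234\u2448 \u2446123456789\u2446 0123456789\u2448";

        // Inspecting the chars shows decimal 9288, which is hex 2448.
        Console.WriteLine((int)numbers[0]);          // 9288
        Console.WriteLine(ToEscapeHex(numbers[0]));  // 2448

        // \u9288 is read as hex, so it names a different character and never matches here.
        Console.WriteLine(Regex.IsMatch(numbers, @"\u9288\d+\u9288")); // False

        Console.WriteLine(ExtractCheckNumber(numbers)); // 001234
    }
}
```

The same hex form applies to \u escapes in ordinary C# string literals, so the decimal value from a debugger always needs converting before it is used in either place.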
