I am working in C# on an OCR task and have extracted text that I now need to parse using regular expressions.
string checkNum;
string routingNum;
string accountNum;

Regex regEx = new Regex(@"\u9288\d+\u9288");
Match match = regEx.Match(numbers);
if (match.Success)
    checkNum = match.Value.Remove(0, 1).Remove(match.Value.Length - 1, 1);

regEx = new Regex(@"\u9286\d{9}\u9286");
match = regEx.Match(numbers);
if (match.Success)
    routingNum = match.Value.Remove(0, 1).Remove(match.Value.Length - 1, 1);

regEx = new Regex(@"\d{10}\u9288");
match = regEx.Match(numbers);
if (match.Success)
    accountNum = match.Value.Remove(match.Value.Length - 1, 1);
The problem is that the string does contain the expected Unicode characters when I call .ToCharArray() and inspect its contents, but the regular expressions never match those characters when I parse the string. I thought strings in C# were Unicode by default.
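One thing that may be worth checking, shown here as a minimal sketch rather than a confirmed diagnosis: the `\uXXXX` escape in a .NET regex takes four hexadecimal digits. If 9288 and 9286 are the decimal values seen when inspecting the char array (an assumption, since the post does not say), the matching escapes would be `\u2448` and `\u2446`. The sample string below is made up for illustration and is not real OCR output.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical sample input; real OCR output will differ.
        string numbers = "\u24481234\u2448 \u2446123456789\u2446 0123456789\u2448";

        // Print each non-ASCII character's code point in decimal and in hex,
        // so the value can be compared with the escape used in the pattern.
        foreach (char c in numbers)
        {
            if (c > 127)
                Console.WriteLine($"{(int)c} = U+{(int)c:X4}");
        }

        // \u9288 means code point 0x9288 (decimal 37512), not decimal 9288.
        Console.WriteLine(Regex.IsMatch(numbers, @"\u9288\d+\u9288"));

        // Decimal 9288 is 0x2448, so this pattern targets that character.
        Console.WriteLine(Regex.IsMatch(numbers, @"\u2448\d+\u2448"));
    }
}
```

If the first loop prints decimal values such as 9288 next to a hex form such as U+2448, the hex form is the one to put after `\u` in the pattern.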