Regular expression - find lottery numbers

I was hoping for some regex guidance, if possible, as I am terrible at them :(

I have run OCR on a lottery ticket and I am trying to extract the lottery numbers from the returned text.

Here is the string returned:

"if * it • Including Millionaire Raffle 7618-011874089-204279 111111111111111111111111111111 Goad luck for your draw on Fri 09 Nov 12 Your numbers Lucky Stars A 1 8 22 37 47 48 - 03 10 B11 15 26 43 44 - 05 06 C 08 23 27 28 29 - 02 09 D06 09 21 26 29 - 01 05 E 06 07 21 22 45 - 04 05 Your raffle numbers) for your draw(s) PRC690104 PRC690105 PRC690106 PRC690107 1DRC690108 CHECK YOUR MILLIONAIRE RAFFLE RESULTS ONLINE AT WWW.NATIONAL-LOTTERY.CO.UK 5 plays x f2.00 for 1 draw = f10.00 HUGE EUROMILLIONS JACKPOTS TO PLAY FOR EVERY TUESDAY AND FRIDAY! PLAY TODAY FOR THE CHANCE TO WIN YOUR WILDEST DREAMS! 7618-011874089-204279 035469 Term. 26048301 Fill the box to void the ticket 11111111111111111111111 1111111111111111111111111" 

This is the scanned image:

[Image: the ticket that was scanned]

As you can see, the lottery numbers always appear between "Lucky Stars" and "Your raffle numbers".

Can anyone suggest how to parse the text to get "A18223747480310", "B11152643440506", "C08232728290209", "D06092126290105", "E06072122450405", please?

Any help would be greatly appreciated!

+4
4 answers

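A combination of Regex and string.Split would be simpler and more efficient:

    // Take everything between "Lucky Stars" and "Your raffle numbers".
    // (?s) lets '.' match line breaks as well.
    Regex reg = new Regex("(?s)(?<=Lucky Stars).+?(?=Your raffle numbers)");

    // Remove spaces and hyphens, then split into one entry per line.
    // inputString is the OCR text; each row (A to E) must be on its own line.
    string[] yourNumbers = Regex.Replace(reg.Match(inputString).Value, "[ -]", "")
        .Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);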
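If the OCR text arrives as a single line, as it is shown in the question, there are no '\n' characters to split on. A minimal sketch of one possible variant follows: it uses the same look-around pattern to isolate the section, then splits in front of each row letter instead of at line breaks. The class name `MarkerSplitSketch` and the method name `ExtractRows` are made up for illustration, and the split on `(?=[A-E])` is a substitution for the newline split, not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MarkerSplitSketch
{
    // Hypothetical helper: returns rows such as "A18223747480310".
    public static string[] ExtractRows(string ocrText)
    {
        // Text strictly between the two markers; (?s) lets '.' span line breaks.
        string section = Regex.Match(
            ocrText, "(?s)(?<=Lucky Stars).+?(?=Your raffle numbers)").Value;

        // Drop all whitespace and hyphens so only letters and digits remain.
        string compact = Regex.Replace(section, @"[\s-]", "");

        // Split in front of every row letter (zero-width look-ahead),
        // discarding any empty piece produced at the start of the string.
        return Regex.Split(compact, "(?=[A-E])")
            .Where(part => part.Length > 0)
            .ToArray();
    }
}
```

Calling `MarkerSplitSketch.ExtractRows(text)` on the sample from the question should give the five strings requested there. It assumes the OCR reads both marker phrases correctly; if either is misread, the result is an empty array.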
+1

Try to keep things simple: each lottery number consists of one of the letters A to E, followed by exactly 14 digits, each of which may be preceded by any number of spaces and/or hyphen (-) characters.

So, here is a regular expression to extract each lottery number:

 [A-E]([\s-]*\d){14} 

Visualization (from the Debuggex demo):

[Image: regular expression visualization]

Then get the desired results by replacing all spaces and dashes with empty strings.
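The two steps described in this answer could be sketched in C# roughly as follows, with the character class written as the range `[A-E]` so that rows B, C and D match as well. The class name `RowPatternSketch` and the method name `ExtractRows` are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RowPatternSketch
{
    // One row: a letter A-E, then exactly 14 digits, each optionally
    // preceded by whitespace and/or hyphens.
    private static readonly Regex RowPattern =
        new Regex(@"[A-E]([\s-]*\d){14}");

    // Hypothetical helper: finds every row and strips spaces and hyphens.
    public static string[] ExtractRows(string ocrText)
    {
        return RowPattern.Matches(ocrText)
            .Cast<Match>()
            .Select(m => Regex.Replace(m.Value, @"[\s-]", ""))
            .ToArray();
    }
}
```

Because the pattern is case-sensitive and insists on 14 digits, it does not need the "Lucky Stars" marker. It relies on OCR having read every digit, though: a row with a dropped or misread digit is skipped silently.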

+1

Since the numbers are zero-padded (for example, 08 for 8), a simple approach is to split the digits into groups of two. No regular expression is required.
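One way to read this suggestion, as a sketch only: once a row has been reduced to a letter plus 14 digits (for example "A18223747480310"), the digits can be cut into seven two-digit numbers without a regex. The names `PairSplitSketch` and `SplitPairs` are invented here, and the input format is an assumption rather than something the answer states.

```csharp
using System;
using System.Linq;

public static class PairSplitSketch
{
    // Hypothetical helper. Expects a row letter followed by an even number
    // of digits, e.g. "A18223747480310".
    public static int[] SplitPairs(string row)
    {
        string digits = row.Substring(1); // drop the row letter
        return Enumerable.Range(0, digits.Length / 2)
            .Select(i => int.Parse(digits.Substring(i * 2, 2)))
            .ToArray();
    }
}
```

For "A18223747480310" this gives 18, 22, 37, 47, 48, 3, 10: the first five values are the main numbers and the last two are the Lucky Stars.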

0

This pair of regular expressions should work for the case you showed us.

    /// <summary>
    /// Regular expression built for C# on: Sun, Aug 25, 2013, 12:55:52 PM
    /// Using Expresso Version: 3.0.4334, http://www.ultrapico.com
    ///
    /// A description of the regular expression:
    ///
    /// Match expression but don't capture it. [Lucky Stars\r\n]
    ///     Lucky Stars\r\n
    ///         Lucky
    ///         Space
    ///         Stars
    ///         Carriage return
    ///         New line
    /// [Numbers]: A named capture group. [.*\r\n], exactly 5 repetitions
    ///     .*\r\n
    ///         Any character, any number of repetitions
    ///         Carriage return
    ///         New line
    /// </summary>
    public static Regex regex = new Regex(
        "(?:Lucky Stars\\r\\n)(?<Numbers>.*\\r\\n){5}",
        RegexOptions.CultureInvariant | RegexOptions.Compiled
    );

    // Removes hyphens and the trailing line break from a captured row.
    // The pattern as originally posted, "(\\s-.*\r\n)", also deleted the
    // two Lucky Star numbers after the hyphen, which the question wants kept.
    public static Regex replaceRegex = new Regex(
        "-|\\r\\n",
        RegexOptions.CultureInvariant | RegexOptions.Compiled
    );

And the code for finding numbers can be as follows:

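    // Verbatim string: the line breaks below are part of the text.
    // The regex expects CRLF ("\r\n") line endings between the rows.
    var InputText = @"Lucky Stars
    A 1 8 22 37 47 48 - 03 10
    B11 15 26 43 44 - 05 06
    C 08 23 27 28 29 - 02 09
    D06 09 21 26 29 - 01 05
    E 06 07 21 22 45 - 04 05
    Your raffle numbers";

    Match m = regex.Match(InputText);

    // One capture per row; clean each capture and strip the remaining spaces.
    var numbers = m.Groups["Numbers"].Captures
        .OfType<Capture>()
        .Select(c => replaceRegex.Replace(c.Value, "").Replace(" ", ""));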

However, I doubt that a regex is the best solution when the text comes from running OCR on an image.

0

Source: https://habr.com/ru/post/1498810/

