Regular expression to get characters that appear after 2 or more spaces

I am trying to read a receipt and extract each line item. After I get the item, I want to get the price of the goods together with the currency symbol.

CHOC. ORANGE   x           £1.00

I tried to split the text on the pound sign, but in some cases OCR misreads the pound sign as some other character.

So, is there a way in regex to read characters from the end of the line and stop when it encounters a run of more than 3 spaces? Or do I need to write my own algorithm?

I tried to get the last word from the end of the line, but this also fails when it runs into punctuation or a space:

\b(\w+)$  
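If a hand-written algorithm turns out to be simpler, a minimal sketch in Java could scan the line backwards and stop at the first run of N or more spaces. It assumes the separator is the ASCII space character (not tabs), and `LastField`/`lastField` are hypothetical names chosen for illustration:

```java
class LastField {

    /**
     * Hypothetical helper: returns the text after the last run of
     * minSpaces or more consecutive spaces, ignoring trailing spaces.
     * Returns the whole trimmed line if no such run exists.
     */
    static String lastField(String line, int minSpaces) {
        int end = line.length();
        // Skip trailing spaces so they are not mistaken for the separator
        while (end > 0 && line.charAt(end - 1) == ' ') {
            end--;
        }
        int run = 0; // length of the current run of spaces while scanning backwards
        for (int i = end - 1; i >= 0; i--) {
            if (line.charAt(i) == ' ') {
                run++;
                if (run >= minSpaces) {
                    // The field starts right after this run of spaces
                    return line.substring(i + run, end);
                }
            } else {
                run = 0; // a non-space character breaks the run
            }
        }
        return line.substring(0, end).trim();
    }

    public static void main(String[] args) {
        System.out.println(lastField("CHOC. ORANGE   x           £1.00", 2)); // £1.00
    }
}
```

Because only a run of `minSpaces` or more spaces ends the scan, a single space inside the price (for example `£ 1.00`) is kept, and punctuation such as `.` does not stop it, unlike `\w` in the `\b(\w+)$` attempt.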
3 answers

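Use a quantifier: \s{2,} matches two or more whitespace characters.

Since OCR may misread the pound sign, use alternation (|) to accept either £ or a single word character in its place.

For example: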

import java.util.Currency;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexReceiptOcr {

    public static void main(String[] args) {
//      String poundSymbol = Currency.getInstance(Locale.UK).getSymbol(Locale.UK);
        String poundSymbol = "£";
        String[] inputStrings = {
                "CHOC. ORANGE    x         " + poundSymbol + "1.00"
                , "CHOC. ORANGE    x         L1.00"      // pound sign misread by OCR as "L"
        };

        String regex = "(?<description>.+)"
                + "\\s{2,}"                                              // two or more whitespace characters
                + "(?<currency>" + Pattern.quote(poundSymbol) + "|\\w)"  // the pound symbol may be misread
                + "(?<amount>\\d+\\.\\d{2})";                            // digits, a decimal point, two decimals
        Pattern p = Pattern.compile(regex);
        for (String inputString : inputStrings) {
            Matcher m = p.matcher(inputString);
            if (m.find()) {
                String description  = m.group("description");
                String currency     = m.group("currency");
                String amountString = m.group("amount");

                System.out.format("Description: %s%n"
                        + "Currency: %s%n"
                        + "Amount: %s%n"
                        , description.trim()
                        , currency
                        , amountString);
            }
        }
    }

}

Output:

Description: CHOC. ORANGE    x
Currency: £
Amount: 1.00
Description: CHOC. ORANGE    x
Currency: L
Amount: 1.00
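The `\s{2,}` quantifier is also useful on its own: splitting the line on it gives the receipt's columns, and the price is the last one regardless of what OCR made of the pound sign. A minimal sketch, assuming the price is always the last column (the class and method names are hypothetical):

```java
import java.util.Arrays;

class SplitReceiptLine {

    // Splits a receipt line into columns wherever two or more
    // whitespace characters occur; leading/trailing blanks are trimmed first.
    static String[] columns(String line) {
        return line.trim().split("\\s{2,}");
    }

    public static void main(String[] args) {
        String[] cols = columns("CHOC. ORANGE   x           £1.00");
        System.out.println(Arrays.toString(cols)); // [CHOC. ORANGE, x, £1.00]
        // The price is the last column, whatever symbol OCR produced for the pound sign
        System.out.println(cols[cols.length - 1]); // £1.00
    }
}
```

Unlike the named-group regex, this does not check that the last column looks like an amount, and a description that itself contains two consecutive spaces would produce an extra column.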
(£|\$)[0-9]+\.[0-9]+

Edit:

String s = "£1.00";
String currency = s.substring(0, 1); // first character: the currency symbol
String amount = s.substring(1);      // everything after it: the amount
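The currency and the amount can also be taken from capture groups instead of `substring`. A minimal sketch, assuming the price is the first `£` or `$` amount on the line; the dot is escaped as `\.` here so that it only matches a literal decimal point (an unescaped `.` matches any character), and `CurrencyAmount`/`parse` are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CurrencyAmount {

    // Group 1: the currency symbol (£ or $); group 2: the amount.
    private static final Pattern PRICE = Pattern.compile("(£|\\$)([0-9]+\\.[0-9]+)");

    // Returns {currency, amount}, or null if the line contains no price.
    static String[] parse(String line) {
        Matcher m = PRICE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] result = parse("CHOC. ORANGE   x           £1.00");
        System.out.println(result[0] + " " + result[1]); // £ 1.00
    }
}
```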

Match the currency symbol with a character class and capture the amount that follows it:

[£$](\d+(?:\.\d+)?)

Add any other currency symbols you need (for example €) inside the [].

See the demo: https://regex101.com/r/JzHloV/5
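A minimal Java sketch of this pattern follows. Adding `€` to the character class is only an illustration of extending it, and `CharClassAmount`/`amount` are hypothetical names. The optional group `(?:\.\d+)?` means a whole amount such as `€3` matches too:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassAmount {

    // [£$€] matches one currency symbol; group 1 captures the amount,
    // whose decimal part (?:\.\d+)? is optional.
    private static final Pattern AMOUNT = Pattern.compile("[£$€](\\d+(?:\\.\\d+)?)");

    // Returns the amount without its currency symbol, or null if none is found.
    static String amount(String line) {
        Matcher m = AMOUNT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(amount("CHOC. ORANGE   x           £1.00")); // 1.00
        System.out.println(amount("MILK                       €3"));     // 3
    }
}
```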

If you want to match any amount that comes after 2 or more spaces, you can use the following:

 \s{2,}\W+(\d+(?:\.\d+)?)

See https://regex101.com/r/f4gmSu/3 for an example.

It looks for two or more whitespace characters, followed by one or more non-word characters (such as £ or $), followed by the amount, and captures only the amount.
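A minimal Java sketch of this pattern (class and method names are hypothetical) also shows a limitation worth knowing for OCR input: `\W` only matches non-word characters, so if the pound sign is misread as a letter such as `L`, nothing is matched:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountAfterSpaces {

    // Two or more whitespace characters, then one or more non-word
    // characters (e.g. £ or $), then the amount, which is captured in group 1.
    private static final Pattern AMOUNT = Pattern.compile("\\s{2,}\\W+(\\d+(?:\\.\\d+)?)");

    // Returns the captured amount, or null if the pattern does not match.
    static String amount(String line) {
        Matcher m = AMOUNT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(amount("CHOC. ORANGE    x         £1.00")); // 1.00
        // A letter is a word character, so a pound sign misread as "L" is not matched
        System.out.println(amount("CHOC. ORANGE    x         L1.00")); // null
    }
}
```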


Source: https://habr.com/ru/post/1666118/

