Regex to match a floating point number that is not an integer

I have the following regex that I use to find numbers in strings

-?\d*\.?\d+([eE][-+]?\d+)? 

and I want to change it so that it matches only floating-point numbers, not integers. The criterion for this (as far as I can tell) is that at least one of the following must be present in the match: . , e , E . However, I cannot think of a good way to incorporate this requirement into a regular expression without duplicating most of the body.
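To make the requirement concrete, here is a minimal sketch that sidesteps the problem instead of solving it inside the regex: it keeps the original pattern and filters the matches afterwards. The helper name `find_non_integers` is hypothetical and only for illustration.

```python
import re

# The original pattern from the question, unchanged.
NUMBER = re.compile(r"-?\d*\.?\d+([eE][-+]?\d+)?")

def find_non_integers(text):
    """Return the matches that contain '.', 'e' or 'E' (hypothetical helper)."""
    return [
        m.group(0)
        for m in NUMBER.finditer(text)
        if any(ch in m.group(0) for ch in ".eE")
    ]

print(find_non_integers("1 1.1 .1 1e1 1.04E-1 -.1 42"))
# ['1.1', '.1', '1e1', '1.04E-1', '-.1']
```

This needs a second pass over every match, which is the step a single regular expression would avoid.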

Duplicate

After a short search, I came across "Regular Expressions corresponding to a floating point number, but not an integer", which, although its title is not an obvious match, is an exact duplicate of this problem (including the solution).

+4
3 answers

The following regex does this, although it's a little cryptic:

 -?(?:\d+())?(?:\.\d*())?(?:e-?\d+())?(?:\2|\1\3) 

Explanation:

A number has three parts (integer part, fractional part and exponential part). If the fractional part is present, the number is a float, but if it is not, the number can still be a float when the exponential part follows the integer part.

This means that you must first make all three parts optional in a regular expression. But then we need to create rules that accurately determine which parts should be there to make a valid float.

Fortunately, there is a trick that allows us to do this. An empty capturing group ( () ) always matches (the empty string). A backreference to this group ( \1 ) succeeds only if the group participated in the match. By inserting () into each of the optional groups, we can later check whether the required parts participated in the match.
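The trick can be seen in isolation with a much smaller pattern. This is only a sketch checked against Python's `re` module; the name `probe` and the toy pattern are made up for illustration, and other regex engines may treat a backreference to a non-participating group differently.

```python
import re

# Group 1 is an empty capture inside an optional group.
# The trailing \1 succeeds only if that optional group took part in the match.
probe = re.compile(r"(?:a())?b\1")

print(probe.fullmatch("ab") is not None)  # True: "a" was present, so \1 matches ""
print(probe.fullmatch("b") is not None)   # False: group 1 never participated
```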

For example, in Python:

    import re

    regex = re.compile(r"""
        -?        # Optional minus sign
        (?:       # Start of the first non-capturing group:
          \d+     #   Match a number (integer part)
          ()      #   Match the empty string, capture in group 1
        )?        # Make the first non-capturing group optional
        (?:       # Start of the second non-capturing group:
          \.\d*   #   Match a dot and an optional fractional part
          ()      #   Match the empty string, capture in group 2
        )?        # Make the second non-capturing group optional
        (?:       # Start of the third non-capturing group:
          e       #   Match an e or E (re.I makes this case-insensitive)
          -?      #   Match an optional minus sign
          \d+     #   Match a mandatory exponent
          ()      #   Match the empty string, capture in group 3
        )?        # Make the third non-capturing group optional
        (?:       # Now make sure that at least the following groups participated:
          \2      #   Either group 2 (containing the empty string)
        |         # or
          \1\3    #   Groups 1 and 3 (because "1" or "e1" alone aren't valid matches)
        )""", re.I|re.X)

Test suite:

    >>> [match.group(0) for match in
    ...     regex.finditer("1 1.1 .1 1. 1e1 1.04E-1 -.1 -1. e1 .1e1")]
    ['1.1', '.1', '1.', '1e1', '1.04E-1', '-.1', '-1.', '.1e1']
+5

I think I'll just go for

 (-?\d*\.\d+([eE][-+]?\d+)?)|(-?\d+[eE][-+]?\d+)

The first alternative is identical to the original expression, except that the period is now required. The second catches the cases without a period by requiring the exponent part, [eE][-+]?\d+ .
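A quick way to check this alternation is to run it against a few literals. The sketch below assumes the pattern is written without whitespace around the `|` (outside verbose mode, spaces would be matched literally); `is_float_literal` is a hypothetical helper name.

```python
import re

# The two-branch pattern: "period required" or "exponent required".
FLOAT = re.compile(r"(-?\d*\.\d+([eE][-+]?\d+)?)|(-?\d+[eE][-+]?\d+)")

def is_float_literal(s):
    """True if the whole string matches one of the two branches."""
    return FLOAT.fullmatch(s) is not None

for s in ["1.5", ".5", "1e5", "-2.5E-3", "1", "1."]:
    print(s, is_float_literal(s))
# 1.5 True
# .5 True
# 1e5 True
# -2.5E-3 True
# 1 False
# 1. False
```

Unlike the backreference version, this pattern does not accept a trailing period such as "1.", because the first branch demands at least one digit after the dot.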

+1

Here is my solution, which uses a lookahead to allow '1e1' but no other values without a decimal point:

    >>> import re
    >>> pattern = r'[+-]?(?:\d+\.\d*|\.\d+|\d+(?=[eE]))(?:[eE][+-]?\d+)?'
    >>> re.match(pattern, '4.')
    <_sre.SRE_Match object at 0x000000000347BD30>
    >>> re.match(pattern, '4.4')
    <_sre.SRE_Match object at 0x000000000347BCC8>
    >>> re.match(pattern, '.4')
    <_sre.SRE_Match object at 0x000000000347BD30>
    >>> re.match(pattern, '4e4')
    <_sre.SRE_Match object at 0x000000000347BCC8>
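The session above only shows the accepted inputs. A small sketch that also covers a plain integer, and extracts matches from a longer string, might look like this; `find_float_literals` is a hypothetical helper name.

```python
import re

# Same pattern as in the answer above.
PATTERN = r'[+-]?(?:\d+\.\d*|\.\d+|\d+(?=[eE]))(?:[eE][+-]?\d+)?'

def find_float_literals(text):
    """Return every substring of text that the lookahead pattern matches."""
    return [m.group(0) for m in re.finditer(PATTERN, text)]

print(find_float_literals("4. 4.4 .4 4e4 4 44"))
# ['4.', '4.4', '.4', '4e4']

print(re.match(PATTERN, '4'))            # None: a bare integer is rejected
print(re.match(PATTERN, '4e').group(0))  # '4'
```

The last line shows a corner case worth knowing about: the lookahead only checks that an e or E follows, while the exponent group itself stays optional, so an incomplete exponent such as '4e' still yields a match on the '4'.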
+1

Source: https://habr.com/ru/post/1499922/

