Extract scientific number from string

I am trying to extract numbers in scientific notation from strings in a text file.

Example:

str = 'Name of value 1.111E-11 Next Name 444.4' 

Result:

 [1.111E-11, 444.4] 

I tried the solutions from other posts, but they seem to work only for integers:

 >>> [int(s) for s in str.split() if s.isdigit()]
 []

float() works on the numeric tokens, but it raises an error whenever it is given a non-numeric string:

 >>> float(str.split()[3])
 1.111E-11
 >>> float(str.split()[2])
 ValueError: could not convert string to float: value

Thanks in advance for your help!

3 answers

You can always use a for loop with a try-except:

 >>> string = 'Name of value 1.111E-11 Next Name 444.4'
 >>> final_list = []
 >>> for elem in string.split():
 ...     try:
 ...         final_list.append(float(elem))
 ...     except ValueError:
 ...         pass
 ...
 >>> final_list
 [1.111e-11, 444.4]
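
For reuse on every line of a text file, the same loop could be wrapped in a function. This is only a minimal sketch; the name `extract_floats` is hypothetical and not part of the original answer.

```python
def extract_floats(text):
    """Return every whitespace-separated token of `text` that float() accepts."""
    numbers = []
    for token in text.split():
        try:
            numbers.append(float(token))
        except ValueError:
            # Token is not a number (e.g. 'Name'), so skip it.
            pass
    return numbers


print(extract_floats('Name of value 1.111E-11 Next Name 444.4'))
# [1.111e-11, 444.4]
```

One caveat with this approach: float() also accepts tokens such as 'nan' and 'inf', so those words would be picked up as numbers if they appear in the text.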

This can be done using regular expressions:

 import re

 s = 'Name of value 1.111E-11 Next Name 444.4'
 # Raw string so the backslashes reach the regex engine unchanged
 match_number = re.compile(r'-?\ *[0-9]+\.?[0-9]*(?:[Ee]\ *-?\ *[0-9]+)?')
 final_list = [float(x) for x in re.findall(match_number, s)]
 print(final_list)

Output:

 [1.111e-11, 444.4] 

Note that the pattern I wrote above requires at least one digit to the left of the decimal point, so a value written as .5 would be read as 5 rather than 0.5.
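
If values without a leading digit can occur, one possible variant adds a second alternative for them. The sketch below is an assumption-laden illustration, not the pattern from this answer: the name `NUMBER` is hypothetical, it drops the optional spaces, and it also accepts a `+` sign.

```python
import re

# Hypothetical variant: accept either "digits[.digits]" or ".digits",
# followed by an optional exponent with an optional sign.
NUMBER = re.compile(r'[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee][-+]?[0-9]+)?')

s = 'a .5 b -2. c 3E3 d 1.111E-11'
print([float(x) for x in NUMBER.findall(s)])
# [0.5, -2.0, 3000.0, 1.111e-11]
```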

EDIT:

Here's a tutorial and a reference I found useful for learning how to write regex patterns.

Since you asked for an explanation of the regex pattern:

 '-?\ *[0-9]+\.?[0-9]*(?:[Ee]\ *-?\ *[0-9]+)?' 

One piece at a time:

 -?          optionally matches a negative sign (zero or one negative signs)
 \ *         matches any number of spaces (to allow for formatting variations like - 2.3 or -2.3)
 [0-9]+      matches one or more digits
 \.?         optionally matches a period (zero or one periods)
 [0-9]*      matches any number of digits, including zero
 (?: ... )   groups an expression, but without forming a "capturing group" (look it up)
 [Ee]        matches either "e" or "E"
 \ *         matches any number of spaces (to allow for formats like 2.3E5 or 2.3E 5)
 -?          optionally matches a negative sign
 \ *         matches any number of spaces
 [0-9]+      matches one or more digits
 ?           makes the entire non-capturing group optional (to allow for the presence or absence of the exponent: 3000 or 3E3)

Note: \d is a shortcut for [0-9], but I use [0-9] here.
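
The piece-by-piece breakdown can also live inside the pattern itself by compiling it with re.VERBOSE, which ignores unescaped whitespace and allows # comments. The sketch below keeps the same pattern; the space-stripping step before float() is an addition, on the assumption that matches such as "- 2.3" should still be converted (float() rejects spaces inside a number).

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece carries its comment.
match_number = re.compile(r'''
    -?            # optional negative sign
    \ *           # any number of spaces (escaped, so VERBOSE keeps it)
    [0-9]+        # one or more digits
    \.?           # optional decimal point
    [0-9]*        # any number of digits, including zero
    (?:           # non-capturing group for the exponent
        [Ee]      # "e" or "E"
        \ *       # any number of spaces
        -?        # optional negative sign
        \ *       # any number of spaces
        [0-9]+    # one or more digits
    )?            # the whole exponent is optional
''', re.VERBOSE)

s = 'Name of value 1.111E-11 Next Name 444.4'
# Remove the spaces the pattern tolerates before handing the text to float().
print([float(x.replace(' ', '')) for x in match_number.findall(s)])
# [1.111e-11, 444.4]
```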


I would use Regex:

 import re

 s = 'Name of value 1.111E-11 Next Name 444.4'
 # \. matches a literal decimal point; [+-]? also allows positive exponents
 print([float(x) for x in re.findall(r"-?\d+\.?\d*(?:[Ee][+-]?\d+)?", s)])

Output:

 [1.111e-11, 444.4] 

Source: https://habr.com/ru/post/1496208/

