Separating a number from its unit in a string in Python

I have strings containing numbers with their units, e.g. 2GB, 17ft, etc. I would like to separate the number from the unit and create two different strings. Sometimes there is whitespace between them (e.g. 2 GB), and that is easy to handle with split(' ').

When they are together (e.g. 2GB), I test every character until I find a letter instead of a digit.

    s = '17GB'
    number = ''
    unit = ''
    for c in s:
        if c.isdigit():
            number += c
        else:
            unit += c

Is there a better way to do this?

Thanks.

+4
12 answers
    s = '17GB'
    for i, c in enumerate(s):
        if not c.isdigit():
            break
    number = int(s[:i])
    unit = s[i:]
+2

You can exit the loop when you find the first non-digit character:

    for i, c in enumerate(s):
        if not c.isdigit():
            break
    number = s[:i]
    unit = s[i:].lstrip()

If you have negative and decimal numbers:

    numeric = '0123456789-.'
    for i, c in enumerate(s):
        if c not in numeric:
            break
    number = s[:i]
    unit = s[i:].lstrip()
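One case the loop does not cover is a string with no unit at all: the break never fires, so i stops at the last index and the final digit ends up in the unit. A minimal sketch of one way to handle that, assuming Python 3 and using a for/else clause; the function name split_leading_number is made up for this illustration:

```python
def split_leading_number(s, numeric='0123456789-.'):
    """Split s into (number_text, unit_text) by scanning leading numeric characters."""
    for i, c in enumerate(s):
        if c not in numeric:
            break
    else:
        # No break happened: every character was numeric (or s was empty).
        i = len(s)
    return s[:i], s[i:].lstrip()


print(split_leading_number('-2.5 GB'))  # ('-2.5', 'GB')
print(split_leading_number('17'))       # ('17', '')
```

Note that the scan only checks characters, not number syntax, so text such as '1-2' is accepted as the number part; passing that part to float() would then raise ValueError.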
+8

You can use regex to split a string into groups:

    >>> import re
    >>> p = re.compile(r'(\d+)\s*(\w+)')
    >>> p.match('2GB').groups()
    ('2', 'GB')
    >>> p.match('17 ft').groups()
    ('17', 'ft')
+5

tokenize can help (shown here in a Python 2 session):

    >>> import StringIO
    >>> import tokenize
    >>> s = StringIO.StringIO('27GB')
    >>> for token in tokenize.generate_tokens(s.readline):
    ...     print token
    ...
    (2, '27', (1, 0), (1, 2), '27GB')
    (1, 'GB', (1, 2), (1, 4), '27GB')
    (0, '', (2, 0), (2, 0), '')
+3
    >>> s = "17GB"
    >>> ind = list(map(str.isalpha, s)).index(True)
    >>> num, suffix = s[:ind], s[ind:]
    >>> print(num + ":" + suffix)
    17:GB
+2

You should use regular expressions, grouping together the parts you want to extract:

    import re

    s = "17GB"
    match = re.match(r"^([1-9][0-9]*)\s*(GB|MB|KB|B)$", s)
    if match:
        print("Number: %d, unit: %s" % (int(match.group(1)), match.group(2)))

Change the regular expression according to what you want to parse. If you are not familiar with regular expressions, this is a great site for tutorials.

+1

For this task, I would definitely use a regex:

    import re

    there = re.compile(r'\s*(\d+)\s*(\S+)')
    thematch = there.match(s)
    if thematch:
        number, unit = thematch.groups()
    else:
        raise ValueError('String %r not in the expected format' % s)

In the RE pattern, \s means whitespace, \d means digit, \S means non-whitespace; * means "0 or more of the preceding", + means "1 or more of the preceding", and parentheses enclose "capture groups", which are then returned by calling groups() on the match object. thematch is None if the given string does not match the pattern (optional whitespace, then one or more digits, then optional whitespace, then one or more non-whitespace characters).
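As a rough illustration of how the pieces of that pattern behave on a few inputs (a sketch only, assuming Python 3; the names NUMBER_UNIT and split_number_unit are invented for this example):

```python
import re

# Same pattern as described above: optional whitespace, digits,
# optional whitespace, then one or more non-whitespace characters.
NUMBER_UNIT = re.compile(r'\s*(\d+)\s*(\S+)')


def split_number_unit(s):
    """Return (number_text, unit_text), or raise ValueError if s does not match."""
    m = NUMBER_UNIT.match(s)
    if m is None:
        raise ValueError('String %r not in the expected format' % s)
    return m.groups()


print(split_number_unit('2GB'))        # ('2', 'GB')
print(split_number_unit('  17 feet'))  # ('17', 'feet')
```

One thing to watch for: because the unit group requires at least one non-whitespace character, a bare number such as '17' still matches by backtracking and comes back as ('1', '7'). If unit-less input is possible, changing (\S+) to (\S*) avoids that.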

0

Regular expression.

    import re

    # [.0-9]+ (with the +) so that multi-digit and decimal numbers are captured whole
    m = re.match(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)', s)
    if m is None:
        raise ValueError("not a number with units")
    number = m.group("n")
    unit = m.group("u")

This gives you a number (integer or fixed point; it is too hard to disambiguate the "e" of scientific notation from a unit prefix) with an optional sign, followed by the units, with optional whitespace in between.

You can use re.compile() if you are going to make many matches.
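A minimal sketch of that reuse, assuming Python 3 and assuming the character class is repeated with + so that multi-digit numbers are captured; NUMBER_WITH_UNITS and parse_all are hypothetical names used only here:

```python
import re

# Compiled once at import time, then reused for every string.
NUMBER_WITH_UNITS = re.compile(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)')


def parse_all(strings):
    """Return a list of (number_text, unit_text) pairs, one per input string."""
    results = []
    for s in strings:
        m = NUMBER_WITH_UNITS.match(s)
        if m is None:
            raise ValueError("not a number with units: %r" % s)
        results.append((m.group("n"), m.group("u")))
    return results


print(parse_all(["2GB", "-3.5 ft", "+17 feet"]))
# [('2', 'GB'), ('-3.5', 'ft'), ('+17', 'feet')]
```

The number is returned as text; converting it with int() or float() is left to the caller, since the pattern itself does not reject malformed input such as '1.2.3'.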

0

This uses an approach that should be a little more forgiving than regular expressions. Note: this is not as efficient as the other solutions.

    def split_units(value):
        """
        >>> split_units("2GB")
        (2.0, 'GB')
        >>> split_units("17 ft")
        (17.0, 'ft')
        >>> split_units(" 3.4e-27 frobnitzem ")
        (3.4e-27, 'frobnitzem')
        >>> split_units("9001")
        (9001.0, '')
        >>> split_units("spam sandwhiches")
        (0, 'spam sandwhiches')
        >>> split_units("")
        (0, '')
        """
        units = ""
        number = 0
        while value:
            try:
                number = float(value)
                break
            except ValueError:
                # Move the last character onto the units and retry with a shorter prefix.
                units = value[-1:] + units
                value = value[:-1]
        return number, units.strip()
0

SCIENTIFIC NOTATION

This regular expression works well for me for parsing numbers that might be in scientific notation. It is based on the section of the current Python documentation about simulating scanf: https://docs.python.org/3/library/re.html#simulating-scanf

    import re

    units_pattern = re.compile(r"([-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?|\s*[a-zA-Z]+\s*$)")
    number_with_units = [match.group(0) for match in units_pattern.finditer("+2.0e-1 mm")]
    print(number_with_units)
    # ['+2.0e-1', ' mm']
    n, u = number_with_units
    print(float(n), u.strip())
    # 0.2 mm
0

Try the regex pattern below. The first group (the scanf() tokens for any kind of number) is lifted directly from the Python docs for the re module.

    import re

    SCANF_MEASUREMENT = re.compile(
        r'''(                      # group match like scanf() token %e, %E, %f, %g
        [-+]?                      # +/- or nothing for positive
        (\d+(\.\d*)?|\.\d+)        # match numbers: 1, 1., 1.1, .1
        ([eE][-+]?\d+)?            # scientific notation: e(+/-)2 (*10^2)
        )
        (\s*)                      # separator: white space or nothing
        (                          # unit of measure: like GB. also works for no units
        \S*)''',
        re.VERBOSE)
    '''
    :var SCANF_MEASUREMENT:
        regular expression object that will match a measurement

        **measurement** is the value of a quantity of something. most complicated example::

            -666.6e-100 units
    '''


    def parse_measurement(value_sep_units):
        measurement = SCANF_MEASUREMENT.match(value_sep_units)
        if measurement is None:
            raise ValueError("doesn't start with a number: %r" % value_sep_units)
        value = float(measurement.group(1))  # group 1 is the whole number
        units = measurement.group(6)         # group 6 is the unit of measure
        return value, units
0

Source: https://habr.com/ru/post/1300919/

