Separating a number from its unit in a string in Python

Question

I have strings containing numbers with their units, for example 2GB, 17ft, etc. I would like to separate the number from the unit and create two different strings. Sometimes there is whitespace between them (for example, 2 GB), and that case is easy to handle with split(' ').

When they are joined together (for example, 2GB), I test each character to see whether it is a digit or a letter.

    s = '17GB'
    number = ''
    unit = ''
    for c in s:
        if c.isdigit():
            number += c
        else:
            unit += c

Is there a better way to do this?

thanks

+4

python string units-of-measurement

duduklein Feb 10 '10 at 20:59

12 answers

You can exit the loop as soon as you find the first non-digit character:

    for i, c in enumerate(s):
        if not c.isdigit():
            break
    else:
        i = len(s)  # no break: the whole string is digits, so there is no unit
    number = s[:i]
    unit = s[i:].lstrip()

If you have negative and decimal numbers:

    numeric = '0123456789-.'
    for i, c in enumerate(s):
        if c not in numeric:
            break
    else:
        i = len(s)  # no break: every character was numeric
    number = s[:i]
    unit = s[i:].lstrip()
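
Wrapped in a function, the scanning approach might look like the sketch below. The name `split_leading_number` is hypothetical, the code targets Python 3, and it returns the number as text so the caller can decide between `int` and `float`.

```python
def split_leading_number(s, numeric="0123456789-."):
    """Split s into (number_text, unit_text) at the first character
    that cannot be part of a number."""
    for i, c in enumerate(s):
        if c not in numeric:
            break
    else:
        # Loop finished without a break: every character was numeric.
        i = len(s)
    return s[:i], s[i:].lstrip()


print(split_leading_number("17GB"))       # ('17', 'GB')
print(split_leading_number("-2.5 feet"))  # ('-2.5', 'feet')
print(split_leading_number("42"))         # ('42', '')
```

Note that this character test accepts malformed numbers such as "1-2.3." as well, so validate the number text before converting it.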

+8

pwdyson Feb 10 '10 at 21:26

You can use regex to split a string into groups:

    >>> import re
    >>> p = re.compile(r'(\d+)\s*(\w+)')
    >>> p.match('2GB').groups()
    ('2', 'GB')
    >>> p.match('17 ft').groups()
    ('17', 'ft')

+5

Jarret hardie Feb 10 '10 at 21:03

The tokenize module can help (Python 2 session):

    >>> import StringIO
    >>> import tokenize
    >>> s = StringIO.StringIO('27GB')
    >>> for token in tokenize.generate_tokens(s.readline):
    ...     print token
    ...
    (2, '27', (1, 0), (1, 2), '27GB')
    (1, 'GB', (1, 2), (1, 4), '27GB')
    (0, '', (2, 0), (2, 0), '')

+3

Ignacio Vazquez-Abrams Feb 10 '10 at 21:04

    >>> s="17GB"
    >>> ind=map(str.isalpha,s).index(True)
    >>> num,suffix=s[:ind],s[ind:]
    >>> print num+":"+suffix
    17:GB
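
The session above relies on Python 2, where `map` returns a list. A rough Python 3 equivalent, offered as a sketch rather than as the answerer's own code, finds the index of the first letter with `next` and a generator expression:

```python
s = "17GB"
# Index of the first alphabetic character, or len(s) if there is none.
ind = next((i for i, c in enumerate(s) if c.isalpha()), len(s))
num, suffix = s[:ind], s[ind:]
print(num + ":" + suffix)  # 17:GB
```

Unlike `.index(True)`, which raises ValueError when no letter is present, this version falls back to treating the whole string as the number.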

+2

ghostdog74 Feb 11 '10 at 1:39

You should use regular expressions, grouping what you want to know:

    import re

    s = "17GB"
    match = re.match(r"^([1-9][0-9]*)\s*(GB|MB|KB|B)$", s)
    if match:
        print("Number: %d, unit: %s" % (int(match.group(1)), match.group(2)))

Change the regular expression according to what you want to parse. If you are not familiar with regular expressions, this is a great site for tutorials.
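
As one illustration of adapting the pattern, the unit alternation can be built from a list instead of being typed by hand. This is only a sketch: the `UNITS` list is an assumption made up for the example, and `re.escape` is used so that units containing special characters are matched literally.

```python
import re

# Illustrative unit list (an assumption, not from the answer above).
UNITS = ["GB", "MB", "KB", "B", "ft"]

# Put longer units first so "GB" is tried before "B".
alternation = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
pattern = re.compile(r"^([1-9][0-9]*)\s*(%s)$" % alternation)

match = pattern.match("17 ft")
if match:
    print("Number: %d, unit: %s" % (int(match.group(1)), match.group(2)))
    # Number: 17, unit: ft
```

Strings with a unit outside the list, such as "5 miles", simply fail to match.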

+1

Andidog Feb 10 '10 at 21:03

How about using a regex?

http://python.org/doc/1.6/lib/module-regsub.html

0

Ole media Feb 10 '10 at 21:03

For this task, I'd definitely use a regex:

    import re

    there = re.compile(r'\s*(\d+)\s*(\S+)')
    thematch = there.match(s)
    if thematch:
        number, unit = thematch.groups()
    else:
        raise ValueError('String %r not in the expected format' % s)

In the RE pattern, \s means whitespace, \d means digit, and \S means non-whitespace; * means "0 or more of the preceding", + means "1 or more of the preceding", and parentheses enclose "capture groups", which are then returned by calling groups() on the match object. thematch is None if the given string does not match the pattern (optional whitespace, then one or more digits, then optional whitespace, then one or more non-whitespace characters).
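
To see the pattern at work, here is a small sketch that wraps it in a helper and converts the number to `int`. The name `split_number_unit` is hypothetical; the pattern and the error message are the ones from this answer.

```python
import re

# Same pattern as in the answer: optional whitespace, digits,
# optional whitespace, then one or more non-whitespace characters.
there = re.compile(r'\s*(\d+)\s*(\S+)')


def split_number_unit(s):
    thematch = there.match(s)
    if thematch is None:
        raise ValueError('String %r not in the expected format' % s)
    number, unit = thematch.groups()
    return int(number), unit


print(split_number_unit('2 GB'))     # (2, 'GB')
print(split_number_unit(' 17feet'))  # (17, 'feet')
```

A string with no leading digits, such as 'GB', raises ValueError.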

0

Alex martelli Feb 10 '10 at 21:07

Regular expression.

    import re

    # [.0-9]+ (with the +) so that numbers longer than one character match
    m = re.match(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)', s)
    if m is None:
        raise ValueError("not a number with units")
    number = m.group("n")
    unit = m.group("u")

This gives you a number (integer or fixed point; it is too hard to tell the "e" of scientific notation apart from a unit prefix) with an optional sign, followed by the units, with optional whitespace in between.

You can use re.compile() if you are going to make many matches.
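
A minimal sketch of that idea: compile the pattern once and reuse it for a whole list of strings. The names `NUMBER_UNIT` and `parse_all` are hypothetical, and the pattern is this answer's pattern written with `[.0-9]+` so that multi-digit numbers are captured.

```python
import re

# Compiled once at import time, reused for every string.
NUMBER_UNIT = re.compile(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)')


def parse_all(strings):
    results = []
    for s in strings:
        m = NUMBER_UNIT.match(s)
        if m is None:
            raise ValueError("not a number with units: %r" % s)
        # float() raises ValueError for malformed text such as "1.2.3".
        results.append((float(m.group("n")), m.group("u")))
    return results


print(parse_all(["2 GB", "-3.5m", "+17 feet"]))
# [(2.0, 'GB'), (-3.5, 'm'), (17.0, 'feet')]
```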

0

Mike de simon Feb 10 '10 at 21:08

This uses an approach that should be a little more forgiving than regular expressions. Note: it is not as efficient as the other solutions, because it retries float() on a shorter string each time it fails.

    def split_units(value):
        """
        >>> split_units("2GB")
        (2.0, 'GB')
        >>> split_units("17 ft")
        (17.0, 'ft')
        >>> split_units(" 3.4e-27 frobnitzem ")
        (3.4e-27, 'frobnitzem')
        >>> split_units("9001")
        (9001.0, '')
        >>> split_units("spam sandwhiches")
        (0, 'spam sandwhiches')
        >>> split_units("")
        (0, '')
        """
        units = ""
        number = 0
        while value:
            try:
                number = float(value)
                break
            except ValueError:
                units = value[-1:] + units
                value = value[:-1]
        return number, units.strip()

0

Logan evans May 06 '15 at 20:39

Scientific notation: this regular expression works well for me for parsing numbers that might be in scientific notation. It is based on the Python documentation on simulating scanf: https://docs.python.org/3/library/re.html#simulating-scanf

    import re

    units_pattern = re.compile(r"([-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?|\s*[a-zA-Z]+\s*$)")
    number_with_units = [match.group(0) for match in units_pattern.finditer("+2.0e-1 mm")]
    print(number_with_units)
    # ['+2.0e-1', ' mm']
    n, u = number_with_units
    print(float(n), u.strip())
    # 0.2 mm

0

Vince W. Feb 07 '17 at 15:32

Try the regex pattern below. The first group (the scanf() tokens for any kind of number) is lifted directly from the Python docs for the re module.

    import re

    SCANF_MEASUREMENT = re.compile(
        r'''(                      # group match like scanf() token %e, %E, %f, %g
        [-+]?                      # +/- or nothing for positive
        (\d+(\.\d*)?|\.\d+)        # match numbers: 1, 1., 1.1, .1
        ([eE][-+]?\d+)?            # scientific notation: e(+/-)2 (*10^2)
        )
        (\s*)                      # separator: white space or nothing
        (                          # unit of measure: like GB. also works for no units
        \S*)''',
        re.VERBOSE)
    '''
    :var SCANF_MEASUREMENT:
        regular expression object that will match a measurement

    **measurement** is the value of a quantity of something. most complicated example::

        -666.6e-100 units
    '''

    def parse_measurement(value_sep_units):
        measurement = SCANF_MEASUREMENT.match(value_sep_units)
        if measurement is None:
            raise ValueError("doesn't start with a number: %r" % value_sep_units)
        value = float(measurement.group(1))  # group 1: the whole number
        units = measurement.group(6)         # group 6: the unit text (may be empty)
        return value, units

0

steodatus Feb 26 '17 at 10:37

John la rooy · Accepted Answer · 2010-02-10T21:10:43+0000

    s = '17GB'
    for i, c in enumerate(s):
        if not c.isdigit():
            break
    number = int(s[:i])
    unit = s[i:]
