Sscanf in Python

I am looking for an equivalent of sscanf() in Python. I want to parse /proc/net/* files. In C, I could do something like this:

 int matches = sscanf( buffer, "%*d: %64[0-9A-Fa-f]:%X %64[0-9A-Fa-f]:%X %*X %*X:%*X %*X:%*X %*X %*d %*d %ld %*512s\n", local_addr, &local_port, rem_addr, &rem_port, &inode); 

At first I thought of using str.split, but it does not split on each of the given characters; it treats the whole sep string as a single separator:

 >>> import string
 >>> lines = open("/proc/net/dev").readlines()
 >>> for l in lines[2:]:
 ...     cols = l.split(string.whitespace + ":")
 ...     print(len(cols))
 ...
 1

I expected this to print 17, the number of columns.

Is there a Python equivalent for sscanf (not RE) or a line-splitting function in a standard library that breaks into any of a range of characters that I don't know about?

+48
python split parsing scanf procfs
Feb 01 '10 at 6:38
11 answers

Python does not have a built-in sscanf equivalent, and most of the time it makes more sense to parse the input by working with the string directly, using regular expressions, or using a parsing tool.

sscanf is probably most useful when translating C code, and people have implemented it for Python, for example in this module: http://hkn.eecs.berkeley.edu/~dyoo/python/scanf/

In this particular case, if you just want to split the data based on several separator characters, re.split really is the right tool.
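As a rough sketch of that suggestion, the snippet below splits one line on any run of whitespace or colons. The sample line is made up to resemble an entry in /proc/net/dev (an interface name followed by 16 counters); it is not real output.

```python
import re

# Hypothetical sample resembling one data line of /proc/net/dev:
# an interface name, a colon, then 16 numeric counters.
sample = "  eth0: 1500 10 0 0 0 0 0 0 2500 20 0 0 0 0 0 0\n"

# Split on any run of whitespace or colons; strip() first so that the
# leading and trailing whitespace does not produce empty strings.
cols = re.split(r"[\s:]+", sample.strip())

print(len(cols))   # 17
print(cols[0])     # eth0
```

Because the character class matches runs of separators, this should also cope with lines where the counter follows the colon with no space in between.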

+25
Feb 01 '10 at 6:51

When I'm in the mood for C, I usually use zip and a list comprehension for scanf-like behavior. Like this:

 # strtobool is a small conversion function, defined further down
 input = '1 3.0 false hello'
 (a, b, c, d) = [t(s) for t, s in zip((int, float, strtobool, str), input.split())]
 print (a, b, c, d)

Note that for more complex format strings, you need to use regular expressions:

 import re
 input = '1:3.0 false,hello'
 # raw string so that \d and \w reach the regex engine unchanged
 pattern = r'^(\d+):([\d.]+) (\w+),(\w+)$'
 (a, b, c, d) = [t(s) for t, s in zip((int, float, strtobool, str), re.search(pattern, input).groups())]
 print (a, b, c, d)

Please note that you need conversion functions for all types that you want to convert. For example, above I used something like:

 strtobool = lambda s: {'true': True, 'false': False}[s] 
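
The same idea can be wrapped in a small helper. The sketch below is only an illustration: scan() is a hypothetical name, not a standard-library function, and it simply pairs each captured group with a converter.

```python
import re

def scan(pattern, converters, text):
    """Match text against pattern and convert each captured group.

    Hypothetical helper for illustration: returns a tuple of converted
    values, or None when the pattern does not match.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    return tuple(conv(group) for conv, group in zip(converters, match.groups()))

# Converter for the 'true'/'false' strings, as in the answer above.
strtobool = lambda s: {'true': True, 'false': False}[s]

result = scan(r'^(\d+):([\d.]+) (\w+),(\w+)$',
              (int, float, strtobool, str),
              '1:3.0 false,hello')
print(result)  # (1, 3.0, False, 'hello')
```

Returning None on a failed match loosely mirrors checking the return value of sscanf before using the outputs.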
+46
Jun 18 '12 at 14:55

There is also a parse module.

parse() is the opposite of format() (the newer string formatting function in Python 2.6 and later).

 >>> from parse import parse
 >>> parse('{} fish', '1')
 >>> parse('{} fish', '1 fish')
 <Result ('1',) {}>
 >>> parse('{} fish', '2 fish')
 <Result ('2',) {}>
 >>> parse('{} fish', 'red fish')
 <Result ('red',) {}>
 >>> parse('{} fish', 'blue fish')
 <Result ('blue',) {}>
+35
Oct 12

You can split on a range of characters using the re module.

 >>> import re
 >>> r = re.compile('[ \t\n\r:]+')
 >>> r.split("abc:def ghi")
 ['abc', 'def', 'ghi']
+22
Feb 01 '10 at 6:41

You can parse with the re module using named groups. It will not convert the substrings to their actual data types (e.g. int), but it is very convenient when parsing strings.

Given this sample line from /proc/net/tcp:

 line=" 0: 00000000:0203 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 335 1 c1674320 300 0 0 0" 

An example mimicking your sscanf example, with named groups in place of the variables, might be:

 import re
 import pprint

 hex_digit_pattern = r"[\dA-Fa-f]"
 pat = r"\d+: " + \
       r"(?P<local_addr>HEX+):(?P<local_port>HEX+) " + \
       r"(?P<rem_addr>HEX+):(?P<rem_port>HEX+) " + \
       r"HEX+ HEX+:HEX+ HEX+:HEX+ HEX+ +\d+ +\d+ " + \
       r"(?P<inode>\d+)"
 # Expand the HEX placeholder into the hex-digit character class.
 pat = pat.replace("HEX", hex_digit_pattern)

 values = re.search(pat, line).groupdict()

 pprint.pprint(values)
 # prints:
 # {'inode': '335',
 #  'local_addr': '00000000',
 #  'local_port': '0203',
 #  'rem_addr': '00000000',
 #  'rem_port': '0000'}
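
Since groupdict() returns strings, a follow-up conversion step is needed to get numbers. The sketch below is one possible way to do it. It assumes the addresses are IPv4 values written as hex in little-endian byte order (as on x86 Linux), and the helper name decode_ipv4 is made up for this example.

```python
import socket
import struct

# Values as returned by groupdict() for the sample line above.
values = {'inode': '335',
          'local_addr': '00000000', 'local_port': '0203',
          'rem_addr': '00000000', 'rem_port': '0000'}

def decode_ipv4(hex_addr):
    """Turn an 8-digit hex address into dotted-quad notation.

    Assumes little-endian storage; this is an assumption about the
    machine that produced the file, not something re can tell us.
    """
    return socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))

local_port = int(values['local_port'], 16)   # like %X in sscanf
rem_port = int(values['rem_port'], 16)
inode = int(values['inode'])                 # like %ld in sscanf
local_addr = decode_ipv4(values['local_addr'])

print(local_addr, local_port, rem_port, inode)  # 0.0.0.0 515 0 335
```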
+12
Feb 01 '10 at 23:02

There is an ActiveState recipe that implements a basic scanf: http://code.activestate.com/recipes/502213-simple-scanf-implementation/

+2
Nov 11 '10 at 17:21

You can turn ":" into a space and then split, e.g.

 >>> f = open("/proc/net/dev")
 >>> for line in f:
 ...     line = line.replace(":", " ").split()
 ...     print(len(line))

No need for regular expressions (in this case).

+1
Feb 01 '10 at 6:50

Regarding orip's answer: I think using the re module is reasonable advice. The Kodos application is helpful when approaching a complex regexp task with Python.

http://kodos.sourceforge.net/home.html

+1
Aug 23 2018-10-10T00:

Update: the Python documentation for its regex module, re, includes a section on simulating scanf, which I found more useful than any of the answers above.

https://docs.python.org/2/library/re.html#simulating-scanf
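
That section maps scanf conversion tokens to regular expressions, for example %s to \S+ and %d to [-+]?\d+. A minimal sketch of the idea follows; the sample input line is only an illustration in the style of the documentation, and the variable names are made up here.

```python
import re

text = "/usr/sbin/sendmail - 0 errors, 4 warnings"

# scanf format:  "%s - %d errors, %d warnings"
# %s becomes (\S+) and %d becomes ([-+]?\d+); the groups capture the fields.
pattern = r"(\S+) - ([-+]?\d+) errors, ([-+]?\d+) warnings"

match = re.search(pattern, text)
name = match.group(1)
n_errors = int(match.group(2))      # regex groups are strings, so convert
n_warnings = int(match.group(3))

print(name, n_errors, n_warnings)  # /usr/sbin/sendmail 0 4
```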

+1
Dec 19 '16 at 12:38

If the delimiter is ":", you can split on ":" and then use x.strip() on the resulting strings to get rid of any leading or trailing whitespace. int() ignores surrounding whitespace.
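
A short sketch of this approach, using a made-up line that resembles /proc/net/dev (the real file has more counters per line):

```python
# Hypothetical sample resembling one data line of /proc/net/dev.
sample = "  eth0: 1500 10 0 0\n"

# Split once on ':' to separate the interface name from the counters.
name, rest = sample.split(":", 1)
name = name.strip()                        # drop the leading spaces
counters = [int(x) for x in rest.split()]  # split() with no argument splits on whitespace

print(name, counters)   # eth0 [1500, 10, 0, 0]
print(int("  42\n"))    # 42 (int() tolerates surrounding whitespace)
```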

0
Feb 01 '10 at

There is a Python 2 implementation by odiak.

0
Oct 11


