How to match text format with string without regular expression in python?

Question

I am reading a file with lines of the form:

[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34

I looked at the MATLAB code that reads this file, which specifies the format as:

 [I,L,Ls,R,Rs,p,e,n] = textread(f1,'[ %u ] L= %u%s R= %u%sp= %ne=%un=%u')

I want to read this file in Python. The only tool I know for this is regex, and matching even part of this line leads to something like

 re.compile(r'\s*\[\s*(?P<id>\d+)\s*\]\s*L\s*=\s*(?P<Lint>\d+)\s*\((?P<Ltype>[DG])\)\s*R\s*=\s*(?P<Rint>\d+)\s*')

which is ugly! Is there an easier way to do this in Python?
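For reference, a minimal sketch of the question's own named-group approach extended to the whole line. Writing it with re.VERBOSE lets the pattern span several lines with a comment per field, which already reads less badly. The Rtype, p, e and n group names and the parse_line helper are hypothetical; the (D)/(G) marker after R is assumed to follow the same [DG] rule the question uses for L.

```python
import re

# The question's named-group regex, extended to the whole line. In VERBOSE
# mode, whitespace in the pattern is ignored and '#' starts a comment.
LINE_RE = re.compile(r"""
    \[\s*(?P<id>\d+)\s*\]                               # [ 0 ]
    \s*L\s*=\s*(?P<Lint>\d+)\s*\((?P<Ltype>[DG])\)      # L= 9 (D)
    \s*R\s*=\s*(?P<Rint>\d+)\s*\((?P<Rtype>[DG])\)      # R= 14 (D)
    \s*p\s*=\s*(?P<p>[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)   # p= 0.0347222
    \s*e\s*=\s*(?P<e>\d+)                               # e= 10
    \s*n\s*=\s*(?P<n>\d+)                               # n= 34
""", re.VERBOSE)


def parse_line(line):
    """Hypothetical helper: return the eight fields with numeric conversion."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    d = m.groupdict()
    return (int(d["id"]), int(d["Lint"]), d["Ltype"], int(d["Rint"]),
            d["Rtype"], float(d["p"]), int(d["e"]), int(d["n"]))


print(parse_line("[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"))
```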

python regex matlab readfile

highBandWidth Apr 14 '11 at 19:55

4 answers

You can make the regexp more readable by building it with re.escape and str.replace:

 import re

 number = "([-+0-9.DdEe ]+)"
 unit = r"\(([^)]+)\)"
 t = "[X] L=XU R=XU p=X e=X n=X"
 # escape the literal template, then swap in the capturing groups
 m = re.compile(re.escape(t).replace("X", number).replace("U", unit))
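A usage sketch for the template approach, assuming the same number/unit/template definitions as the answer: match the sample line and convert the captured strings. Because the number character class also admits spaces, the groups come back padded (for example ' 14 ') and need stripping before conversion. parse_line is a hypothetical helper name.

```python
import re

# Build the pattern the same way as in the answer.
number = "([-+0-9.DdEe ]+)"
unit = r"\(([^)]+)\)"
template = "[X] L=XU R=XU p=X e=X n=X"
pattern = re.compile(re.escape(template).replace("X", number).replace("U", unit))


def parse_line(line):
    """Hypothetical helper: match one line and convert the eight captured fields."""
    match = pattern.match(line)
    if match is None:
        raise ValueError(f"line does not match template: {line!r}")
    # The number class also matches spaces, so strip the padding first.
    I, L, Ls, R, Rs, p, e, n = (g.strip() for g in match.groups())
    return int(I), int(L), Ls, int(R), Rs, float(p), int(e), int(n)


print(parse_line("[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"))
# prints (0, 9, 'D', 14, 'D', 0.0347222, 10, 34)
```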

6502 Apr 14 '11 at 20:16

This looks more or less pythonic to me:

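 line = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
 # one converter per whitespace-separated token; None means "skip this token"
 parts = (None, int, None, None, int, str, None, int, str, None, float, None, int, None, int)
 [I, L, Ls, R, Rs, p, e, n] = [f(x) for f, x in zip(parts, line.split()) if f is not None]
 print([I, L, Ls, R, Rs, p, e, n])
 # prints [0, 9, '(D)', 14, '(D)', 0.0347222, 10, 34]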
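One way to apply the split-and-convert idea to a whole file, as a sketch. The helper names (PARTS, parse_line, read_records) and the second sample line are made up for illustration. The length check matters because zip() silently stops at the shorter input: a line written as "L=9" (no space) would produce 14 tokens instead of 15 and otherwise shift every later field.

```python
import io

# One converter per whitespace-separated token; None means "skip this token".
PARTS = (None, int, None, None, int, str, None, int, str,
         None, float, None, int, None, int)


def parse_line(line):
    """Hypothetical helper: convert one line, refusing lines with the wrong shape."""
    tokens = line.split()
    if len(tokens) != len(PARTS):
        raise ValueError(f"expected {len(PARTS)} tokens, got {len(tokens)}: {line!r}")
    return [f(x) for f, x in zip(PARTS, tokens) if f is not None]


def read_records(fileobj):
    """Hypothetical helper: parse every non-blank line of an open text file."""
    return [parse_line(line) for line in fileobj if line.strip()]


# With a real file this would be: with open(path) as f: records = read_records(f)
sample = io.StringIO("[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34\n"
                     "[ 1 ] L= 3 (G) R= 5 (D) p= 0.5 e= 2 n= 7\n")
for I, L, Ls, R, Rs, p, e, n in read_records(sample):
    print(I, L, Ls, R, Rs, p, e, n)
```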

abbot Apr 14 '11 at 20:25

Python has no scanf equivalent, as noted in the Python documentation for the re module:

Python does not currently have an equivalent to scanf(). Regular expressions are generally more powerful, though also more verbose, than scanf() format strings. The table below offers some more-or-less equivalent mappings between scanf() format tokens and regular expressions.

However, you could build your own scanf-like module using the mappings from that page.
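A rough sketch of that idea: the hypothetical sscanf below translates a small subset of scanf tokens (%d, %u, %f/%e/%g, %s, %c and %%) into regex fragments modelled on the re documentation's mapping table, then converts each captured field. It assumes, as C scanf does, that whitespace in the format matches any amount of whitespace in the input, including none. Field widths, %i, %o and %x are not handled, and the format used in the example is an adaptation of the MATLAB one (MATLAB's %n is not a C scanf token).

```python
import re

# Regex fragment and converter for each supported conversion, modelled on the
# scanf() mapping table in the re docs (inner groups made non-capturing).
_TOKENS = {
    "d": (r"[-+]?\d+", int),
    "u": (r"\d+", int),
    "f": (r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?", float),
    "s": (r"\S+", str),
    "c": (r".", str),
}
_TOKENS["e"] = _TOKENS["g"] = _TOKENS["f"]


def scanf_to_regex(fmt):
    """Translate a scanf-like format into (compiled regex, list of converters)."""
    pattern, converters = [], []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            spec = fmt[i + 1]
            if spec == "%":
                pattern.append("%")  # literal percent sign
            elif spec in _TOKENS:
                regex, conv = _TOKENS[spec]
                pattern.append(f"({regex})")
                converters.append(conv)
            else:
                raise ValueError(f"unsupported conversion %{spec}")
            i += 2
        elif ch.isspace():
            pattern.append(r"\s*")  # whitespace matches any amount, even none
            i += 1
        else:
            pattern.append(re.escape(ch))  # everything else is literal text
            i += 1
    return re.compile("".join(pattern)), converters


def sscanf(text, fmt):
    """Hypothetical helper: return a tuple of converted fields, like C sscanf."""
    regex, converters = scanf_to_regex(fmt)
    m = regex.match(text)
    if m is None:
        raise ValueError(f"{text!r} does not match {fmt!r}")
    return tuple(conv(g) for conv, g in zip(converters, m.groups()))


print(sscanf("[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34",
             "[ %u ] L= %u (%c) R= %u (%c) p= %f e= %u n= %u"))
```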

Andrew White Apr 14 '11 at 20:00

Paulmcg · Accepted Answer · 2011-04-15T00:08:38+0000

Pyparsing is an alternative to unreadable and fragile regex processing. The parser example below handles your stated format, as well as all sorts of extra whitespace and an arbitrary order of the assignment expressions. Just as you used named groups in your regex, pyparsing supports results names, so you can access the parsed data using dict syntax or attribute syntax (data['Lint'] or data.Lint).

 from pyparsing import Suppress, Word, nums, oneOf, Regex, ZeroOrMore, Optional

 # define basic punctuation
 EQ, LPAR, RPAR, LBRACK, RBRACK = map(Suppress, "=()[]")

 # numeric values
 integer = Word(nums).setParseAction(lambda t: int(t[0]))
 real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t: float(t[0]))

 # id and assignment fields
 idRef = LBRACK + integer("id") + RBRACK
 typesep = LPAR + oneOf("DG") + RPAR
 lExpr = 'L' + EQ + integer("Lint")
 rExpr = 'R' + EQ + integer("Rint")
 pExpr = 'p' + EQ + real("pFloat")
 eExpr = 'e' + EQ + integer("Eint")
 nExpr = 'n' + EQ + integer("Nint")

 # accept assignments in any order, with or without leading (D) or (G)
 assignment = lExpr | rExpr | pExpr | eExpr | nExpr
 line = idRef + lExpr + ZeroOrMore(Optional(typesep) + assignment)

 # test the parser
 text = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
 data = line.parseString(text)
 print(data.dump())

 # prints
 # [0, 'L', 9, 'D', 'R', 14, 'D', 'p', 0.0347222, 'e', 10, 'n', 34]
 # - Eint: 10
 # - Lint: 9
 # - Nint: 34
 # - Rint: 14
 # - id: 0
 # - pFloat: 0.0347222

Also, the string-to-int and string-to-float conversions are done at parse time by the parse actions, so afterwards the values are already in a useful form. (The pyparsing philosophy is that while parsing these expressions, you already know that a word made of numeric digits, i.e. Word(nums), will safely convert to int, so why not do the conversion right then, instead of just getting back matching strings and having to reprocess the sequence of strings, trying to work out which ones are ints, floats, etc.?)
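For comparison, the same two ideas (fields accepted in any order, and conversion done at match time so the caller gets ints and floats back) can be sketched with only the standard library. This is a plain re technique swapped in for illustration, not pyparsing. CONVERTERS, parse_fields and the "Ltype"/"Rtype" keys are hypothetical names, and a (D)/(G) marker is assumed to belong to the field whose value it follows.

```python
import re

# Hypothetical table: field name -> converter applied as soon as it is matched.
CONVERTERS = {"L": int, "R": int, "p": float, "e": int, "n": int}

ID_RE = re.compile(r"\s*\[\s*(\d+)\s*\]")
# name= value, optionally followed by a (D) or (G) type marker
FIELD_RE = re.compile(r"([A-Za-z]+)\s*=\s*([^\s(]+)(?:\s*\(([DG])\))?")


def parse_fields(line):
    """Parse one line into a dict, accepting the name= fields in any order."""
    m = ID_RE.match(line)
    if m is None:
        raise ValueError(f"missing [ id ] prefix: {line!r}")
    result = {"id": int(m.group(1))}
    # Pattern.findall accepts a start position: scan only after "[ id ]".
    for name, value, kind in FIELD_RE.findall(line, m.end()):
        convert = CONVERTERS.get(name)
        if convert is None:
            raise ValueError(f"unknown field {name!r} in {line!r}")
        result[name] = convert(value)
        if kind:  # findall yields '' when the optional (D)/(G) group is absent
            result[name + "type"] = kind
    return result


print(parse_fields("[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"))
print(parse_fields("[ 3 ] n= 1 L= 2 (G) p= .5 e= 0 R= 4 (D)"))
```

One design difference worth noting: findall silently skips any text it cannot match, so this version is more lenient than a grammar that must consume the whole line, and it will not flag some malformed lines.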
