Pyparsing: OneOrMore nested inside another OneOrMore

Question


I am trying to use pyparsing for the first time. My parser is not doing what I expect; can someone please take a look and tell me what is wrong? I am trying to nest one OneOrMore inside another OneOrMore, which I think should work, but it does not.

Below is the whole code:

    import pyparsing

    status = """
    sale number : 11/7
    NAME         ID    PAWN  PRICE    TIME         %C      STATE  START/STOP
    cross-cu-1   1055  1     106284K  07:49:36.19  25.05%  run    1d01h
    cross-cu-2   918   1     104708K  07:38:19.08  24.02%  run    1d01h
    sale number : 11/8
    NAME         ID    PAWN  PRICE    TIME         %C      STATE  START/STOP
    cross-cu-3   1055  1     106284K  07:49:36.19  25.05%  run    1d01h
    cross-cu-4   918   1     104708K  07:38:19.08  24.02%  run    1d01h
    """

    integer = pyparsing.Word(pyparsing.nums).setParseAction(lambda toks: int(toks[0]))
    decimal = pyparsing.Word(pyparsing.nums + ".").setParseAction(lambda toks: float(toks[0]))
    wordSuppress = pyparsing.Suppress(pyparsing.Word(pyparsing.alphas))
    endOfLine = pyparsing.LineEnd().suppress()
    colon = pyparsing.Suppress(":")

    saleNumber = pyparsing.Regex("\d{2}\/\d{1}").setResultsName("saleNumber")
    lineSuppress = pyparsing.Regex("NAME.*STOP") + endOfLine
    saleRow = wordSuppress + wordSuppress + colon + saleNumber + endOfLine

    name = pyparsing.Regex("cross-cu-\d").setResultsName("name")
    id = integer.setResultsName("id")
    pawn = integer.setResultsName("pawn")
    price = integer.setResultsName("price") + "K"
    time = pyparsing.Regex("\d{2}:\d{2}:\d{2}.\d{2}").setResultsName("time")
    c = decimal.setResultsName("c") + "%"
    state = pyparsing.Word(pyparsing.alphas).setResultsName("state")
    startStop = pyparsing.Word(pyparsing.alphanums).setResultsName("startStop")
    row = name + id + pawn + price + time + c + state + startStop + endOfLine

    table = pyparsing.OneOrMore(
        pyparsing.Group(
            saleRow
            + lineSuppress.suppress()
            + (pyparsing.OneOrMore(pyparsing.Group(row) | pyparsing.SkipTo(row).suppress()))
        )
        | pyparsing.SkipTo(saleRow).suppress()
    )

    resultDic = [x.asDict() for x in table.parseString(status)]
    print resultDic

It returns only [{'saleNumber': '11/7'}]. I was hoping to get a list of dicts:

 [{ {'saleNumber': '11/7'},{ elements in cross-cu-1 line, elements in cross-cu-2 line } }, { {'saleNumber': '11/8'},{ elements in cross-cu-3 line, elements in cross-cu-4 line } }]

Any help is appreciated! Please do not suggest other ways to implement this; I am also trying to learn pyparsing!


python python-2.7 parsing pyparsing

theAlse Sep 14 '12 at 11:48


2 answers

Does this work?

    import pyparsing

    status = """
    sale number : 11/7
    NAME         ID    PAWN  PRICE    TIME         %C      STATE  START/STOP
    cross-cu-1   1055  1     106284K  07:49:36.19  25.05%  run    1d01h
    cross-cu-2   918   1     104708K  07:38:19.08  24.02%  run    1d01h
    sale number : 11/8
    NAME         ID    PAWN  PRICE    TIME         %C      STATE  START/STOP
    cross-cu-3   1055  1     106284K  07:49:36.19  25.05%  run    1d01h
    cross-cu-4   918   1     104708K  07:38:19.08  24.02%  run    1d01h
    """

    integer = pyparsing.Word(pyparsing.nums).setParseAction(lambda toks: int(toks[0]))
    decimal = pyparsing.Word(pyparsing.nums + ".").setParseAction(lambda toks: float(toks[0]))
    wordSuppress = pyparsing.Suppress(pyparsing.Word(pyparsing.alphas))
    endOfLine = pyparsing.LineEnd().suppress()
    colon = pyparsing.Suppress(":")

    saleNumber = pyparsing.Regex("\d{2}\/\d{1}").setResultsName("saleNumber")
    lineSuppress = pyparsing.Regex("NAME.*STOP") + endOfLine
    saleRow = wordSuppress + wordSuppress + colon + saleNumber + endOfLine

    name = pyparsing.Regex("cross-cu-\d").setResultsName("name")
    id = integer.setResultsName("id")
    pawn = integer.setResultsName("pawn")
    price = integer.setResultsName("price") + "K"
    time = pyparsing.Regex("\d{2}:\d{2}:\d{2}.\d{2}").setResultsName("time")
    c = decimal.setResultsName("c") + "%"
    state = pyparsing.Word(pyparsing.alphas).setResultsName("state")
    startStop = pyparsing.Word(pyparsing.alphanums).setResultsName("startStop")

    # Each data row is wrapped in its own Group so its named fields stay together.
    row = pyparsing.Group(name + id + pawn + price + time + c + state + startStop + endOfLine)
    # Note: setResultsName returns a copy, so this call has no effect as written.
    row.setResultsName("row")
    rows = pyparsing.OneOrMore(row).setResultsName("rows")

    # One Group per sale: the sale line, the header line, then the data rows.
    table = pyparsing.OneOrMore(pyparsing.Group(saleRow + lineSuppress + rows))

    resultDic = [x.asDict() for x in table.parseString(status)]
    print resultDic


Hans then Sep 14 '12 at 12:26


Hans then · Accepted Answer · 2012-09-14T11:56:30+0000

In this case, pyparsing is probably unnecessary. Why don't you just read the input line by line and then parse each line?

The code will look like this:

EDIT: I updated the code to better follow your example.

    from collections import defaultdict

    status = """
    sale number : 11/7
    NAME         ID    PAWN  PRICE    TIME         %C      STATE  START/STOP
    cross-cu-1   1055  1     106284K  07:49:36.19  25.05%  run    1d01h
    cross-cu-2   918   1     104708K  07:38:19.08  24.02%  run    1d01h
    sale number : 11/8
    NAME         ID    PAWN  PRICE    TIME         %C      STATE  START/STOP
    cross-cu-3   1055  1     106284K  07:49:36.19  25.05%  run    1d01h
    cross-cu-4   918   1     104708K  07:38:19.08  24.02%  run    1d01h
    """

    sale_number = ''
    sales = defaultdict(list)
    for line in status.split('\n'):
        line = line.strip()
        if line.startswith("NAME"):
            continue
        elif line.startswith("sale number"):
            sale_number = line.split(':')[1].strip()
        elif not line or line.isspace():
            continue
        else:
            # you can also use a regular expression here
            sales[sale_number].append(line.split())

    for sale in sales:
        print sale, sales[sale]
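If the goal is the list of dicts described in the question, the line-by-line approach can be extended to name each column. The following is only a sketch and not part of the original answer: the function name `parse_status`, the `rows` key, and the choice of field names (borrowed from the results names in the question) are assumptions, and it assumes every data row has exactly eight whitespace-separated fields.

```python
STATUS = """
sale number : 11/7
NAME         ID    PAWN  PRICE    TIME         %C      STATE  START/STOP
cross-cu-1   1055  1     106284K  07:49:36.19  25.05%  run    1d01h
cross-cu-2   918   1     104708K  07:38:19.08  24.02%  run    1d01h
sale number : 11/8
NAME         ID    PAWN  PRICE    TIME         %C      STATE  START/STOP
cross-cu-3   1055  1     106284K  07:49:36.19  25.05%  run    1d01h
cross-cu-4   918   1     104708K  07:38:19.08  24.02%  run    1d01h
"""

# Hypothetical field names, one per column of a data row.
KEYS = ["name", "id", "pawn", "price", "time", "c", "state", "startStop"]


def parse_status(text):
    """Return one dict per sale, each holding its number and its rows."""
    sales = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("NAME"):
            continue  # skip blank lines and the column header
        if line.startswith("sale number"):
            number = line.split(":", 1)[1].strip()
            sales.append({"saleNumber": number, "rows": []})
        elif sales:
            # Pair each column with its name, then convert the numeric fields.
            row = dict(zip(KEYS, line.split()))
            row["id"] = int(row["id"])
            row["pawn"] = int(row["pawn"])
            row["price"] = int(row["price"].rstrip("K"))
            row["c"] = float(row["c"].rstrip("%"))
            sales[-1]["rows"].append(row)
    return sales


sales = parse_status(STATUS)
```

Here `sales[0]["saleNumber"]` holds the first sale number and `sales[0]["rows"]` holds one dict per data row under it. A list is used instead of a `defaultdict` so the sales keep their input order.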
