PyParsing: how to use the SkipTo operator and OR (^)

Question


My input lines start with date prefixes and other prefixes in several different formats. I need a grammar that skips these prefixes and extracts the data that follows. However, when I combine SkipTo with the Or operator (^), I do not get the results I expect.

    from pyparsing import *
    import pprint

    def print_cal(v):
        print(v)

    f = open("test", "r")

    NAND_TIME = Group(SkipTo(Literal("NAND TIMES"), include=True) + Word(nums) + Literal(":").suppress() + Word(nums)).setParseAction(lambda t: print_cal('NAND TIME'))
    TEST_TIME = Group(SkipTo(Literal("TEST TIMES"), include=True) + Word(nums) + Literal(":").suppress() + Word(nums)).setParseAction(lambda t: print_cal('TEST TIME'))
    testing = NAND_TIME ^ TEST_TIME
    watch = OneOrMore(testing)
    watch.parseString(f.read())

File Contents:

 01 may 2015 15: 15: 100 NAND TIMES 1: 88008888

 01 april 2015 15: 15: 100 NAND TIMES 2: 77777777

 1154544 15: 15: 100 TEST TIMES 1: 78544545

 8787878 aug 2015 15: 15: 100 TEST TIMES 2: 78787878

OUTPUT:

    
 TEST TIME

 TEST TIME

Desired output:

  
 NAND TIME

 NAND TIME

 TEST TIME

 TEST TIME

Can someone help me figure this out?

+6

python python-3.x python-2.7 pyparsing

Praneeth puligundla Aug 08 '14 at 19:41


1 answer

Paulmcg · Accepted Answer · 2014-08-08T21:11:49+0000

Using SkipTo as the first element of a parser is a bold move, and may indicate that searchString or scanString would be a better choice than parseString. With searchString and scanString you define only the part of the input that interests you, and the rest is skipped automatically. You do have to make sure that your definition of "what you want" is unambiguous and does not accidentally pick up unwanted bits. Here is your parser implemented with searchString:

    # print_cal and f are defined as in the question
    NAND_TIME = (Literal("NAND TIMES") + Word(nums) + Literal(":").suppress() + Word(nums)).setParseAction(lambda t: print_cal('NAND TIME'))
    TEST_TIME = (Literal("TEST TIMES") + Word(nums) + Literal(":").suppress() + Word(nums)).setParseAction(lambda t: print_cal('TEST TIME'))
    testing = NAND_TIME | TEST_TIME

    testdata = f.read()
    for match in testing.searchString(testdata):
        print(match.asList())
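scanString is mentioned above but not shown. As a minimal sketch, assuming the sample lines from the question are inlined as a string (the names `testdata`, `colon` and `spans` are only illustrative), it yields each match together with its start and end offsets in the input:

```python
from pyparsing import Literal, Word, nums

# Sample lines copied from the question, inlined instead of reading the "test" file.
testdata = """\
01 may 2015 15: 15: 100 NAND TIMES 1: 88008888
01 april 2015 15: 15: 100 NAND TIMES 2: 77777777
1154544 15: 15: 100 TEST TIMES 1: 78544545
8787878 aug 2015 15: 15: 100 TEST TIMES 2: 78787878
"""

colon = Literal(":").suppress()
NAND_TIME = Literal("NAND TIMES") + Word(nums) + colon + Word(nums)
TEST_TIME = Literal("TEST TIMES") + Word(nums) + colon + Word(nums)
testing = NAND_TIME | TEST_TIME

# scanString is a generator of (tokens, start, end) triples; text between
# matches is skipped without needing SkipTo in the grammar.
spans = []
for tokens, start, end in testing.scanString(testdata):
    spans.append((start, end, tokens.asList()))
    print(testdata[start:end])
```

The offsets are handy when you also want the unmatched prefix, for example the text between the end of the previous match and `start`.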

'|' is fine to use in this case, since there is no confusion between an expression that starts with NAND and one that starts with TEST.
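For comparison, here is a sketch of why the original SkipTo grammar printed only TEST TIME. The Or operator (^) evaluates every alternative and keeps the longest match. From the start of the input, the TEST TIMES alternative can skip past both NAND lines and so matches more text than the NAND TIMES alternative. The helper `prefixed` and the list `seen` are hypothetical names used only to record which parse action fired:

```python
from pyparsing import Group, Literal, OneOrMore, SkipTo, Word, nums

# Sample lines copied from the question.
testdata = """\
01 may 2015 15: 15: 100 NAND TIMES 1: 88008888
01 april 2015 15: 15: 100 NAND TIMES 2: 77777777
1154544 15: 15: 100 TEST TIMES 1: 78544545
8787878 aug 2015 15: 15: 100 TEST TIMES 2: 78787878
"""

seen = []  # labels recorded by the parse actions, in the order they fire

def prefixed(marker, label):
    # Hypothetical helper: rebuilds the question's SkipTo-based expression.
    expr = Group(SkipTo(Literal(marker), include=True)
                 + Word(nums) + Literal(":").suppress() + Word(nums))
    return expr.setParseAction(lambda t: seen.append(label))

NAND_TIME = prefixed("NAND TIMES", "NAND TIME")
TEST_TIME = prefixed("TEST TIMES", "TEST TIME")

# Or (^): longest alternative wins, so the first match swallows three lines.
OneOrMore(NAND_TIME ^ TEST_TIME).parseString(testdata)
with_or = list(seen)

# MatchFirst (|): the first alternative that matches wins.
del seen[:]
OneOrMore(NAND_TIME | TEST_TIME).parseString(testdata)
with_match_first = list(seen)

print(with_or)
print(with_match_first)
```

Note that '|' combined with SkipTo gives the desired result here only because all NAND lines come before the TEST lines. If the two kinds of line were interleaved, the first alternative's SkipTo could still jump over lines of the other kind, which is another reason to prefer searchString or scanString.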

You may also consider simply parsing this file line by line:

    for line in f:
        if not line.strip():
            continue  # skip blank lines
        print(line)
        print(testing.searchString(line).asList())
        print("")
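A self-contained variant of the line-by-line approach, as a sketch only: the list `lines` stands in for the open file, and converting the two numbers to `int` and collecting them in `records` is an assumption about what you want to do with the data, not part of the original code.

```python
from pyparsing import Literal, Word, nums

# Stand-in for iterating over the file; includes a blank line on purpose.
lines = [
    "01 may 2015 15: 15: 100 NAND TIMES 1: 88008888\n",
    "\n",
    "1154544 15: 15: 100 TEST TIMES 1: 78544545\n",
]

colon = Literal(":").suppress()
NAND_TIME = Literal("NAND TIMES") + Word(nums) + colon + Word(nums)
TEST_TIME = Literal("TEST TIMES") + Word(nums) + colon + Word(nums)
testing = NAND_TIME | TEST_TIME

records = []
for line in lines:
    if not line.strip():
        continue  # a line read from a file still holds "\n", so test the stripped text
    # Each match holds three tokens: the label and the two numbers.
    for label, index, value in testing.searchString(line):
        records.append((label, int(index), int(value)))

print(records)
```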
