Parsing Tcl lists in Python

Question

I need to split a string of space-separated Tcl lists, each wrapped in double curly braces, into its individual lists. For example, given ...

OUTPUT = """{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}} {{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}"""

It should be split into ...

 OUTPUT = ["""{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}}""", """{{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}"""]

I tried...

 import re
 splitter = re.compile('}}\s+{{')
 splitter.split(OUTPUT)

However, this strips the braces in the middle ...

 ['{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}', '172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}']

I can't figure out how to split only on the whitespace between }} and {{. I know that I can cheat and re-insert the missing curly braces manually, but I would rather find a simple way to parse this efficiently.

Is there a way to split OUTPUT with re.split (or some other Python parsing construct) for an arbitrary number of space-delimited strings of the form {{content here}}?

+4

python regex parsing tcl

Mike pennington Feb 24 '12 at 10:47

3 answers

Pyparsing has improved since the discussion on comp.lang.python, and I think even Cameron Laird would not complain about a solution that uses pyparsing's nestedExpr method:

 OUTPUT = """{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}} {{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic "Item 1"}}}"""

 from pyparsing import nestedExpr, originalTextFor

 # Nested lists: each brace level becomes a nested Python list
 nestedBraces1 = nestedExpr('{', '}')
 for nb in nestedBraces1.searchString(OUTPUT):
     print(nb)

 # Original text: each top-level brace group is returned as one string
 nestedBraces2 = originalTextFor(nestedExpr('{', '}'))
 for nb in nestedBraces2.searchString(OUTPUT):
     print(nb)

Prints:

 [[['172.25.50.10:01:01-Ethernet', '172.25.50.10:01:02-Ethernet', ['Traffic', 'Item', '1']]]]
 [[['172.25.50.10:01:02-Ethernet', '172.25.50.10:01:01-Ethernet', ['Traffic', '"Item 1"']]]]
 ['{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}}']
 ['{{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic "Item 1"}}}']

If you will need to re-parse the data to get at the individual elements inside the nested braces, then the nested-list output from the first nestedExpr form may be the better choice (note that even when the list contains a quoted string, the quoted item is kept as its own element). But if you really want the string containing the nested braces, use the originalTextFor form shown in nestedBraces2.

+4

Paulmcg Feb 24 '12 at 23:15

You can use a regex to extract the list items, rather than split on the separators between them ...

 re.findall(r'({{.*?}})(?:\Z|\s+)', OUTPUT)

For instance:

 In [30]: regex = re.compile(r'({{.*?}})(?:\Z|\s+)')
 In [31]: regex.findall(OUTPUT)
 Out[31]:
 ['{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}}',
  '{{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}']
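
The pattern above assumes every item ends in }} followed by whitespace or the end of the string. If that cannot be guaranteed, one regex-free alternative is to track brace depth and cut at each return to depth zero. The following is only a minimal sketch and not part of the original answer: split_top_level is a hypothetical helper name, and the code assumes plain balanced braces (it does not handle Tcl backslash escapes or quoting, and it ignores any text outside the top-level braces).

```python
def split_top_level(text):
    """Return each top-level {...} group in text, braces included."""
    items = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '{':
            if depth == 0:
                start = i  # a new top-level group begins here
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}' at index %d" % i)
            if depth == 0:
                # back at the top level: the group is complete
                items.append(text[start:i + 1])
    if depth != 0:
        raise ValueError("unbalanced '{' in input")
    return items


OUTPUT = ("{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}} "
          "{{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}")

items = split_top_level(OUTPUT)
for item in items:
    print(item)
```

Because it counts braces instead of matching a fixed }} {{ boundary, this approach should also work for items nested to a different depth, at the cost of a few more lines than the one-line regex.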

+1

Gandaro Feb 24 '12 at 23:06

Karl Barker · Accepted Answer · 2012-02-24T23:04:41+0000

You can modify your regular expression to use positive lookbehind and lookahead assertions, which check for the braces without consuming any of the string:

 re.compile(r'(?<=}})\s+(?={{)')
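
As a quick check, this pattern can be run against the OUTPUT string from the question. The snippet is a minimal sketch; the variable names splitter and parts are chosen here for illustration, and the pattern is written as a raw string so that \s is passed to the regex engine unchanged.

```python
import re

OUTPUT = ("{{172.25.50.10:01:01-Ethernet 172.25.50.10:01:02-Ethernet {Traffic Item 1}}} "
          "{{172.25.50.10:01:02-Ethernet 172.25.50.10:01:01-Ethernet {Traffic Item 1}}}")

# Match only whitespace that is preceded by "}}" and followed by "{{".
# The lookbehind and lookahead are zero-width, so the braces stay in the pieces.
splitter = re.compile(r'(?<=}})\s+(?={{)')

parts = splitter.split(OUTPUT)
for part in parts:
    print(part)
```

Only the whitespace is consumed by the match, so each piece keeps its opening {{ and closing }}}, which is what the plain '}}\s+{{' split in the question was losing.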
