Parsing items from a text file

Question


I have a text file containing data inside {[]} tags. What would be the suggested way to parse this data so that I can use just the data inside the tags?

An example text file will look like this:

'this is a bunch of text that is not {[really]} useful in any {[way]}. I need to {[get]} some items {[from]} it.'

I would like to get "really", "way", "get", "from" in a list. I think I could use split to do this, but it seems like there might be a better way. I have seen a ton of parsing libraries; is there one that would be ideal for what I want to do?


python string text-processing

chris Jun 14 '10 at 19:07


4 answers

Bryan Oakley · Answer 1 · 2010-06-14T19:11:49+0000

I would use regular expressions. This answer assumes that none of the tag characters {}[] appear inside the tagged text itself.

import re
text = 'this is a bunch of text that is not {[really]} useful in any {[way]}. I need to {[get]} some items {[from]} it.'

for s in re.findall(r'\{\[(.*?)\]\}', text):
    print(s)

Using verbose mode in Python regular expressions:

re.findall(r'''
    \{   # opening curly brace
    \[   # followed by an opening square bracket
    (    # capture the next pattern
    .*?  # followed by shortest possible sequence of anything
    )    # end of capture
    \]   # followed by closing square bracket
    \}   # followed by a closing curly brace
    ''', text, re.VERBOSE)
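Since the question is about a text file rather than a string literal, here is a minimal sketch of applying the same pattern to a file. The function name `extract_tags` and the throwaway demo file are hypothetical, not part of the answer above, and the sketch assumes that a tag never spans more than one line.

```python
import os
import re
import tempfile

# Same pattern as in the answer above, compiled once for reuse.
TAG_RE = re.compile(r'\{\[(.*?)\]\}')

def extract_tags(path):
    """Return every {[...]} payload in the file at `path` (hypothetical helper)."""
    items = []
    with open(path) as handle:
        # Reading line by line keeps memory use low; this assumes
        # that no tag is split across a line break.
        for line in handle:
            items.extend(TAG_RE.findall(line))
    return items

# Demo: write a small throwaway file, parse it, then clean up.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('this is not {[really]} useful in any {[way]}.\n')
    tmp.write('I need to {[get]} some items {[from]} it.\n')
    demo_path = tmp.name

found = extract_tags(demo_path)
os.remove(demo_path)
print(found)
```

If tags can span lines, reading the whole file with `handle.read()` and passing `re.DOTALL` to `re.compile` would be one way to adapt this.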

Daniel Roseman · Answer 2 · 2010-06-14T19:12:48+0000

This is a regular expression job:

>>> import re
>>> text = 'this is a bunch of text that is not {[really]} useful in any {[way]}. I need to {[get]} some items {[from]} it.'
>>> re.findall(r'\{\[(\w+)\]\}', text)
['really', 'way', 'get', 'from']
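The `\w+` pattern here and the `.*?` pattern from the first answer agree on the sample text, but they are not interchangeable. A small sketch of where they differ, using a made-up input string that is not from the question:

```python
import re

# Hypothetical input: one tag with a space in it, one plain tag, one empty tag.
text = 'a {[two words]} b {[one]} c {[]} d'

# Lazy "anything" match: accepts spaces and even empty tags.
lazy = re.findall(r'\{\[(.*?)\]\}', text)

# Word-character match: only letters, digits and underscores, at least one.
word = re.findall(r'\{\[(\w+)\]\}', text)

print(lazy)
print(word)
```

Which one fits depends on the data: `\w+` silently skips tags containing spaces or punctuation, while `.*?` takes whatever sits between the delimiters.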

remosu · Answer 3 · 2010-06-15T08:18:07+0000

Slower, with no regular expressions.

The old-school way :P

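def f(s):
    result = []
    stack = []  # currently open '{' and '[' delimiters
    tmp = ''
    for c in s:
        if c in '{[':
            stack.append(c)
        elif c in ']}':
            stack.pop()  # assumes the delimiters are balanced
            if c == ']':
                result.append(tmp)
                tmp = ''
        elif stack and stack[-1] == '[':
            # inside the square brackets: collect the character
            tmp += c
    return result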

>>> s
'this is a bunch of text that is not {[really]} useful in any {[way]}. I need to {[get]} some items {[from]} it.'
>>> f(s)
['really', 'way', 'get', 'from']

Henry · Answer 4 · 2010-06-22T03:39:31+0000

Another way

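def between_strings(source, start='{[', end=']}'):
    words = []
    while True:
        start_index = source.find(start)
        if start_index == -1:
            break
        # look for the closing marker only after the opening one
        end_index = source.find(end, start_index + len(start))
        if end_index == -1:
            break  # unterminated tag: stop rather than slice with -1
        words.append(source[start_index+len(start):end_index])
        source = source[end_index+len(end):]
    return words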


text = "this is a bunch of text that is not {[really]} useful in any {[way]}. I need to {[get]} some items {[from]} it."
assert between_strings(text) == ['really', 'way', 'get', 'from']
