I am new to Python and I am writing a script to convert between some proprietary markup formats. I iterate line by line over the files and then do a large number (100-200) of replacements, which mainly fall into four categories:
line = line.replace("-","<EMDASH>")
line = line.replace("<\\@>","@")
line = line.replace("<\\n>","")
line = line.replace("\xe1","•")
The str.replace() calls seem fairly efficient (they show up low in the profiling output), but is there a better way to do this? I have seen re.sub() with a function as the replacement argument, but I am not sure whether that would be faster; I suppose it depends on what kind of optimizations Python does internally. I thought I would ask for advice before building a big dict that might not help much!
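For what it's worth, the dict idea can be sketched like this: build one compiled alternation over all the literal keys and do a single pass over the line, looking each match up in the dict. The table below is a hypothetical sample using the four replacements shown above, not the full 100-200 from the real script:

```python
import re

# Hypothetical sample of the replacement table (the real script would
# have 100-200 entries).
REPLACEMENTS = {
    "-": "<EMDASH>",
    "<\\@>": "@",
    "<\\n>": "",
    "\xe1": "•",
}

# One alternation over all keys; re.escape makes each literal regex-safe,
# and sorting longest-first keeps multi-character keys from being
# shadowed by shorter ones.
_pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(REPLACEMENTS, key=len, reverse=True))
)

def replace_all(line):
    # Single pass over the line; the callback looks up each match.
    return _pattern.sub(lambda m: REPLACEMENTS[m.group(0)], line)
```

Whether this beats a chain of str.replace() calls depends on the data, so it is worth profiling both on real input before committing to either.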
I also do some processing on tags (which look like HTML but are not HTML). I find the tags like this:
m = re.findall('(<[^>]+>)',line)
And then do ~100 search/replaces (mostly removing parts of the matched tags), for example:
m = re.findall(r"(<[^>]+>)", line)
for tag in m:
    tag_new = re.sub(r"\*t\([^\)]*\)", "", tag)
    tag_new = re.sub(r"\*p\([^\)]*\)", "", tag_new)
    if tag != tag_new:
        line = line.replace(tag, tag_new, 1)
Any thoughts on efficiency here?
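One possible restructuring, assuming the cleanup rules are all regex deletions like the two shown: precompile the patterns once outside the loop, and let a single re.sub() with a callback clean each tag in place, which also removes the need for the findall/replace round trip. This is a sketch of the idea, not the full rule set:

```python
import re

# Compile once, outside the per-line loop.
TAG = re.compile(r"<[^>]+>")

# The two cleanup rules from the question merged into one alternation;
# the real script would extend this (or keep a list of compiled
# patterns) to cover the other ~100 rules.
CLEANUP = re.compile(r"\*[tp]\([^\)]*\)")

def clean_tags(line):
    # Apply the cleanup only inside tags, in one pass over the line.
    return TAG.sub(lambda m: CLEANUP.sub("", m.group(0)), line)
```

The callback approach naturally handles the "only replace if changed" check: unchanged tags are substituted with themselves at no extra cost.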
Thanks!