Regular expression: match string between two slashes if the string itself contains escaped slashes

Question


I am trying to create a regex that matches regular expressions between two slashes. My main problem is that the regular expressions themselves can contain slashes that are escaped with a backslash. I tried to exclude those with a negative lookbehind (only accepting a closing slash if it is not preceded by a backslash), but now I get no match when the regular expression itself ends with a literal, escaped backslash (`\\` right before the closing slash).

Test program:

```python
#!/usr/bin/env python3
import re

teststrings = [
    r"/hello world/",
    r"/string with foreslash here \/ and here\//",
    r"/this one ends with backlash\\/",
]
# Reject the closing slash if it is preceded by a backslash
patt = r"^/(?P<pattern>.*)(?<!\\)/$"

for t in teststrings:
    m = re.match(patt, t)
    if m is not None:
        print(t, "=> MATCH")
    else:
        print(t, "=> NO MATCH")
```

Output:

```
/hello world/ => MATCH
/string with foreslash here \/ and here\// => MATCH
/this one ends with backlash\\/ => NO MATCH
```

How can I change the lookbehind so that it rejects the closing slash only when it is preceded by one backslash, but not when it is preceded by two?

Or is there a better way to extract a regex? (Note that in the actual file, I am trying to parse lines containing more than just a regular expression. I cannot just look for the first and last slashes in each line and get everything in between.)
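The third string fails because `(?<!\\)` inspects only one character: it cannot tell `\/` (an escaped slash) from `\\/` (an escaped backslash followed by the real closing delimiter). Python's `re` only allows fixed-width lookbehind, so the parity of a run of backslashes cannot be tested inside it. One possible direct fix, sketched here as an alternative rather than the approach of the answer, is to let the body end with any number of `\\` pairs that are not themselves preceded by a backslash. The name `PATT` and the fourth test string are illustrative additions.

```python
import re

# Sketch: keep the question's shape, but allow an even-length run of
# backslashes (escaped backslashes) right before the closing slash.
# (?<!\\) ensures that run is not preceded by one more backslash, so a
# closing "\/" (an odd run, i.e. an escaped slash) is still rejected.
PATT = re.compile(r"^/(?P<pattern>.*(?<!\\)(?:\\\\)*)/$")

test_strings = [
    r"/hello world/",
    r"/string with foreslash here \/ and here\//",
    r"/this one ends with backlash\\/",
    r"/closing slash is escaped\/",  # hypothetical extra case: should fail
]

for t in test_strings:
    print(t, "=> MATCH" if PATT.match(t) else "=> NO MATCH")
```

Like the original, `.*` still accepts unescaped slashes inside the body, so this only suits strings that consist of nothing but the delimited regex.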


python regex

Gryphius Dec 12 '11 at 11:50


1 answer

Tim Pietzcker · Accepted Answer · 2011-12-12T11:55:57+0000

Try the following:

```python
import re

pattern = re.compile(r"^/(?:\\.|[^/\\])*/")
```

Explanation:

```
^         # Start of string
/         # Match /
(?:       # Match either...
 \\.      # an escaped character
|         # or
 [^/\\]   # any character except slash/backslash
)*        # any number of times.
/         # Match /
```
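The breakdown above is laid out the way Python's `re.VERBOSE` flag accepts. As a sketch, here it is compiled in that mode and run against the question's test strings. The capturing group around the body is an addition (the answer's pattern has none) so that the regex between the slashes can be pulled out; the name `REGEX_LITERAL` is illustrative.

```python
import re

# The answer's pattern in verbose mode; the capturing group around the
# body is an addition so the text between the slashes can be extracted.
REGEX_LITERAL = re.compile(r"""
    ^/              # opening slash at the start of the string
    (               # capture the body
      (?:
        \\.         # an escaped character (covers \/ and \\)
        |           # or
        [^/\\]      # any character except slash/backslash
      )*            # any number of times
    )
    /               # closing slash
""", re.VERBOSE)

test_strings = [
    r"/hello world/",
    r"/string with foreslash here \/ and here\//",
    r"/this one ends with backlash\\/",
]

for s in test_strings:
    m = REGEX_LITERAL.match(s)
    print(s, "=>", repr(m.group(1)) if m else "NO MATCH")
```

Because an escaped character is consumed as a pair, `\\` is eaten as one unit and the following `/` is correctly treated as the delimiter, which is exactly the case the lookbehind could not handle.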

For your "real world" application (finding the first slash-delimited string on a line, ignoring escaped slashes), I would use

```python
pattern = re.compile(r"^(?:\\.|[^/\\])*/((?:\\.|[^/\\])*)/")
```

This gives you the following:

```python
>>> pattern.match("foo /bar/ baz").group(1)
'bar'
>>> pattern.match(r"foo /bar\/bam/ baz").group(1)
'bar\\/bam'
>>> pattern.match("foo /bar/bam/ baz").group(1)
'bar'
>>> pattern.match(r"foo\/oof /bar\/bam/ baz").group(1)
'bar\\/bam'
```
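The leading `^(?:\\.|[^/\\])*` in the second pattern is what keeps the match from starting at an escaped slash: it walks the line from the beginning, stepping over escaped characters as pairs, so the first slash it stops at is a real delimiter. A small comparison sketch on the last example line follows; the `unanchored` variant is added here only for contrast and is not part of the answer.

```python
import re

# The answer's pattern: consume everything up to the first unescaped
# slash, then capture the body up to the next unescaped slash.
anchored = re.compile(r"^(?:\\.|[^/\\])*/((?:\\.|[^/\\])*)/")

# For contrast only: the same body pattern, searched without the prefix.
unanchored = re.compile(r"/((?:\\.|[^/\\])*)/")

line = r"foo\/oof /bar\/bam/ baz"

print(repr(anchored.match(line).group(1)))     # 'bar\\/bam'
print(repr(unanchored.search(line).group(1)))  # 'oof ' (starts at the escaped slash)
```

`re.search` begins at the first `/` it finds, which here is the escaped one in `foo\/oof`, so it captures the wrong text; anchoring and consuming the prefix avoids that.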

