Regular expression to extract text between two strings (which are variables)

Question


I want to use regex to extract text that occurs between two strings. I know how to do this when the delimiters are the same every time (there are countless questions asking this, such as "Matching regular expressions between two lines?"), but I want to do it using variables that change and that may themselves include regex special characters. (I want any special characters, like *, to be treated as literal text.)

For example, if I had:

 text = "<b*>Test</b>"
 left_identifier = "<b*>"
 right_identifier = "</b>"

I need to build a pattern from these variables so that the result is equivalent to running the following code:

 re.findall('<b\*>(.*)<\/b>',text)

The part I don't know how to create dynamically is the pattern <b\*>(.*)<\/b>.

+6

python python-2.7 regex

kyrenia Apr 15 '15 at 17:10


4 answers

You can do something like this:

 import re
 pattern_string = re.escape(left_identifier) + "(.*?)" + re.escape(right_identifier)
 pattern = re.compile(pattern_string)

The escape function will automatically escape special characters. For instance,

 >>> import re
 >>> print re.escape("<b*>")
 \<b\*\>
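
As a minimal sketch, the same idea can be wrapped in a small helper. The name extract_between is hypothetical and not from the original post, and the code assumes Python 3, where print is a function. On Python 3.7 and later, re.escape escapes only characters that can be special in a regex, so < and > are left alone there; the printed form of the escaped string differs from the Python 2.7 output above, but the match is the same.

```python
import re

def extract_between(text, left, right):
    """Return every substring of text found between left and right.

    Both delimiters are treated as literal text, not as regex syntax.
    (Hypothetical helper, for illustration only.)
    """
    # re.escape neutralises metacharacters such as * in the delimiters.
    pattern = re.compile(re.escape(left) + "(.*?)" + re.escape(right))
    return pattern.findall(text)

matches = extract_between("<b*>Test</b>", "<b*>", "</b>")
print(matches)  # ['Test']
```

Because the delimiters are passed in as arguments, they can change on every call without the pattern having to be rewritten by hand.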

+5

Chirila alexandru Apr 15 '15 at 17:14


A regular expression starts its life as a string, so you can concatenate left_identifier, the capture pattern, and right_identifier, and pass the result to re.compile.

Or:

 re.findall('{}(.*){}'.format(left_identifier, right_identifier), text)

works too.

You need to escape the strings in the variables with re.escape if they contain regex metacharacters, unless you want the metacharacters to be interpreted as such:

 >>> text = "<b*>Test</b>"
 >>> left_identifier = "<b*>"
 >>> right_identifier = "</b>"
 >>> s = '{}(.*?){}'.format(*map(re.escape, (left_identifier, right_identifier)))
 >>> s
 '\\<b\\*\\>(.*?)\\<\\/b\\>'
 >>> re.findall(s, text)
 ['Test']
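
To see why the escaping matters for this particular input, here is a small side-by-side sketch, assuming Python 3. The variable names raw, safe, unescaped_result and escaped_result are only for illustration.

```python
import re

text = "<b*>Test</b>"
left_identifier = "<b*>"
right_identifier = "</b>"

# Unescaped: "b*" means "zero or more b", so the literal "*" in text
# is never matched and the search finds nothing.
raw = "{}(.*?){}".format(left_identifier, right_identifier)
unescaped_result = re.findall(raw, text)

# Escaped: every character of the delimiters is matched literally.
safe = "{}(.*?){}".format(re.escape(left_identifier), re.escape(right_identifier))
escaped_result = re.findall(safe, text)

print(unescaped_result)  # []
print(escaped_result)    # ['Test']
```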

str.partition(var), on the other hand, is an alternative way to do this without regex:

 >>> text.partition(left_identifier)[2].partition(right_identifier)[0]
 'Test'
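
One thing to watch with the one-liner: when a delimiter is missing, partition does not raise; it returns an empty separator instead, so the chained form quietly returns an empty string or the rest of the text. A minimal sketch that makes the missing case explicit, assuming Python 3 (between_partition is a hypothetical name):

```python
def between_partition(text, left, right):
    """Return the text between the first left and the next right, or None.

    (Hypothetical helper, for illustration only.)
    """
    # partition returns (before, separator, after);
    # separator is "" when the delimiter is not found.
    _, found_left, rest = text.partition(left)
    if not found_left:
        return None
    middle, found_right, _ = rest.partition(right)
    if not found_right:
        return None
    return middle

print(between_partition("<b*>Test</b>", "<b*>", "</b>"))  # Test
print(between_partition("no tags here", "<b*>", "</b>"))  # None
```

Unlike re.findall, this only returns the first delimited span.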

+4

dawg Apr 15 '15 at 17:12


I know that you really want a regular expression solution, but I wonder whether regex is the right tool here, considering that we all took the oath not to parse HTML with it. When parsing HTML strings, I always recommend falling back on BeautifulSoup:

 >>> import bs4
 >>> bs4.BeautifulSoup('<b*>Text</b>').text
 u'Text'

0

Abhijit Apr 15 '15 at 19:07


agf · Accepted Answer · 2015-04-15T17:14:08+0000

You need to re.escape the identifiers:

 >>> import re
 >>> regex = re.compile('{}(.*){}'.format(re.escape('<b*>'), re.escape('</b>')))
 >>> regex.findall('<b*>Text</b>')
 ['Text']
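
Note that the pattern above uses a greedy (.*), while some of the other answers use the lazy (.*?). With a single pair of delimiters the result is the same; with several pairs it is not. A minimal sketch of the difference, assuming Python 3 and a made-up input containing two delimited spans:

```python
import re

# Hypothetical input with two delimited spans.
text = "<b*>one</b> and <b*>two</b>"
left, right = re.escape("<b*>"), re.escape("</b>")

# Greedy: (.*) runs to the LAST right delimiter in the text.
greedy = re.findall("{}(.*){}".format(left, right), text)

# Lazy: (.*?) stops at the FIRST right delimiter after each left one.
lazy = re.findall("{}(.*?){}".format(left, right), text)

print(greedy)  # ['one</b> and <b*>two']
print(lazy)    # ['one', 'two']
```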
