Python regular expressions: finding and replacing weirdness

I could use some help with a Python regex problem. You would expect the result of

import re
re.sub("s (.*?) s", "no", "this is a string") 

to be "this no string", right? But it is actually "thinotring". The sub function treats the entire match as the thing to replace, not just the group that I actually want to replace.

All re.sub examples deal with simple word substitution, but what if you want to change something depending on the rest of the string? Like in my example...

Any help would be greatly appreciated.

Edit:

In my case, the lookbehind and lookahead tricks will not work, since they must be fixed-width. Here is my actual expression:

re.sub(r"<a.*?href=['\"]((?!http).*?)['\"].*?>", 'test', string)

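I want to use it to find all links that do not start with http, so that I can put a prefix in front of them (making them absolute rather than relative).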

+3
5 answers

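Your pattern matches everything from the first "s " through the next " s", delimiters included, and re.sub replaces that whole match with "no", which gives "thinotring".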

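re.sub always replaces the entire match, not a group. To keep part of the match, capture it in a group and refer to it in the replacement with a backreference. Here group 1 is "is a", and in the replacement string it is written \1.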

If you only want to replace the text between the delimiters, one option is to use lookaround assertions:

re.sub(r"(?<=s ).*?(?= s)", "no", "this is a string")

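(?<=s ) is a lookbehind assertion: it requires the match to be preceded by "s ", but does not include that text in the match.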

(?= s) is a lookahead assertion: it requires the match to be followed by " s", again without consuming it.

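Unfortunately, lookbehind in Python must be fixed-width. If that is a problem, you can capture the delimiters in groups and put them back with backreferences instead: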

re.sub(r"(s ).*?( s)", r"\1no\2", "this is a string")
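To make the difference concrete, here is a small sketch that runs the three variants on the example string. The variable names are mine, and the variable-width pattern (?<=s +) is only there to illustrate the fixed-width restriction.

```python
import re

text = "this is a string"

# Plain pattern: the whole match ("s is a s") is replaced, delimiters included.
whole_match = re.sub(r"s (.*?) s", "no", text)

# Lookarounds: the delimiters are asserted but not consumed.
lookaround = re.sub(r"(?<=s ).*?(?= s)", "no", text)

# Capture groups: the delimiters are consumed, then restored via \1 and \2.
groups = re.sub(r"(s ).*?( s)", r"\1no\2", text)

# A lookbehind of variable width is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=s +).*?(?= s)")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(whole_match)        # thinotring
print(lookaround)         # this no string
print(groups)             # this no string
print(variable_width_ok)  # False
```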

That said, regular expressions are generally the wrong tool for HTML. Search SO for "regex html" to see why.

For your actual expression:

re.sub(r"(<a.*?href=['\"])((?!http).*?['\"].*?>)", r'\1http://\2', string)
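As a rough sketch of how the two-group version behaves, the snippet below runs it on a made-up piece of HTML. The input, the prefix value and the use of a function as the replacement are my own assumptions. A function replacement simply avoids escaping problems if the prefix ever contains backslashes.

```python
import re

# Hypothetical input and prefix, used only for illustration.
html = '<a href="about.html">About</a> <a href="http://example.org/">Home</a>'
prefix = "http://example.com/"

# Group 1 is everything up to and including the opening quote,
# group 2 is the relative target plus the rest of the tag.
pattern = r"(<a.*?href=['\"])((?!http).*?['\"].*?>)"

# Rebuild each match as group 1 + prefix + group 2.
result = re.sub(pattern, lambda m: m.group(1) + prefix + m.group(2), html)

print(result)
# <a href="http://example.com/about.html">About</a> <a href="http://example.org/">Home</a>
```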


+5

You can use lookbehind (?<=...) and lookahead (?=...) assertions, which are not part of the match itself:

re.sub("(?<=s )(.*?)(?= s)", "no", "this is a string")

EDIT: this returns "this no string" ... :-(

Perhaps something like this:

re.sub(r"(?<=href=['\"])((?!http).*?)(?=['\"].*?>)", 'test', string)

href=" ?

+1

Note that re.sub does not modify the string in place. It returns the new string, so you have to assign the result:

import re

new_string = re.sub(r"<a.*?href=['\"]((?!http).*?)['\"].*?>", 'test', string)
print(new_string)

Here it is running on IDEone.com: http://ideone.com/ufaTw

That said, if you are working with HTML, consider using Beautiful Soup to parse it rather than regex.
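For comparison, here is what a parser-based approach might look like. To keep the sketch free of third-party dependencies it uses the standard library's html.parser instead of Beautiful Soup, so this is a swapped-in technique, not the Beautiful Soup API. LinkCollector, the sample HTML and the base URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


html = '<a href="about.html">About</a> <a href="http://example.org/">Home</a>'
base = "http://example.com/"

collector = LinkCollector()
collector.feed(html)

# urljoin leaves absolute URLs alone and resolves relative ones against base.
absolute = [urljoin(base, href) for href in collector.hrefs]
print(absolute)
# ['http://example.com/about.html', 'http://example.org/']
```

This only lists the resolved links. Writing them back into the document is where a library such as Beautiful Soup becomes more convenient.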

+1

- , , Perl. O: -)

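Use [^>]* instead of .*, so that the match cannot run past the closing > of the tag. Otherwise, when the text contains several hrefs, a single match can span more than one tag.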
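A quick way to see the difference is the sketch below. The sample HTML is made up, and replacing the inner .*? with [^'\"]* is my own addition on top of the [^>]* suggestion.

```python
import re

# Hypothetical snippet: an anchor without href, followed by a tag that has one.
html = '<a name="top">Top</a> <link href="style.css">'

# Original style: .*? is lazy, but it may still cross tag boundaries.
lazy = re.search(r"<a.*?href=['\"]((?!http).*?)['\"].*?>", html)

# Bounded style: [^>]* can never leave the tag it started in.
bounded = re.search(r"<a[^>]*href=['\"]((?!http)[^'\"]*)['\"][^>]*>", html)

print(lazy.group(0))  # the whole snippet, from <a name=...> to the end of <link ...>
print(lazy.group(1))  # style.css
print(bounded)        # None
```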

0

Something like this:

import re

def absolutize(string, prefix):
    return re.sub(r"(?<=href=['\"])((?!http).*?)(?=['\"])", prefix+r'\1', string)

, Python...: (
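For what it is worth, here is a usage sketch of that function. The sample HTML and prefix are made up. The lookbehind here is fixed-width (href= plus a single quote character), so Python accepts it.

```python
import re


def absolutize(string, prefix):
    # Same expression as in the answer above.
    return re.sub(r"(?<=href=['\"])((?!http).*?)(?=['\"])", prefix + r"\1", string)


# Hypothetical input, for illustration only.
html = '<a href="about.html">About</a> <a href="http://example.org/">Home</a>'
result = absolutize(html, "http://example.com/")

print(result)
# <a href="http://example.com/about.html">About</a> <a href="http://example.org/">Home</a>
```

One caveat: the prefix is pasted into the replacement template, so a prefix containing backslashes would be interpreted as an escape sequence.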

-1

Source: https://habr.com/ru/post/1766002/

