Prevent backtracking in a regex that finds lines without comments (not starting with an indented "#")

I would like to find lines that do not begin with a pound sign (#), ignoring any leading indentation.

I am currently using the regex ^\s*([^\s#].*) with the multi-line option.
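For reference, here is a minimal sketch of how this pattern behaves with the stdlib re module. The sample text and the names SAMPLE and PATTERN are made up for illustration; only the pattern and the multi-line flag come from the question.

```python
import re

# Hypothetical sample: two code lines and two comment lines.
SAMPLE = "x = 1\n    # comment\n    y = 2\n# top"

# The pattern from the question, compiled with the multi-line option so that
# ^ matches at the start of every line.
PATTERN = re.compile(r"^\s*([^\s#].*)", re.MULTILINE)

# Group 1 holds the line content with the indentation stripped.
matches = [m.group(1) for m in PATTERN.finditer(SAMPLE)]
print(matches)  # ['x = 1', 'y = 2']
```

The results are correct; the cost is in how the engine rejects the two comment lines.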

My problem is that it only works efficiently on uncommented lines.

On commented-out lines, the regex engine backtracks because of \s*, giving back one character at a time from the comment mark all the way to the beginning of the line, which can sometimes add up to 40 or 50 backtracking steps.

The regex gives correct results on Python code. It is simply not very efficient because of the backtracking the engine performs.

Any idea on how to avoid it?


Bonus: It's pretty funny that the regex engine does not recognize that it is looking for [^\s] right after \s*, retries one character at a time, and causes this amount of backtracking. What makes this hard for the engine?

Bonus 2: Using only the stdlib re module, since I cannot add third-party packages. (Technically I want to use this in Sublime Text, but I would like to know how to do it in general in Python.)

+5
2 answers

Use the atomic nature of lookarounds to avoid backtracking:

 ^(?=(\s*))\1([^#].*)

This usage is simplified by the negative lookahead that @vks nicely suggests.
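A minimal sketch of the lookahead-plus-backreference trick with the stdlib re module follows. The sample text and the names SAMPLE and ATOMIC_LIKE are assumptions for illustration. The idea is that once the lookahead has captured the indentation, the engine does not go back inside it, so \1 consumes that exact run of whitespace in one piece.

```python
import re

# Hypothetical sample: two code lines and two comment lines.
SAMPLE = "x = 1\n    # comment\n    y = 2\n# top"

# (?=(\s*)) captures the leading whitespace without consuming it; \1 then
# consumes exactly that capture. If [^#] fails, there is no shorter
# alternative for \s* to fall back to, so the line is rejected at once.
ATOMIC_LIKE = re.compile(r"^(?=(\s*))\1([^#].*)", re.MULTILINE)

# Group 1 is the indentation, group 2 is the line content.
contents = [m.group(2) for m in ATOMIC_LIKE.finditer(SAMPLE)]
indents = [m.group(1) for m in ATOMIC_LIKE.finditer(SAMPLE)]
print(contents)  # ['x = 1', 'y = 2']
```

Note that the content moves from group 1 to group 2 compared with the pattern in the question.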

Or use possessive quantifiers when using the regex module:

 ^\s*+([^#].*) 

or even atomic groups:

 ^(?>\s*)([^#].*) 

Sublime Text supports all three because it is built on PCRE.
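In Python, the stdlib re module accepts possessive quantifiers and atomic groups starting with Python 3.11; older versions reject both with re.error. The sketch below assumes nothing beyond that: the helper name uncommented and the sample text are hypothetical, and the helper returns None when the running interpreter does not support the syntax.

```python
import re

# Hypothetical sample: two code lines and two comment lines.
SAMPLE = "x = 1\n    # comment\n    y = 2\n# top"


def uncommented(pattern, text=SAMPLE):
    """Return group 1 of every match, or None if this re rejects the syntax."""
    try:
        compiled = re.compile(pattern, re.MULTILINE)
    except re.error:
        # Raised by Python versions before 3.11 for *+ and (?>...).
        return None
    return [m.group(1) for m in compiled.finditer(text)]


possessive = uncommented(r"^\s*+([^#].*)")   # possessive quantifier
atomic = uncommented(r"^(?>\s*)([^#].*)")    # atomic group
print(possessive, atomic)
```

On an interpreter that supports the syntax, both calls should return the same uncommented lines as the lookahead version; on older interpreters, the lookahead-plus-backreference form remains the stdlib-only option.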

As for the bonus part: no, it is not funny. If you look at it with a sharper eye, you will see that this is not [^\s], which is literally \S, but something slightly different: [^\s#]. For the engine, this means that at each step there are two different paths to look for, so it has to backtrack to reach one of them.

+4

You can just say

 ^(?!\s*#).* 

It takes only 6 steps, compared to the 33 steps taken by yours.
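A minimal sketch of this pattern with the stdlib re module follows; the sample text and the names SAMPLE and NOT_COMMENT are assumptions for illustration. Two differences from the pattern in the question are worth knowing: the match is the whole line including its indentation, and there is no capture group.

```python
import re

# Hypothetical sample: two code lines and two comment lines.
SAMPLE = "x = 1\n    # comment\n    y = 2\n# top"

# (?!\s*#) rejects a line as soon as optional whitespace followed by '#'
# is found at its start; otherwise .* takes the rest of the line.
NOT_COMMENT = re.compile(r"^(?!\s*#).*", re.MULTILINE)

lines = NOT_COMMENT.findall(SAMPLE)
print(lines)  # ['x = 1', '    y = 2']

# Strip the indentation afterwards if only the content is needed.
stripped = [line.lstrip() for line in lines]
print(stripped)  # ['x = 1', 'y = 2']
```

Because \s also matches newlines, an empty line directly above a comment is skipped as well. If that matters, replacing \s with [ \t] inside the lookahead is one possible variation.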

+4

Source: https://habr.com/ru/post/1275163/