Regex word boundaries and match distance

I would like to be able to use a regular expression to find matches for a specific keyword phrase within some text.

The key phrase may or may not contain 1 or more spaces (usually it will be only one word, but in some cases there may be several words).

Currently, I am using the following expression for the case where the key phrase is a single word (without spaces):

var regexPattern = string.Format( "\\b({0})\\b", keyphrase );

When a keyword phrase is multiple words (contains one or more spaces), I then update the expression to replace any of these spaces with a wildcard:

regexPattern = regexPattern.Replace( " ", ".*" );

There are several scenarios in which this does not behave as I need.

1) If the key phrase in my long text (the text I am searching for matches) is surrounded by an underscore or a digit, it no longer matches. It works fine with hyphens, commas, full stops, etc.: in those scenarios the key phrase is still detected. But I also need it to match when the key phrase is surrounded by underscores or digits.

2) In a scenario where my keyword phrase consists of several words (contains 1 or more spaces), I would like to allow up to a certain maximum distance / length between each of the words that form my keyword phrase.

E.g., if my key phrase is:

for sale

... and the text I'm matching against is:

I have a bike for    sale.

... (with a maximum distance of 5 characters between the words of the key phrase), I would like the regular expression to match:

bike for    sale

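Likewise, if the text is:

I have a bike for _.,1sale.

... the key phrase should still match, since the gap between "for" and "sale" is 5 characters. And in a longer text such as:

I have a bike for _.,1sale. I've also got a laptop for sale!

... I would expect 2 separate matches rather than one long match.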
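To make both problems concrete, here is a minimal sketch that builds the pattern exactly as described above and probes it. The class and method names (`OriginalPatternDemo`, `Build`, `IsMatch`, `CountMatches`) are made up for illustration; only `string.Format`, `string.Replace` and `System.Text.RegularExpressions.Regex` come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // Builds the pattern the way the question describes it:
    // \b(...)\b around the phrase, every space replaced by ".*".
    public static string Build(string keyphrase)
    {
        var pattern = string.Format("\\b({0})\\b", keyphrase);
        return pattern.Replace(" ", ".*");
    }

    // True if the original pattern finds the key phrase in the text.
    public static bool IsMatch(string text, string keyphrase)
    {
        return Regex.IsMatch(text, Build(keyphrase));
    }

    // Number of non-overlapping matches of the original pattern.
    public static int CountMatches(string text, string keyphrase)
    {
        return Regex.Matches(text, Build(keyphrase)).Count;
    }
}
```

`IsMatch("bike-sale-now", "sale")` is true, but `IsMatch("bike_sale_now", "sale")` is false, because `_` and digits are word characters and `\b` only fires between a word character and a non-word character. `CountMatches("I have a bike for _.,1sale. I've also got a laptop for sale!", "for sale")` gives 1, because the greedy, unbounded `.*` stretches from the first "for" to the last "sale".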


To cover both points, you can try one of these 2 patterns:

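// Treat anything that is not a Unicode letter as a boundary, so '_' and digits count as separators.
var regexPattern = string.Format( "(?<!\\p{{L}}){0}(?!\\p{{L}})", keyphrase );
// or
// var regexPattern = string.Format( "(?<=\\P{{L}}|^){0}(?=\\P{{L}}|$)", keyphrase );
// Allow at most 5 arbitrary characters wherever the key phrase contains a space.
regexPattern = regexPattern.Replace( " ", ".{0,5}" );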

For the key phrase "key word", the resulting pattern is either:

(?<!\p{L})key.{0,5}word(?!\p{L})

(?<=\P{L}|^)key.{0,5}word(?=\P{L}|$)
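As a rough sketch of how the first pattern could be assembled for any key phrase, the helper below escapes each word separately and joins the words with a bounded gap. `KeyphrasePattern`, `Build`, `FindAll` and the `maxGap` parameter are hypothetical names, not part of the answer above; the escaping step is an addition, assuming the key phrase is meant literally. Note that `Regex.Escape` also escapes spaces, which is why the phrase is split before escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyphrasePattern
{
    // Hypothetical helper: letter-based boundaries around the phrase, with at
    // most maxGap arbitrary characters between consecutive words.
    public static string Build(string keyphrase, int maxGap)
    {
        // Escape each word on its own so regex metacharacters are literal.
        var words = keyphrase
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(Regex.Escape);

        var gap = ".{0," + maxGap + "}";
        return "(?<!\\p{L})" + string.Join(gap, words) + "(?!\\p{L})";
    }

    // Returns the text of every match, in order of appearance.
    public static List<string> FindAll(string text, string keyphrase, int maxGap)
    {
        return Regex.Matches(text, Build(keyphrase, maxGap))
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

With the text from the question, `KeyphrasePattern.FindAll("I have a bike for _.,1sale. I've also got a laptop for sale!", "for sale", 5)` returns two matches, "for _.,1sale" and "for sale", and `"bike_sale_now"` now matches the key phrase "sale".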


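If each word of the key phrase must also be followed and preceded by a non-letter, replace the space like this instead: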

regexPattern = regexPattern.Replace( " ", "(?=\\P{L}).{0,5}(?<=\\P{L})" );

The resulting patterns then become:

(?<!\p{L})key(?=\P{L}).{0,5}(?<=\P{L})word(?!\p{L})

(?<=\P{L}|^)key(?=\P{L}).{0,5}(?<=\P{L})word(?=\P{L}|$)
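The difference between the two variants shows up when letters sit directly next to the words inside the gap. The sketch below uses the first form of each pattern, adapted to the question's key phrase "for sale"; `StrictGapDemo`, `Loose`, `Strict` and the two methods are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrictGapDemo
{
    // The answer's patterns with "key word" swapped for "for sale".
    public const string Loose  = @"(?<!\p{L})for.{0,5}sale(?!\p{L})";
    public const string Strict = @"(?<!\p{L})for(?=\P{L}).{0,5}(?<=\P{L})sale(?!\p{L})";

    // Any 0 to 5 characters may sit between the words, letters included.
    public static bool IsLooseMatch(string text)
    {
        return Regex.IsMatch(text, Loose);
    }

    // The gap must start and end with a non-letter character.
    public static bool IsStrictMatch(string text)
    {
        return Regex.IsMatch(text, Strict);
    }
}
```

For "a forum sale", `IsLooseMatch` is true (the gap "um " is only 3 characters) while `IsStrictMatch` is false, because "for" is followed by a letter. Both return true for "I have a bike for _.,1sale.". One consequence of the stricter form is that the gap can no longer be empty, so "forsale" is rejected.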



Source: https://habr.com/ru/post/1598182/

