I would like to use a regular expression to find occurrences of a specific key phrase in a block of text.
The key phrase may contain one or more spaces (usually it is a single word, but in some cases it consists of several words).
Currently, I am using the following expression, where the key phrase is a single word (without spaces):
var regexPattern = string.Format( "\\b({0})\\b", keyphrase );
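To make the behaviour of this pattern concrete, here is a minimal sketch that wraps it in a helper, assuming .NET's System.Text.RegularExpressions (the class and method names are hypothetical). In .NET, \b sits between a word character (\w, which includes letters, digits and the underscore) and a non-word character, so a hyphen produces a boundary while an underscore or a digit does not.

```csharp
using System.Text.RegularExpressions;

public static class CurrentSingleWord
{
    // Mirrors the pattern from the question: \b(keyphrase)\b.
    // The key phrase is inserted unescaped, exactly as in the original.
    public static bool IsFound(string text, string keyphrase)
    {
        var regexPattern = string.Format("\\b({0})\\b", keyphrase);
        return Regex.IsMatch(text, regexPattern);
    }
}
```

With this helper, IsFound("a bike-for-sale", "sale") is true, while IsFound("a bike_for_sale", "sale") and IsFound("sale2 left", "sale") are false, because "_" and "2" are word characters.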
When the key phrase consists of multiple words (contains one or more spaces), I then replace each of these spaces in the pattern with a wildcard:
regexPattern = regexPattern.Replace( " ", ".*" );
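A small sketch of the two lines above combined into one hypothetical helper (again assuming System.Text.RegularExpressions) shows why the wildcard is a problem for multi-word phrases: .* matches any number of characters and is greedy, so the match can stretch from the first "for" to the last "sale" in the text.

```csharp
using System.Text.RegularExpressions;

public static class CurrentMultiWord
{
    // Mirrors the question's approach: build \b(keyphrase)\b, then turn
    // every space into ".*" (any number of any characters, greedy).
    public static MatchCollection FindAll(string text, string keyphrase)
    {
        var regexPattern = string.Format("\\b({0})\\b", keyphrase);
        regexPattern = regexPattern.Replace(" ", ".*");
        return Regex.Matches(text, regexPattern);
    }
}
```

For the text "I have a bike for sale. I've also got a laptop for sale!" and the key phrase "for sale", this returns a single match, "for sale. I've also got a laptop for sale", instead of two separate ones.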
There are several scenarios in which this does not behave as I need.
1) If the key phrase in the text I am searching is immediately preceded or followed by an underscore or a digit, it no longer matches (\b treats underscores and digits as word characters, so there is no word boundary between them and the key phrase). It works as expected with hyphens, commas, full stops, etc.: in those cases the key phrase is still detected. However, I also need it to match when the key phrase is surrounded by underscores or digits.
2) When my key phrase consists of several words (contains one or more spaces), I would like to allow at most a certain maximum distance (number of characters) between consecutive words of the phrase. At the moment .* allows any distance, and because it is greedy it can stretch from one occurrence of the phrase to a later one.
For example, if my key phrase is:
for sale
... and the text I'm matching against is:
I have a bike for sale.
... (with a maximum distance of 5 characters between the words of the key phrase), I would like the regular expression to match:
bike for sale
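It should also match when the words are separated by underscores, digits or punctuation, as long as the separator is at most 5 characters long, e.g. in: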
I have a bike for _.,1sale.
It should also find every occurrence when the key phrase appears more than once, e.g. in:
I have a bike for _.,1sale. I've also got a laptop for sale!
Here it should find 2 matches, one for each occurrence of the key phrase.
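One possible approach, sketched under stated assumptions rather than offered as a definitive answer: replace \b with lookarounds that only reject adjacent letters, (?<!\p{L}) and (?!\p{L}), so underscores, digits and punctuation all count as boundaries, and replace .* with a bounded lazy quantifier .{1,maxGap}? between words. The names KeyphraseMatcher, BuildPattern, FindAll and maxGap are hypothetical; the assumption that "distance" counts any characters (letters included) is also mine, and each word is passed through Regex.Escape, which the original code does not do.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyphraseMatcher
{
    // Hypothetical helper: builds a pattern for a (possibly multi-word) key
    // phrase where
    //  - a word only has to be separated from surrounding *letters*, so
    //    underscores, digits and punctuation all count as boundaries;
    //  - between 1 and maxGap characters (of any kind) may separate words.
    public static string BuildPattern(string keyphrase, int maxGap)
    {
        if (maxGap < 1) throw new ArgumentOutOfRangeException(nameof(maxGap));

        // Assumption: boundaries are defined by letters only (\p{L}).
        const string NotAfterLetter = @"(?<!\p{L})";
        const string NotBeforeLetter = @"(?!\p{L})";

        // Split on whitespace; escape each word so ".", "+" etc. are literal.
        var words = keyphrase
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => NotAfterLetter + Regex.Escape(w) + NotBeforeLetter);

        // Bounded, lazy gap instead of the unbounded, greedy ".*".
        var gap = ".{1," + maxGap + "}?";
        return string.Join(gap, words);
    }

    public static MatchCollection FindAll(string text, string keyphrase, int maxGap)
    {
        return Regex.Matches(text, BuildPattern(keyphrase, maxGap));
    }
}
```

With maxGap = 5, FindAll finds "for _.,1sale" in the second example and two separate matches in the last one, while "format sale" and "for my bike, sale" produce no match. If only non-letters should be allowed between the words, the "." in the gap could be replaced with \P{L}.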