You don't say whether you want true substring matches or substring matches that respect word boundaries. There's a difference. Here's how to do it while honoring word boundaries:
    str = "this is the string "
    array = ["this is", "second element", "third element"]

    pattern = /\b(?:#{ Regexp.union(array).source })\b/ # => /\b(?:this\ is|second\ element|third\ element)\b/
    str[pattern] # => "this is"
    str.gsub(pattern, '').squeeze(' ').strip # => "the string"
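If you need a yes/no answer, or want to reuse the pattern in several places, it can be wrapped in a couple of small methods. This is a minimal sketch; the method names phrase_pattern, contains_phrase? and remove_phrases are hypothetical helpers, not part of Ruby:

```ruby
# Hypothetical helper: builds the word-boundary pattern for a list of phrases.
def phrase_pattern(phrases)
  # Regexp.union escapes each phrase; .source drops the (?-mix:...) wrapper.
  /\b(?:#{ Regexp.union(phrases).source })\b/
end

# Hypothetical helper: true if str contains any phrase as whole words.
def contains_phrase?(str, phrases)
  str.match?(phrase_pattern(phrases))
end

# Hypothetical helper: removes whole-word matches, then tidies the spaces.
def remove_phrases(str, phrases)
  str.gsub(phrase_pattern(phrases), '').squeeze(' ').strip
end

array = ["this is", "second element", "third element"]
puts contains_phrase?("this is the string ", array)    # prints true
puts contains_phrase?("this isn't the string ", array) # prints false
puts remove_phrases("this is the string ", array)      # prints the string
```

String#match? only reports whether there is a match, without building a MatchData, which is all a containment check needs.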
Here's what Regexp.union and its source return:
    Regexp.union(array)        # => /this\ is|second\ element|third\ element/
    Regexp.union(array).source # => "this\\ is|second\\ element|third\\ element"
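Both results come from escaping each element and joining the alternatives with |. You can check that by hand with Regexp.escape; the variable name manual below is just for illustration:

```ruby
array = ["this is", "second element", "third element"]

# Escape each element by hand, then join the alternatives with "|".
manual = array.map { |s| Regexp.escape(s) }.join('|')

puts manual                               # prints this\ is|second\ element|third\ element
puts manual == Regexp.union(array).source # prints true

# Regex metacharacters in the elements are escaped the same way.
puts Regexp.union(["a.b", "c+d"]).source  # prints a\.b|c\+d
```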
source returns the escaped alternatives as a plain string, which is easier to embed in a larger pattern because it doesn't drag the union's own flags along with it. Consider these differences in what ends up inside the pattern:
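    /#{ Regexp.union(%w[a . b]) }/        # => /(?-mix:a|\.|b)/
    /#{ Regexp.union(%w[a . b]).source }/ # => /a|\.|b/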
The first embeds a separate sub-pattern, with its own flags for case-insensitivity, multiline mode and extended (whitespace-ignoring) mode, inside the outer pattern. That can cause bugs that are really hard to track down and fix, so only do it when you actually intend to have a sub-pattern.
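Here is a small demonstration of that kind of bug. Adding the i (case-insensitive) flag to the outer pattern has no effect on an embedded Regexp object, because its (?-mix:...) wrapper switches case-insensitivity back off inside the group:

```ruby
array = ["this is", "second element", "third element"]
str = "THIS IS the string"

# Interpolating the Regexp object keeps its own (?-mix:...) flags.
embedded = /#{ Regexp.union(array) }/i
# Interpolating only the source lets the outer /i flag apply.
sourced = /#{ Regexp.union(array).source }/i

p embedded             # => /(?-mix:this\ is|second\ element|third\ element)/i
p embedded.match?(str) # => false
p sourced.match?(str)  # => true
```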
Also note what happens if you try to use:
    /#{ %w[a . b].join('|') }/ # => /a|.|b/
The resulting pattern has an unescaped wildcard . embedded in it, which blows the pattern apart: that alternative matches any single character (except a newline), so the pattern matches nearly any string. Don't go there.
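To see the wildcard problem concretely, compare a plain join with Regexp.union on a string that contains none of the elements:

```ruby
parts = %w[a . b]

joined = /#{ parts.join('|') }/ # /a|.|b/ : the "." is a wildcard
union  = Regexp.union(parts)    # /a|\.|b/ : the "." is escaped

p joined.match?("xyz") # => true, "." matches the "x"
p union.match?("xyz")  # => false
p union.match?("x.z")  # => true, only a literal "." matches
```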
If we don't tell the regex engine to honor word boundaries, unexpected, unwanted, or downright terrible things can happen:
    str = "this isn't the string "
    array = ["this is", "second element", "third element"]

    pattern = /(?:#{ Regexp.union(array).source })/ # => /(?:this\ is|second\ element|third\ element)/
    str[pattern] # => "this is"
    str.gsub(pattern, '').squeeze(' ').strip # => "n't the string"
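Running both patterns over a few sample strings shows where \b matters. The sample strings and the bounded/unbounded names are just for illustration:

```ruby
array = ["this is", "second element", "third element"]

bounded   = /\b(?:#{ Regexp.union(array).source })\b/
unbounded = /(?:#{ Regexp.union(array).source })/

samples = ["this is the string ", "this isn't the string ", "this island"]

samples.each do |s|
  # Remove matches, then collapse and trim the leftover whitespace.
  with_b    = s.gsub(bounded, '').squeeze(' ').strip
  without_b = s.gsub(unbounded, '').squeeze(' ').strip
  puts "#{s.inspect}: bounded #{with_b.inspect}, unbounded #{without_b.inspect}"
end
```

With the word boundaries, only the first string is changed. Without them, "this isn't" is cut down to "n't" and "this island" to "land".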
When the substrings you're looking for are complete words, it's important to think in terms of words. The regex engine doesn't know the difference, so you have to tell it, for instance with \b anchors around the alternatives. This is a case that's missed far too often by people who haven't had to do much text processing.