Need a C# regex to get pairs of words in a sentence

Is there a regex that takes the following sentence:

"I want this split up into pairs"

and generates the following list:

"I want", "want this", "this split", "split up", "up into", "into pairs"

+6
4 answers

Since words need to be reused, you will need lookahead assertions:

    Regex regexObj = new Regex(
        @"(     # Match and capture in backreference no. 1:
         \w+    # one or more alphanumeric characters
         \s+    # one or more whitespace characters.
        )       # End of capturing group 1.
        (?=     # Assert that there follows...
         (\w+)  # another word; capture that into backref 2.
        )       # End of lookahead.",
        RegexOptions.IgnorePatternWhitespace);
    Match matchResult = regexObj.Match(subjectString);
    while (matchResult.Success) {
        resultList.Add(matchResult.Groups[1].Value + matchResult.Groups[2].Value);
        matchResult = matchResult.NextMatch();
    }

For groups of three:

    Regex regexObj = new Regex(
        @"(     # Match and capture in backreference no. 1:
         \w+    # one or more alphanumeric characters
         \s+    # one or more whitespace characters.
        )       # End of capturing group 1.
        (?=     # Assert that there follows...
         (      # and capture...
          \w+   # another word,
          \s+   # whitespace,
          \w+   # word.
         )      # End of capturing group 2.
        )       # End of lookahead.",
        RegexOptions.IgnorePatternWhitespace);

and so on.
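The pattern extends mechanically to longer groups: the lookahead only has to see one more word for each extra word in the group. A minimal sketch, assuming a hypothetical helper `WordGroups.Extract` (the class and method names are illustrative, not part of the answer) that builds the lookahead for a given group size:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordGroups
{
    // Hypothetical helper (not from the answer): returns every run of
    // groupSize consecutive words in text; neighbouring runs overlap.
    public static List<string> Extract(string text, int groupSize)
    {
        if (groupSize < 2)
            throw new ArgumentOutOfRangeException(nameof(groupSize));

        // The lookahead must see groupSize - 1 further words: one \w+ plus
        // (groupSize - 2) repetitions of "whitespace, then a word".
        string tail = @"\w+" + string.Concat(Enumerable.Repeat(@"\s+\w+", groupSize - 2));
        var regex = new Regex(@"(\w+\s+)(?=(" + tail + "))");

        var result = new List<string>();
        foreach (Match m in regex.Matches(text))
        {
            // Group 1 is the consumed word plus whitespace; group 2 is what
            // the lookahead saw without consuming it.
            result.Add(m.Groups[1].Value + m.Groups[2].Value);
        }
        return result;
    }
}
```

With the sentence from the question, `WordGroups.Extract("I want this split up into pairs", 2)` should give the six pairs listed there, and a group size of 3 gives "I want this", "want this split", "this split up", "split up into", "up into pairs". Because group 1 keeps the original whitespace, runs of several spaces in the input are carried into the results unchanged.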

+5

You could do

    var myWords = myString.Split(' ');
    var myPairs = myWords.Take(myWords.Length - 1)
                         .Select((w, i) => w + " " + myWords[i + 1]);
+4

You can simply use string.Split() and combine the results:

    var words = myString.Split(new char[] { ' ' });
    var pairs = new List<string>();
    for (int i = 0; i < words.Length - 1; i++)
    {
        // Join the two words with a space so the pair reads "I want", not "Iwant".
        pairs.Add(words[i] + " " + words[i + 1]);
    }
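Splitting on a single space produces empty entries when the input contains repeated spaces, and those would show up as lopsided pairs. A minimal sketch of a more tolerant variant, assuming a hypothetical helper `SplitPairs.FromSplit` (the names are illustrative, not from either answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitPairs
{
    // Hypothetical helper: returns each word joined with the word after it.
    public static List<string> FromSplit(string text)
    {
        // A null separator array makes Split break on any whitespace;
        // RemoveEmptyEntries drops the empty strings that repeated
        // whitespace would otherwise create.
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Zip pairs each word with its successor and stops when the
        // shorter sequence (words.Skip(1)) runs out.
        return words.Zip(words.Skip(1), (first, second) => first + " " + second).ToList();
    }
}
```

For example, `SplitPairs.FromSplit("I  want   this")` gives "I want" and "want this". Unlike the regex approach, this normalizes the spacing inside each pair to a single space.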
+3

To do this with regexes alone and without further processing, we can reuse Tim Pietzcker's answer, but run two regexes in succession.

We can run the original pattern from Tim Pietzcker's answer, and then the same pattern with a lookbehind added, which forces the regex to start capturing at the second word.

If you combine the results of the two regexes, you will have all the pairs in the text.

    Regex regexObj1 = new Regex(
        @"(     # Match and capture in backreference no. 1:
         \w+    # one or more alphanumeric characters
         \s+    # one or more whitespace characters.
        )       # End of capturing group 1.
        (?=     # Assert that there follows...
         (\w+)  # another word; capture that into backref 2.
        )       # End of lookahead.",
        RegexOptions.IgnorePatternWhitespace);
    Match matchResult = regexObj1.Match(subjectString);
    while (matchResult.Success) {
        resultList.Add(matchResult.Groups[1].Value + matchResult.Groups[2].Value);
        matchResult = matchResult.NextMatch();
    }

    Regex regexObj2 = new Regex(
        @"(?<=     # Assert that this precedes the match, without capturing it:
         \w+\s+    # a word followed by whitespace.
        )          # End of lookbehind.
        (          # Match and capture in backreference no. 1:
         \w+       # one or more alphanumeric characters
         \s+       # one or more whitespace characters.
        )          # End of capturing group 1.
        (?=        # Assert that there follows...
         (\w+)     # another word; capture that into backref 2.
        )          # End of lookahead.",
        RegexOptions.IgnorePatternWhitespace);
    Match matchResult1 = regexObj1.Match(subjectString);
    Match matchResult2 = regexObj2.Match(subjectString);

etc.

For groups of three:

You will need to add a third regex to the program:

    Regex regexObj3 = new Regex(
        @"(?<=           # Assert that this precedes the match, without capturing it:
         \w+\s+\w+\s+    # two words, each followed by whitespace.
        )                # End of lookbehind.
        (                # Match and capture in backreference no. 1:
         \w+             # one or more alphanumeric characters
         \s+             # one or more whitespace characters.
        )                # End of capturing group 1.
        (?=              # Assert that there follows...
         (\w+)           # another word; capture that into backref 2.
        )                # End of lookahead.",
        RegexOptions.IgnorePatternWhitespace);
    Match matchResult1 = regexObj1.Match(subjectString);
    Match matchResult2 = regexObj2.Match(subjectString);
    Match matchResult3 = regexObj3.Match(subjectString);
0

Source: https://habr.com/ru/post/892760/

