Extract each complete word containing a specific substring

I am trying to write a function that extracts every word from a sentence that contains a specific substring. For example, searching for "Po" in "Porky Pork Chop" should return "Porky Pork".

I tested my regex in regexpal, but the Java code doesn't seem to work. What am I doing wrong?

    private static String foo() {
        String searchTerm = "Pizza";
        String text = "Cheese Pizza";
        String sPattern = "(?i)\b(" + searchTerm + "(.+?)?)\b";
        Pattern pattern = Pattern.compile(sPattern);
        Matcher matcher = pattern.matcher(text);
        if (matcher.find()) {
            String result = "-";
            for (int i = 0; i < matcher.groupCount(); i++) {
                result += matcher.group(i) + " ";
            }
            return result.trim();
        } else {
            System.out.println("No Luck");
        }
    }
6 answers
  • In a Java string literal, \b is the backspace character. To pass the word boundary \b to the regular expression engine, you need to write it as \\b.

  • Judging by your example, you want to return all words that contain your substring. To do that, do not use for(int i=0;i < matcher.groupCount ();i++) but while(matcher.find()), since a loop up to groupCount iterates over the groups within a single match, not over all matches.

  • If your string may contain some special characters, you should probably use Pattern.quote(searchTerm)

  • In your code you are trying to find "Pizza" in "Cheese Pizza", so I assume that you also want to find words that are identical to the search substring. Although your regular expression will work fine for that, you can change its last part, (.+?)?), to \\w*, and also add \\w* at the beginning if the substring should also be matched in the middle of a word (not only at the start).
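The first point can be checked directly. The following is a minimal sketch, not part of the original answer; the class name BoundaryEscapeDemo and the helper finds are made up for illustration.

```java
import java.util.regex.Pattern;

public class BoundaryEscapeDemo {
    // Hypothetical helper: true if the regex matches anywhere in the text.
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // "\b" in a Java string literal is one character: backspace (U+0008).
        System.out.println("\b".length());          // prints 1
        System.out.println((int) "\b".charAt(0));   // prints 8

        // With single backslashes the regex looks for literal backspaces around "Pizza".
        System.out.println(finds("\bPizza\b", "Cheese Pizza"));    // prints false

        // With doubled backslashes the regex engine receives \b, a word boundary.
        System.out.println(finds("\\bPizza\\b", "Cheese Pizza"));  // prints true
    }
}
```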

So your code might look like this:

    private static String foo() {
        String searchTerm = "Pizza";
        String text = "Cheese Pizza, Other Pizzas";
        String sPattern = "(?i)\\b\\w*" + Pattern.quote(searchTerm) + "\\w*\\b";
        StringBuilder result = new StringBuilder("-").append(searchTerm).append(": ");
        Pattern pattern = Pattern.compile(sPattern);
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            result.append(matcher.group()).append(' ');
        }
        return result.toString().trim();
    }
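To illustrate why Pattern.quote matters, here is a small sketch that runs the same pattern with and without quoting. The class QuoteDemo, the method wordsContaining, and the sample text "v1.5 v125" are assumptions made for this example; they do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: collects every word containing term.
    // When quote is false, term is inserted into the regex as-is.
    static List<String> wordsContaining(String text, String term, boolean quote) {
        String literal = quote ? Pattern.quote(term) : term;
        Pattern p = Pattern.compile("(?i)\\b\\w*" + literal + "\\w*\\b");
        Matcher m = p.matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "v1.5 v125";
        // Unquoted, the dot in "1.5" matches any character, so "v125" also matches.
        System.out.println(wordsContaining(text, "1.5", false)); // prints [v1.5, v125]
        // Quoted, the dot is treated literally.
        System.out.println(wordsContaining(text, "1.5", true));  // prints [v1.5]
    }
}
```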

Although the regex approach is certainly a valid method, I find it easier to reason about when you split the sentence into words on whitespace. This can be done using the String split method.

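    public List<String> doIt(final String inputString, final String term) {
        final List<String> output = new ArrayList<String>();
        // Split on runs of whitespace to get the individual words.
        final String[] parts = inputString.split("\\s+");
        for (final String part : parts) {
            // indexOf returns -1 when the term is absent and 0 when the word starts with it.
            if (part.indexOf(term) >= 0) {
                output.add(part);
            }
        }
        return output;
    }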

Of course, it is worth noting that this effectively makes two passes over your input string. The first pass finds the whitespace separators, and the second pass looks through each separated word for your substring.

If one pass is required, the regex path is better.
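As a rough sketch of how the split approach behaves in practice, the variant below assumes the comparison should ignore case, as the (?i) flag in the question suggests. The class name SplitDemo and the sample sentence are made up for illustration. Note that splitting on whitespace leaves punctuation attached to the words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SplitDemo {
    // Hypothetical helper: case-insensitive variant of the split approach.
    static List<String> wordsContaining(String input, String term) {
        List<String> out = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        // First pass: split on whitespace. Second pass: check each word.
        for (String part : input.trim().split("\\s+")) {
            if (part.toLowerCase(Locale.ROOT).contains(needle)) {
                out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The comma stays attached to "pizza," because only whitespace separates words here.
        System.out.println(wordsContaining("Cheese pizza, Other Pizzas", "Pizza"));
        // prints [pizza,, Pizzas]
    }
}
```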


I find nicholas.hauschild's answer to be the best.

However, if you really want to use a regex, you can do it like this:

    String searchTerm = "Pizza";
    String text = "Cheese Pizza";
    Pattern pattern = Pattern.compile("\\b" + Pattern.quote(searchTerm) + "\\b", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
        System.out.println(matcher.group());
    }

Output:

 Pizza 

The pattern should be

 String sPattern = "(?i)\\b("+searchTerm+"(?:.+?)?)\\b"; 

You want to capture the whole string (pizza). The ?: makes the inner group non-capturing, which ensures that you do not capture part of the string twice.
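The effect of ?: can be seen by comparing group counts. This is a small sketch, with the class name GroupCountDemo, the helper firstMatch, and the sample text "Cheese Pizzas" chosen here for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Hypothetical helper: returns a matcher positioned on the first match.
    // Assumes the text contains at least one match.
    static Matcher firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        m.find();
        return m;
    }

    public static void main(String[] args) {
        String text = "Cheese Pizzas";

        // Two capturing groups: the whole word and the trailing part.
        Matcher capturing = firstMatch("(?i)\\b(Pizza(.+?)?)\\b", text);
        System.out.println(capturing.groupCount()); // prints 2
        System.out.println(capturing.group(1));     // prints Pizzas
        System.out.println(capturing.group(2));     // prints s

        // With ?: the inner group no longer captures, so only one group remains.
        Matcher nonCapturing = firstMatch("(?i)\\b(Pizza(?:.+?)?)\\b", text);
        System.out.println(nonCapturing.groupCount()); // prints 1
        System.out.println(nonCapturing.group(1));     // prints Pizzas
    }
}
```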


Try this pattern:

    String searchTerm = "Po";
    String text = "Porky Pork Chop oPod zzz llPo";
    // Three alternatives: letters before the substring, letters on both sides, letters after it.
    Pattern p = Pattern.compile("\\p{Alpha}+" + searchTerm
            + "|\\p{Alpha}+" + searchTerm + "\\p{Alpha}+|"
            + searchTerm + "\\p{Alpha}+");
    Matcher m = p.matcher(text);
    while (m.find()) {
        System.out.println(">> " + m.group());
    }

OK, here is a pattern in raw form (not as a Java string literal; you have to double the backslashes yourself):

 (?i)\b[a-z]*po[a-z]*\b 

That's all.
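Written as a Java string literal, with the backslashes doubled, the raw pattern above could be used as in the following sketch. The class name RawPatternDemo and the method matches are made up for illustration; the sample text is borrowed from another answer in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawPatternDemo {
    // The raw pattern (?i)\b[a-z]*po[a-z]*\b with each backslash doubled for Java.
    static final Pattern WORDS_WITH_PO = Pattern.compile("(?i)\\b[a-z]*po[a-z]*\\b");

    // Hypothetical helper: collects all matches of the pattern in text.
    static List<String> matches(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORDS_WITH_PO.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matches("Porky Pork Chop oPod zzz llPo"));
        // prints [Porky, Pork, oPod, llPo]
    }
}
```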


Source: https://habr.com/ru/post/950433/

