English version
For English text, a fairly simple regular expression is enough. I may have missed some separators, but:
public static int getWordCount(String str) {
    // Split on runs of whitespace, commas, semicolons, or hyphens and count the pieces.
    return str.split("[\\s,;-]+").length;
}
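Regex explanation:
Split wherever one or more characters from the group [] occur:
[     start of the group
\\s   any whitespace character
,     a comma
;     a semicolon
-     a hyphen (literal, because it is the last character in the group)
]     end of the group
+     one or more consecutive characters from the group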
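One thing to watch with this approach: String.split keeps an empty first element when the input starts with a separator, and returns a one-element array for an empty string, so the plain .length can be off by one. A minimal sketch of a more defensive variant follows. The class name SafeWordCount and the method name getWordCountSafe are made up for illustration, and the separator group is the same one used above.

```java
class SafeWordCount {
    // Hypothetical helper: same separator group as getWordCount,
    // but empty tokens produced by split() are not counted.
    static int getWordCountSafe(String str) {
        if (str == null) {
            return 0;
        }
        int count = 0;
        for (String token : str.split("[\\s,;-]+")) {
            // split() yields "" for a leading separator or an empty input.
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```

With this version, getWordCountSafe("  leading spaces") returns 2 and getWordCountSafe("") returns 0, where the one-line version returns 3 and 1 respectively.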
Chinese version
For the Chinese version, you first need to decide which characters count as separators. If you look up the Unicode code points of the Chinese delimiters and add them to the character group in the regex above, you will get the desired result.
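As a rough sketch of that idea, the group can be extended with a few common CJK punctuation marks. The delimiter set below is an assumption, not a complete list: ideographic space (U+3000), ideographic comma (U+3001), ideographic full stop (U+3002), fullwidth comma (U+FF0C), and fullwidth semicolon (U+FF1B). The class and method names are hypothetical. Since Chinese text does not put spaces between words, this counts delimiter-separated segments, which may or may not be what you mean by "words".

```java
class ChineseSegmentCount {
    // Assumed delimiter set: the ASCII separators from the English version
    // plus U+3000, U+3001, U+3002, U+FF0C and U+FF1B.
    // The hyphen stays last in the group so it is read as a literal.
    private static final String DELIMITERS =
            "[\\s,;\\u3000\\u3001\\u3002\\uFF0C\\uFF1B-]+";

    // Hypothetical helper: counts non-empty segments between delimiters.
    static int countSegments(String str) {
        if (str == null) {
            return 0;
        }
        int count = 0;
        for (String token : str.split(DELIMITERS)) {
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```

Extend DELIMITERS with whatever other punctuation your text uses, for example fullwidth question and exclamation marks.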
Test
System.out.println(getWordCount("This is a sentence")); // 4
System.out.println(getWordCount("This is a sentence")); // 4
System.out.println(getWordCount("This is a ,,sentence")); // 4