Java regex - exactly one whitespace character

I want to match all strings that contain exactly one whitespace character. I am currently using [^\\s]*\\s[^\\s]*. However, this does not seem like a good way to do it.

6 answers

Why not? That is fine, just a bit more complicated than it needs to be:

\\S*\\s\\S*

I want to match all strings that contain exactly one whitespace character.

The correct pattern for finding out whether exactly one whitespace character exists in a Java string is:

\A[^\u0009\u000A-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]*+[\u0009\u000A-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000][^\u0009\u000A-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]*+\z

Other answers presented here do not correctly answer the question.
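As a rough illustration of the difference, the sketch below compares the short \S*\s\S* pattern with an explicit character class on a string whose only whitespace is U+00A0 NO-BREAK SPACE. The class name ExplicitWhitespaceDemo and the helper names are invented for this example. Note that the trailing class is negated here, which is what "exactly one whitespace character" requires.

```java
import java.util.regex.Pattern;

public class ExplicitWhitespaceDemo {
    // Body of a character class holding the code points listed in the answer.
    // The doubled backslashes hand the escapes to the regex engine unchanged.
    static final String WS =
        "\\u0009\\u000A-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E"
        + "\\u2000-\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000";

    // non-whitespace*, one whitespace, non-whitespace*, anchored at both ends.
    static final Pattern EXPLICIT =
        Pattern.compile("\\A[^" + WS + "]*+[" + WS + "][^" + WS + "]*+\\z");

    // The short pattern from the other answers, using Java's default \s and \S.
    static final Pattern SHORT = Pattern.compile("\\S*\\s\\S*");

    static boolean explicitMatch(String s) {
        return EXPLICIT.matcher(s).matches();
    }

    static boolean shortMatch(String s) {
        return SHORT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String nbsp = "a\u00A0b"; // the only whitespace is NO-BREAK SPACE
        System.out.println(explicitMatch(nbsp)); // true
        System.out.println(shortMatch(nbsp));    // false: default \s is ASCII-only
        System.out.println(explicitMatch("a b c")); // false: two whitespace characters
    }
}
```

For plain ASCII input such as "a bc" both patterns agree; they only diverge once non-ASCII whitespace appears.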

All Unicode whitespace characters are listed below, along with their age (the Unicode release in which they first appeared) and the binary properties that relate to whitespace.

U+0009 CHARACTER TABULATION
    \s \h \pC \p{Cc}
    Age=1.1 HorizSpace Pattern_White_Space Space White_Space
U+000A LINE FEED (LF)
    \s \v \R \pC \p{Cc}
    Age=1.1 Pattern_White_Space Space VertSpace White_Space
U+000B LINE TABULATION 
    \v \R \pC \p{Cc}
    Pattern_White_Space Space VertSpace White_Space 
U+000C FORM FEED (FF)
    \s \v \R \pC \p{Cc}
    Age=1.1 Pattern_White_Space Space VertSpace White_Space
U+000D CARRIAGE RETURN (CR)
    \s \v \R \pC \p{Cc}
    Age=1.1 Pattern_White_Space Space VertSpace White_Space
U+0020 SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Pattern_White_Space Space Space_Separator White_Space
U+0085 NEXT LINE (NEL)
    \s \v \R \pC \p{Cc}
    Age=1.1 Pattern_White_Space Space VertSpace White_Space
U+00A0 NO-BREAK SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+1680 OGHAM SPACE MARK
    \s \h \pZ \p{Zs}
    Age=3.0 HorizSpace Space Space_Separator White_Space
U+180E MONGOLIAN VOWEL SEPARATOR
    \s \h \pZ \p{Zs}
    Age=3.0 HorizSpace Space Space_Separator White_Space
U+2000 EN QUAD
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+2001 EM QUAD
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+2002 EN SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+2003 EM SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+2004 THREE-PER-EM SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+2005 FOUR-PER-EM SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+2006 SIX-PER-EM SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+2007 FIGURE SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+2008 PUNCTUATION SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+2009 THIN SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+200A HAIR SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space
U+2028 LINE SEPARATOR
    \s \v \R \pZ \p{Zl}
    Age=1.1 Pattern_White_Space Space VertSpace White_Space
U+2029 PARAGRAPH SEPARATOR
    \s \v \R \pZ \p{Zp}
    Age=1.1 Pattern_White_Space Space VertSpace White_Space
U+202F NARROW NO-BREAK SPACE
    \s \h \pZ \p{Zs}
    Age=3.0 HorizSpace Space Space_Separator White_Space
U+205F MEDIUM MATHEMATICAL SPACE
    \s \h \pZ \p{Zs}
    Age=3.2 HorizSpace Space Space_Separator White_Space
U+3000 IDEOGRAPHIC SPACE
    \s \h \pZ \p{Zs}
    Age=1.1 HorizSpace Space Space_Separator White_Space

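Most of these first appeared in Unicode 1.1. U+1680 OGHAM SPACE MARK, U+180E MONGOLIAN VOWEL SEPARATOR, and U+202F NARROW NO-BREAK SPACE were added in Unicode 3.0, and U+205F MEDIUM MATHEMATICAL SPACE in 3.2.

The \p{Whitespace} property falls under UTS #18 RL1.2 "Properties", and \p{space} and \s fall under UTS #18 RL1.2a "Compatibility Properties".

As of Unicode Standard 6.0.0, the characters above are the ones that carry the White_Space property.

By default, the j.u.r.Pattern class does not apply these Unicode definitions to \s, so Java's regexes fall short of UTS #18: Unicode Regular Expressions, even at Level 1, the basic level of Unicode support.

Because Java's \s does not cover the Unicode whitespace characters, the pattern above spells out every code point explicitly.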
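Java 7 and later offer a Pattern.UNICODE_CHARACTER_CLASS flag that makes \s and \S follow the Unicode White_Space property. The sketch below, assuming such a runtime is available, contrasts the default behaviour with the flagged one. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class UnicodeFlagDemo {
    // Default behaviour: \s covers only space, tab, LF, VT, FF and CR.
    static final Pattern DEFAULT_WS = Pattern.compile("\\S*\\s\\S*");

    // With the flag, \s and \S follow the Unicode White_Space property.
    static final Pattern UNICODE_WS =
        Pattern.compile("\\S*\\s\\S*", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean defaultMatch(String s) {
        return DEFAULT_WS.matcher(s).matches();
    }

    static boolean unicodeMatch(String s) {
        return UNICODE_WS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String emSpace = "a\u2003b"; // the only whitespace is EM SPACE
        System.out.println(defaultMatch(emSpace)); // false
        System.out.println(unicodeMatch(emSpace)); // true
    }
}
```

Which characters the flag covers depends on the Unicode version shipped with the JDK, so the explicit character class remains the more predictable option when the exact set matters.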


Alternatively, without a regex (a plain loop):

String s = "one whitespace";


public boolean hasOneWhitespace(String s) {
   int count = 0;
   for (int i = 0; i < s.length(); i++) {
      if(s.charAt(i) == ' ') {
         count++;
         if (count > 1) return false;
      }
   }
   return count == 1;   
}

Of course, this only works if you count just " " as whitespace. Tabs and newlines will not be counted.
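If tabs and newlines should count as well, one possible variation is to test each character with Character.isWhitespace instead of comparing against ' '. This is only a sketch: the class name WhitespaceCounter is invented here, and Character.isWhitespace deliberately excludes the non-breaking spaces (U+00A0, U+2007, U+202F), so it is still not the full Unicode set.

```java
public class WhitespaceCounter {
    // Returns true when s contains exactly one character that
    // Character.isWhitespace() treats as whitespace.
    static boolean hasOneWhitespace(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                count++;
                if (count > 1) return false; // stop early on the second hit
            }
        }
        return count == 1;
    }

    public static void main(String[] args) {
        System.out.println(hasOneWhitespace("one whitespace")); // true
        System.out.println(hasOneWhitespace("a\tb"));           // true: tab counts
        System.out.println(hasOneWhitespace("a \tb"));          // false: two characters
        System.out.println(hasOneWhitespace("abc"));            // false: none
    }
}
```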


You can also check it with indexOf:

String s = "some text";
int indexOf = s.indexOf(' ');
boolean isOneWhitespace = (indexOf >= 0 && indexOf == s.lastIndexOf(' '));

Use transliteration. This should be a standalone test: the regular expression above cannot be embedded in a larger regular expression and still check for exactly one whitespace character.

Transliteration is 10-20 times faster than the regular expression for this test.
This is a jtr example:

String aInput = "This is a test, 123.";
CharacterReplacer cReplacer = Perl5Parser.makeReplacer( "tr[ \\t\\r\\n\\f\\x0B][ \\t\\r\\n\\f\\x0B]" );
String aResult = cReplacer.doReplacement( aInput );
int nMatches = cReplacer.getMatches();

if (nMatches == 1) { ... }
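
A quick check of the ^\S*\s\S*$ pattern against a few sample strings:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Only strings with exactly one whitespace character should match.
String[] ss = { " ", "abc", "a bc", "a b c d" };
// Compile once and reuse the same Matcher through reset().
Matcher m = Pattern.compile("^\\S*\\s\\S*$").matcher("");
for (String s : ss)
{
  if (m.reset(s).matches())
  {
    System.out.printf("%n>>%s<< OK%n", s);
  }
}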

output:

>> << OK

>>a bc<< OK

Source: https://habr.com/ru/post/1788670/

