Trim whitespace from a string

I have a string with an unknown combination of whitespace characters (\t, \n, or space) between words. For example:

 string str = "Hello \t\t \n \t \t World! \tPlease Help."; 

I want to replace each sequence of internal whitespace with one space:

 string str = "Hello World! Please Help."; 

Is there a built-in way to do this in .NET? If not, how can I do it in C#?

7 answers
    using System.Text.RegularExpressions;

    newString = Regex.Replace(oldString, @"\s+", " ");

Try the following regular expression replacement:

    string original = ...;
    string replaced = Regex.Replace(original, @"\s+", " ");

This replaces each run of whitespace characters (\s) with a single space. Here you can find other useful character groups.
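
The one-liner leaves a single leading or trailing space when the input itself starts or ends with whitespace. A minimal sketch, assuming you also want those ends trimmed and want to reuse one Regex instance instead of passing the pattern on every call; the class name WhitespaceRegex and the method Collapse are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceRegex
{
    // Hypothetical helper: one shared Regex instance, so the pattern is parsed once.
    // Regex instances are safe to use for matching from multiple threads.
    private static readonly Regex WhitespaceRun = new Regex(@"\s+");

    // Collapses every run of whitespace to a single space, then trims both ends.
    public static string Collapse(string input)
    {
        return WhitespaceRun.Replace(input, " ").Trim();
    }
}
```

For the string in the question, WhitespaceRegex.Collapse(str) should give "Hello World! Please Help.".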

    string trimmed = Regex.Replace(original, @"\s+", " ");

Link: http://www.dotnetperls.com/regex-replace-spaces

There is no built-in method for this, but you can use regular expressions:

 string result = Regex.Replace(str, @"\s+", " "); 

I use a slightly different approach. It is a little more verbose (and currently in VB), but it lets me easily make all kinds of exceptions, such as specific characters, punctuation marks, or combinations of Unicode categories. It also saves me from having to learn regular expressions. Note that it removes whitespace entirely rather than collapsing it to a single space.

    Imports System.Runtime.CompilerServices
    Imports System.Collections.Generic
    Imports System.Globalization
    Imports System.Linq
    Imports System.Text

    Public Module StringExclusions
        <Extension()>
        Public Function CharsToString(ByVal val As IEnumerable(Of Char)) As String
            Dim bldr As New StringBuilder()
            bldr.Append(val.ToArray)
            Return bldr.ToString()
        End Function

        <Extension()>
        Public Function RemoveCategories(ByVal val As String, ByVal categories As IEnumerable(Of UnicodeCategory)) As String
            ' Keep only the characters whose Unicode category is not in the list.
            Return (From chr As Char In val.ToCharArray
                    Where Not categories.Contains(Char.GetUnicodeCategory(chr))).CharsToString
        End Function

        Public Function WhiteSpaceCategories() As IEnumerable(Of UnicodeCategory)
            Return New List(Of UnicodeCategory) From {UnicodeCategory.SpaceSeparator, UnicodeCategory.LineSeparator, UnicodeCategory.Control}
        End Function

        '...Other commonly used categories removed for brevity.
    End Module

And a few tests.

    [TestMethod]
    public void RemoveCharacters()
    {
        String testObj = "a \ab \bc \fd \ne \rf \tg \vh";
        Assert.AreEqual(@"abcdefgh", testObj.RemoveCategories(StringExclusions.WhiteSpaceCategories()));
    }

    [TestMethod]
    public void KeepValidCharacters()
    {
        // No spaces here: a space is in SpaceSeparator and would be removed.
        String testObj = @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ`1234567890-=~!@#$%^&*()_+[]\{}|;':,./<>?" + "\"";
        Assert.AreEqual(@"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ`1234567890-=~!@#$%^&*()_+[]\{}|;':,./<>?" + "\"", testObj.RemoveCategories(StringExclusions.WhiteSpaceCategories()));
    }

You can try an alternative without Regex, which may be faster (note that it splits only on the four characters listed):

 string replaced = String.Join(" ", str.Split( new char[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)); 
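
The character list above misses other whitespace such as \v, \f, or the no-break space U+00A0. A sketch that instead passes a null separator array, which, per the String.Split documentation, makes every character for which char.IsWhiteSpace returns true a delimiter; the class name SplitJoin is hypothetical:

```csharp
using System;

public static class SplitJoin
{
    // A null separator array tells String.Split to split on all whitespace
    // (as defined by char.IsWhiteSpace). The (char[]) cast picks the right overload.
    // RemoveEmptyEntries drops the empty pieces between adjacent separators,
    // so leading and trailing whitespace disappears as well.
    public static string CollapseWhitespace(string input)
    {
        return string.Join(" ", input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }
}
```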

A fast and common way to do this; line terminators and tabs are handled as well. The full power of Regex is not really needed for this problem, and using it can cost performance.

    // Requires: using System; using System.Linq;
    String.Join(" ", new string(stringToRemoveWhiteSpaces
            .Select(c => char.IsWhiteSpace(c) ? ' ' : c)   // map every whitespace char to ' '
            .ToArray())
        .Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
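
The expression above still builds a char array, a new string, and an array of split pieces. A minimal single-pass sketch that avoids both Regex and LINQ, assuming char.IsWhiteSpace defines whitespace and that leading and trailing whitespace should be dropped; LoopCollapse is a hypothetical name, and whether it is actually faster would need measuring:

```csharp
using System.Text;

public static class LoopCollapse
{
    // Copies non-whitespace characters and emits one space for each run of
    // whitespace that sits between two non-whitespace characters.
    public static string CollapseWhitespace(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool inGap = false;
        foreach (char c in input)
        {
            if (char.IsWhiteSpace(c))
            {
                inGap = true;   // remember the gap, decide later whether to emit it
                continue;
            }
            // Emit the pending space only if text already precedes it,
            // so leading whitespace is skipped.
            if (inGap && sb.Length > 0)
            {
                sb.Append(' ');
            }
            inGap = false;
            sb.Append(c);
        }
        // A trailing gap is never emitted.
        return sb.ToString();
    }
}
```

Swapping the char.IsWhiteSpace test for a Char.GetUnicodeCategory check would give the category-based behaviour of the VB answer while still collapsing runs to one space instead of removing them.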

Source: https://habr.com/ru/post/1399488/

