Quickly remove unnecessary spaces from a (very large) string

I work with very large (45,000,000+ characters) strings in VBA and I need to remove the extra whitespace.

A single space (ASCII code 32) is fine, but any run of two or more consecutive spaces should be reduced to one.

I found a similar question here, although that OP's definition of a "very long string" was only 39,000 characters. The accepted answer was a loop using Replace:

Function MyTrim(s As String) As String
    Do While InStr(s, "  ") > 0
        s = Replace$(s, "  ", " ")
    Loop
    MyTrim = Trim$(s)
End Function

I tried this method and it worked, but it was very slow:

Len In:  44930886 
Len Out: 35322469
Runtime: 247.6 seconds

Is there a faster way to remove spaces from a "very large" string?

+4
2


Use Regex for this.

Option Explicit

Sub Test(ByVal text As String)

  Static Regex As Object
  If Regex Is Nothing Then
    Set Regex = CreateObject("VBScript.RegExp")
    Regex.Global = True
    Regex.MultiLine = True
  End If

  Regex.Pattern = " +" ' space, one or more times

  Dim result As String: result = Regex.Replace(text, " ")
  Debug.Print Len(result), Left(result, 20)
End Sub
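
As a variation on the Sub above, here is a minimal sketch that returns the result instead of printing it. The function name CollapseSpaces and the pattern " {2,}" are assumptions made for illustration and are not part of the answer. The pattern matches only runs of two or more spaces, so single spaces are left alone.

```vba
Option Explicit

' Hypothetical helper (name and pattern are assumptions, not from the answer):
' collapses every run of two or more spaces into a single space.
Function CollapseSpaces(ByVal text As String) As String
    Static rx As Object
    If rx Is Nothing Then
        Set rx = CreateObject("VBScript.RegExp")
        rx.Global = True          ' replace every match, not just the first
        rx.Pattern = " {2,}"      ' two or more consecutive spaces
    End If
    CollapseSpaces = rx.Replace(text, " ")
End Function

Sub DemoCollapseSpaces()
    Debug.Print "[" & CollapseSpaces("a  b   c d") & "]"   ' prints [a b c d]
End Sub
```

Unlike MyTrim in the question, this sketch does not strip leading or trailing spaces. Wrap the call in Trim$ if that is needed.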

Tested on a string of about 45 million characters.

Runner:

Sub Main()

  Const ForReading As Integer = 1
  Const FormatUTF16 As Integer = -1 ' aka TriStateTrue
  Dim fso As Object: Set fso = CreateObject("Scripting.FileSystemObject")
  Dim file As Object: Set file = fso.OpenTextFile("C:\ProgramData\test.txt", ForReading, False, FormatUTF16)
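  Dim text As String: text = file.ReadAll()
  file.Close ' release the file handle before dropping the reference
  Set file = Nothing
  Set fso = Nothing
  Debug.Print Len(text), Left(text, 20)

  Test text ' no parentheses, so the argument is not evaluated as an expression first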

End Sub

Test file generator (C#):

using System.IO;
using System.Linq;
using System.Text;

var substring = "××\n× ××   ";
var text = string.Join("", Enumerable.Repeat(substring, 45_000_000 / substring.Length));
var encoding = new UnicodeEncoding(false, false); // UTF-16 little-endian, no byte order mark
File.WriteAllText(@"C:\ProgramData\test.txt", text, encoding);

BTW, VBA (like VB4, Java, JavaScript, C#, VB, ...) uses UTF-16 strings, so the space is the UTF-16 code unit ChrW(32). (Strictly speaking it is not an ASCII code; the ANSI counterpart is Chr(32).)
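
A small sketch of what the UTF-16 remark means in practice. The Sub name DemoUtf16Space is made up for illustration. Len counts characters, while LenB counts bytes, and each UTF-16 code unit takes two bytes.

```vba
Sub DemoUtf16Space()
    Dim s As String: s = "a b"
    Debug.Print Len(s)                 ' 3 characters
    Debug.Print LenB(s)                ' 6 bytes, two per UTF-16 code unit
    Debug.Print AscW(Mid$(s, 2, 1))    ' 32, the code unit of the space
    Debug.Print (ChrW(32) = " ")       ' True
End Sub
```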

+5

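In VBA, a String is limited to roughly 2 billion characters. The "Replace - Loop" method above took 247 seconds on the 45-million-character string, which is more than 4 minutes.

At that rate, a string near the 2-billion-character limit would take about 3 hours.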

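Excel has a worksheet function Trim, which is different from the VBA function Trim.

The worksheet Trim removes all spaces except single spaces between words, while VBA's Trim removes only leading and trailing spaces.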
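
The difference between the two Trim functions can be checked with a short sketch like the one below. The Sub name DemoTrimDifference is an assumption for illustration, and the code has to run inside Excel, because Application.WorksheetFunction is only available there.

```vba
Sub DemoTrimDifference()
    Dim s As String: s = "  a   b  "
    ' VBA Trim strips only the leading and trailing spaces.
    Debug.Print "[" & Trim$(s) & "]"                                ' [a   b]
    ' The worksheet Trim also collapses inner runs of spaces to one.
    Debug.Print "[" & Application.WorksheetFunction.Trim(s) & "]"   ' [a b]
End Sub
```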

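The catch is that Trim, like every function called through Application.WorksheetFunction, is limited to 32,767 characters, and [unfortunately] that limit still applies when the function is called from VBA on a string that never touches a cell.

However, the function can be applied to the "very large" string in chunks, in a loop: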

EDIT: Don't bother with my approach (this answer)! The RegEx answer is much faster.

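Function bigTrim(strIn As String) As String

    ' Stay just under the 32,767-character limit of WorksheetFunction calls.
    Const maxLen = 32766
    Dim loops As Long, x As Long

    ' Number of chunks, rounded up so the final partial chunk is included.
    loops = Int(Len(strIn) / maxLen)
    If (Len(strIn) / maxLen) <> loops Then loops = loops + 1

    ' Trim each chunk with the worksheet function and append it to the result.
    ' Note: the worksheet Trim also strips leading and trailing spaces from
    ' each chunk, so a space sitting exactly on a chunk boundary is lost.
    For x = 1 To loops
        bigTrim = bigTrim & _
            Application.WorksheetFunction.Trim(Mid(strIn, _
            ((x - 1) * maxLen) + 1, maxLen))
    Next x

End Function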

Run on the same string as the "Replace - Loop" method, the results were:

Len In:  44930886 
Len Out: 35321845
Runtime: 33.6 seconds

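That is more than 7 times faster than the "Replace - Loop" method, and the output is 624 characters shorter than the one the other method produced.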


+1

Source: https://habr.com/ru/post/1693455/

