Is there a lazy `String.Split` in C#?

All string.Split methods return an array of strings (string[]).

I am wondering whether there is a lazy variant that returns an IEnumerable<string>, so that for large strings (or an IEnumerable<char> of unbounded length), when you are only interested in the first few subsequences, you save computational effort as well as memory. It could also be useful when the string is produced by a device or program (network, terminal, pipes), so the whole input does not have to be available up front. That way you can start processing the first occurrences right away.

Is there such a way in the .NET framework?

+6
6 answers

Nothing like this is built in. Regex.Matches is lazy, if I interpret the decompiled code correctly. Perhaps you can use that.
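As a rough illustration of the Regex.Matches idea, here is a minimal sketch written as a top-level program. The input string and the pattern `[^,]+` are only assumptions for the example; the pattern matches each non-empty field between commas, so empty fields are skipped, unlike string.Split.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "alpha,beta,gamma,delta";

// [^,]+ matches each run of non-comma characters, i.e. each non-empty field.
// The MatchCollection is filled on demand while it is enumerated, so Take(2)
// means only the first two matches have to be computed.
var firstTwo = Regex.Matches(input, "[^,]+")
    .Cast<Match>()
    .Select(m => m.Value)
    .Take(2)
    .ToList();

Console.WriteLine(string.Join(" | ", firstTwo)); // prints "alpha | beta"
```

Note that touching Count on the MatchCollection forces all matches to be found, so stick to enumeration if laziness matters.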

Or you just write your own split function.

Actually, you could imagine most string functions generalized to arbitrary sequences, often even to sequences of T and not just char. The BCL does not put much emphasis on generalizing things this way. For example, there is no Enumerable.Subsequence.
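To make the generalization concrete, here is a minimal sketch of a split over any IEnumerable<T>, written as a top-level program. The names SplitSequence and Endless are hypothetical helpers invented for this example, not BCL members. Because the splitter only pulls items on demand, it also works on a source that never ends.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lazily splits any sequence on the elements
// for which isSeparator returns true.
IEnumerable<List<T>> SplitSequence<T>(IEnumerable<T> source, Func<T, bool> isSeparator)
{
    var current = new List<T>();
    foreach (var item in source)
    {
        if (isSeparator(item))
        {
            yield return current;      // hand out the finished chunk
            current = new List<T>();   // start a fresh one
        }
        else
        {
            current.Add(item);
        }
    }
    yield return current;              // trailing chunk, possibly empty
}

// Hypothetical endless character source: a, b, ',', a, b, ',', ...
IEnumerable<char> Endless()
{
    while (true)
    {
        foreach (var c in "ab,") yield return c;
    }
}

var firstThree = SplitSequence(Endless(), c => c == ',')
    .Take(3)
    .Select(chunk => new string(chunk.ToArray()))
    .ToList();

Console.WriteLine(string.Join(" | ", firstThree)); // prints "ab | ab | ab"
```

Only as many characters as are needed for three chunks are ever read from the source.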

+4

You can easily write one:

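    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    public static class StringExtensions
    {
        // Note: string already has an instance method Split(params char[]), and
        // instance methods win over extension methods. Call this one as
        // StringExtensions.Split(text, ',') or give it a different name.
        public static IEnumerable<string> Split(this string toSplit, params char[] splits)
        {
            if (string.IsNullOrEmpty(toSplit))
                yield break;

            StringBuilder sb = new StringBuilder();
            foreach (var c in toSplit)
            {
                if (splits.Contains(c))
                {
                    // Separator reached: emit the buffered token and start a new one.
                    yield return sb.ToString();
                    sb.Clear();
                }
                else
                {
                    sb.Append(c);
                }
            }

            // Emit whatever is left after the last separator.
            if (sb.Length > 0)
                yield return sb.ToString();
        }
    }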

Obviously, I have not tested it for parity with string.Split, but I believe it should work in much the same way.

As Servy notes, this does not split on string separators. That is not as simple, and not as efficient, but it is basically the same pattern.

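    // Requires: using System; (for StringComparison)
    public static IEnumerable<string> Split(this string toSplit, string[] separators)
    {
        if (string.IsNullOrEmpty(toSplit))
            yield break;

        StringBuilder sb = new StringBuilder();
        foreach (var c in toSplit)
        {
            // Append first, then test, so the current character is never lost.
            sb.Append(c);
            var s = sb.ToString();
            var sep = separators.FirstOrDefault(
                i => !string.IsNullOrEmpty(i) && s.EndsWith(i, StringComparison.Ordinal));
            if (sep != null)
            {
                // The buffer ends with a separator: emit everything before it.
                yield return s.Substring(0, s.Length - sep.Length);
                sb.Clear();
            }
        }

        if (sb.Length > 0)
            yield return sb.ToString();
    }

+4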

Nothing is built in, but feel free to use my Tokenize method:

    /// <summary>
    /// Splits a string into tokens.
    /// </summary>
    /// <param name="s">The string to split.</param>
    /// <param name="isSeparator">
    /// A function testing if a code point at a position
    /// in the input string is a separator.
    /// </param>
    /// <returns>A sequence of tokens.</returns>
    IEnumerable<string> Tokenize(string s, Func<string, int, bool> isSeparator = null)
    {
        if (isSeparator == null)
            isSeparator = (str, i) => !char.IsLetterOrDigit(str, i);

        int startPos = -1;

        for (int i = 0; i < s.Length; i += char.IsSurrogatePair(s, i) ? 2 : 1)
        {
            if (!isSeparator(s, i))
            {
                if (startPos == -1)
                    startPos = i;
            }
            else if (startPos != -1)
            {
                yield return s.Substring(startPos, i - startPos);
                startPos = -1;
            }
        }

        if (startPos != -1)
        {
            yield return s.Substring(startPos);
        }
    }

+2

There is no built-in method for this, as far as I know. But this does not mean that you cannot write it. Here is an example that will give you an idea:

    public static IEnumerable<string> SplitLazy(this string str, params char[] separators)
    {
        List<char> temp = new List<char>();
        foreach (var c in str)
        {
            if (separators.Contains(c))
            {
                // Only emit non-empty tokens; a separator is never stored in the buffer.
                if (temp.Count > 0)
                {
                    yield return new string(temp.ToArray());
                    temp.Clear();
                }
            }
            else
            {
                temp.Add(c);
            }
        }

        if (temp.Count > 0)
        {
            yield return new string(temp.ToArray());
        }
    }

Of course, this does not handle all cases and can be improved.
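One possible improvement, sketched here under my own assumptions rather than taken from the answer: use IndexOfAny and Substring instead of a per-character buffer, and keep empty entries the way string.Split does with StringSplitOptions.None. SplitByIndex is a hypothetical name, the sketch is a top-level program, and it assumes a non-empty separator array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lazily yields the pieces between separator characters,
// including empty pieces.
IEnumerable<string> SplitByIndex(string str, char[] separators)
{
    int start = 0;
    while (true)
    {
        // Find the next separator at or after 'start'; -1 means none is left.
        int next = str.IndexOfAny(separators, start);
        if (next < 0)
        {
            yield return str.Substring(start); // last piece, may be empty
            yield break;
        }
        yield return str.Substring(start, next - start);
        start = next + 1; // continue right after the separator
    }
}

var parts = SplitByIndex(",a,,b,", new[] { ',' }).ToList();
Console.WriteLine(string.Join("|", parts)); // prints "|a||b|"
Console.WriteLine(parts.Count);             // prints 5
```

The input is only scanned up to the separator that ends the piece currently requested, so taking the first few pieces of a very long string stays cheap.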

+1

I wrote this variant, which also supports StringSplitOptions and count. It behaves the same as string.Split in all the test cases I tried. The nameof operator is C# 6 specific and can be replaced with "count".

    public static class StringExtensions
    {
        /// <summary>
        /// Splits a string into substrings that are based on the characters in an array.
        /// </summary>
        /// <param name="value">The string to split.</param>
        /// <param name="options"><see cref="StringSplitOptions.RemoveEmptyEntries"/> to omit empty array elements from the array returned; or <see cref="StringSplitOptions.None"/> to include empty array elements in the array returned.</param>
        /// <param name="count">The maximum number of substrings to return.</param>
        /// <param name="separator">A character array that delimits the substrings in this string, an empty array that contains no delimiters, or null.</param>
        /// <returns></returns>
        /// <remarks>
        /// Delimiter characters are not included in the elements of the returned array.
        /// If this instance does not contain any of the characters in separator the returned sequence consists of a single element that contains this instance.
        /// If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the <see cref="Char.IsWhiteSpace"/> method.
        /// </remarks>
        public static IEnumerable<string> SplitLazy(this string value, int count = int.MaxValue, StringSplitOptions options = StringSplitOptions.None, params char[] separator)
        {
            if (count <= 0)
            {
                if (count < 0)
                    throw new ArgumentOutOfRangeException(nameof(count), "Count cannot be less than zero.");
                yield break;
            }

            Func<char, bool> predicate = char.IsWhiteSpace;
            if (separator != null && separator.Length != 0)
                predicate = (c) => separator.Contains(c);

            if (string.IsNullOrEmpty(value) || count == 1 || !value.Any(predicate))
            {
                yield return value;
                yield break;
            }

            bool removeEmptyEntries = (options & StringSplitOptions.RemoveEmptyEntries) != 0;
            int ct = 0;
            var sb = new StringBuilder();

            for (int i = 0; i < value.Length; ++i)
            {
                char c = value[i];
                if (!predicate(c))
                {
                    sb.Append(c);
                }
                else
                {
                    if (sb.Length != 0)
                    {
                        yield return sb.ToString();
                        sb.Clear();
                    }
                    else
                    {
                        if (removeEmptyEntries)
                            continue;
                        yield return string.Empty;
                    }

                    if (++ct >= count - 1)
                    {
                        if (removeEmptyEntries)
                            while (++i < value.Length && predicate(value[i])) ;
                        else
                            ++i;

                        if (i < value.Length - 1)
                        {
                            sb.Append(value, i, value.Length - i);
                            yield return sb.ToString();
                        }
                        yield break;
                    }
                }
            }

            if (sb.Length > 0)
                yield return sb.ToString();
            else if (!removeEmptyEntries && predicate(value[value.Length - 1]))
                yield return string.Empty;
        }

        public static IEnumerable<string> SplitLazy(this string value, params char[] separator)
        {
            return value.SplitLazy(int.MaxValue, StringSplitOptions.None, separator);
        }

        public static IEnumerable<string> SplitLazy(this string value, StringSplitOptions options, params char[] separator)
        {
            return value.SplitLazy(int.MaxValue, options, separator);
        }

        public static IEnumerable<string> SplitLazy(this string value, int count, params char[] separator)
        {
            return value.SplitLazy(count, StringSplitOptions.None, separator);
        }
    }

+1

I needed the functionality of Regex.Split, but in a lazily evaluated form. The code below simply walks through the matches in the input string and gives the same results as Regex.Split:

    public static IEnumerable<string> Split(string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        // Always compile - we expect many executions
        var regex = new Regex(pattern, options | RegexOptions.Compiled);
        int currentSplitStart = 0;
        var match = regex.Match(input);

        while (match.Success)
        {
            yield return input.Substring(currentSplitStart, match.Index - currentSplitStart);
            currentSplitStart = match.Index + match.Length;
            match = match.NextMatch();
        }

        yield return input.Substring(currentSplitStart);
    }

Note that calling this method with the pattern @"\s" will give you the same results as string.Split().

0

Source: https://habr.com/ru/post/981625/

