What is the purpose of the StringSegment class?

Question

There is a StringSegment class in the Microsoft.Extensions.Primitives package whose doc comment describes it as:

An optimized representation of a substring.

I did not know about this class until I came across aspnet announcement #244, which says: Microsoft.Net.Http.Headers converted to use StringSegments.

However, looking at the implementation of the StringSegment class, I don’t see what purpose it actually serves. I see a buffer, which I suppose allows better manipulation of parts of a string (perhaps the portions that make up a “segment”?). I also see several helper functions that behave closely to, if not identically to, those already available on regular strings, such as StartsWith/EndsWith, Substring, etc. The aspnet-core docs list them all, but they too lack any context on why the class should be used.

So what is the purpose of the StringSegment class and in what scenarios is it applicable to use it?

Is it useful to use this class in application code when working with strings? Can someone give an example where it would be useful?

+5

c#

Juliën May 24, '17 at 15:43

2 answers

When parsing text, many new string objects can be created or copied. In theory, this class helps reduce the memory used when processing large numbers of substrings. Other languages have similar concepts (see std::string_view in C++17).

0

Matthew jimenez May 24, '17 at 15:54

Ed plunkett · Accepted Answer · 2017-05-24T15:50:00+0000

It allows you to perform many string operations on a substring of another string without actually calling Substring() and creating a new string object. This is roughly the same as how, in C, you can have a pointer into the middle of a string:

  char *s1 = "foo bar";
  char *s2 = s1 + 4;

s2 "is" the string "bar" in a useful sense.

Take, for example, StringSegment.IndexOf(): you can get the index of a character within a segment of a string without first having to call Substring() on the larger string and allocate a new buffer:

  public int IndexOf(char c, int start)
  {
      return IndexOf(c, start, Length - start);
  }

You can trim a StringSegment to remove leading whitespace:

  public StringSegment TrimStart()
  {
      var trimmedStart = Offset;
      // Advance past leading whitespace without copying any characters.
      while (trimmedStart < Offset + Length)
      {
          if (!char.IsWhiteSpace(Buffer, trimmedStart))
          {
              break;
          }
          trimmedStart++;
      }
      // The result is a narrower view over the same Buffer.
      return new StringSegment(Buffer, trimmedStart, Offset + Length - trimmedStart);
  }

These are very cheap operations: no new string is allocated and no characters are copied.

You can do all this by juggling indices yourself, but such code is annoying and error-prone. You would rather wrap an abstraction around it.
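To make the idea of such an abstraction concrete, here is a minimal sketch of a segment type. MiniSegment is a hypothetical name used only for illustration; it is not the real StringSegment and covers just a couple of operations, assuming all you need is a read-only view of (buffer, offset, length).

```csharp
using System;

// Hypothetical illustration type; NOT the real StringSegment.
public readonly struct MiniSegment
{
    public string Buffer { get; }
    public int Offset { get; }
    public int Length { get; }

    public MiniSegment(string buffer, int offset, int length)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || length < 0 || offset > buffer.Length - length)
            throw new ArgumentOutOfRangeException(nameof(offset));
        Buffer = buffer;
        Offset = offset;
        Length = length;
    }

    // Index of c relative to the start of the segment, or -1 if absent.
    public int IndexOf(char c)
    {
        int index = Buffer.IndexOf(c, Offset, Length);
        return index < 0 ? -1 : index - Offset;
    }

    // Returns a narrower view over the same buffer; no characters are copied.
    public MiniSegment TrimStart()
    {
        int start = Offset;
        int end = Offset + Length;
        while (start < end && char.IsWhiteSpace(Buffer[start]))
        {
            start++;
        }
        return new MiniSegment(Buffer, start, end - start);
    }

    // The only member here that allocates a new string.
    public override string ToString() => Buffer.Substring(Offset, Length);
}
```

With this, new MiniSegment("key:   value", 4, 8).TrimStart() describes the text "value" while still pointing at the original string; all the index arithmetic lives in one place instead of being repeated at every call site.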

It is also a “lazy” call to String.Substring(). The hoped-for gain is that if you create a number of these, most or all of them will never have to produce the actual substring at all.

Take a look at the constructor:

  public StringSegment(string buffer, int offset, int length)

The public properties string Buffer, int Offset and int Length are all read-only.

And the Value property:

  public string Value
  {
      get
      {
          if (!HasValue)
          {
              return null;
          }
          else
          {
              return Buffer.Substring(Offset, Length);
          }
      }
  }

Thus, you can create these things relatively cheaply if you want to expose a collection of potentially large "substrings" of some larger string. If nobody calls the Value getter, Substring() is never called. If you have a lot of them and the consumer only reads one or two, you avoid many calls to Substring().
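The scenario described here can be sketched roughly as follows. LazySlice and SliceSplitter are hypothetical names invented for this illustration (they are not part of Microsoft.Extensions.Primitives): a splitter hands out one slice per field, and a substring is only allocated for the fields the consumer actually asks for.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration types; NOT the real StringSegment API.
public readonly struct LazySlice
{
    private readonly string _buffer;
    private readonly int _offset;
    public int Length { get; }

    public LazySlice(string buffer, int offset, int length)
    {
        _buffer = buffer;
        _offset = offset;
        Length = length;
    }

    // Compares against a candidate without allocating a substring.
    public bool ContentEquals(string other) =>
        other != null
        && other.Length == Length
        && string.CompareOrdinal(_buffer, _offset, other, 0, Length) == 0;

    // Allocation happens here, and only if a consumer asks for it.
    public string Value => _buffer.Substring(_offset, Length);
}

public static class SliceSplitter
{
    // Yields one slice per separator-delimited field of the input.
    public static IEnumerable<LazySlice> Split(string input, char separator)
    {
        int start = 0;
        while (true)
        {
            int next = input.IndexOf(separator, start);
            if (next < 0)
            {
                // Last field: everything from start to the end of the input.
                yield return new LazySlice(input, start, input.Length - start);
                yield break;
            }
            yield return new LazySlice(input, start, next - start);
            start = next + 1;
        }
    }
}
```

A consumer that only wants to know whether "deflate" appears in "gzip,deflate,br" can call ContentEquals on each slice and never touch Value, so no substring is created; a consumer that reads Value on one slice pays for exactly one Substring() call.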

As Servy points out, if you call Value twice on the same object, Buffer.Substring(Offset, Length) is called twice, not once. If that still lets you avoid 20 other calls, it can be a net win. You may wonder why they do not cache the return value of Buffer.Substring(). I do not know whether caching was considered unnecessary because of string interning, or whether that optimization turned out in practice not to be worth the effort.
