I have a network client that processes data from a server.
The data is sent as a series of messages, each of which is a collection of key/value pairs similar in definition to HTTP headers (except that there is no "message body"). Here is a typical one-way message (lines are separated by \r\n):
    Response: OK
    Channel: 123
    Status: OK
    Message: Spectrum is green
    Author: Gerry Anderson
    Foo123: Blargh
My protocol client works by reading from a NetworkStream, character by character, using a StreamReader and while( (nc = rdr.Read()) != -1 ), and uses a state-machine parser and a StringBuilder instance to populate Dictionary<String,String> instances. These Dictionary instances are then stored in in-memory structures for further processing; each usually has a useful lifespan of about 10 minutes.
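For illustration, the read loop described above might look roughly like this. This is a minimal sketch rather than my actual code: the `MessageReader` class, the `ReadMessage` method, and the assumption that a blank line ends a message are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class MessageReader
{
    private enum State { Key, Value }

    // Reads one message made of "Key: Value" lines.
    // Assumption: a blank line (or end of stream) terminates the message.
    // Returns null if the stream ends before any field was read.
    public static Dictionary<string, string> ReadMessage(TextReader rdr)
    {
        var fields = new Dictionary<string, string>();
        var sb = new StringBuilder();
        string key = null;
        var state = State.Key;
        int nc;

        while ((nc = rdr.Read()) != -1)
        {
            char c = (char)nc;
            if (c == '\r') continue;            // the following '\n' ends the line

            if (c == '\n')
            {
                if (state == State.Key && sb.Length == 0)
                    return fields;              // blank line: end of message
                if (state == State.Value)
                    fields[key] = sb.ToString().Trim();
                sb.Clear();                     // a line without ':' is dropped
                state = State.Key;
            }
            else if (state == State.Key && c == ':')
            {
                key = sb.ToString();            // first ':' separates key and value
                sb.Clear();
                state = State.Value;
            }
            else
            {
                sb.Append(c);
            }
        }

        // End of stream: flush a final line that had no trailing newline.
        if (state == State.Value) fields[key] = sb.ToString().Trim();
        return fields.Count > 0 ? fields : null;
    }
}
```

Every key and value goes through sb.ToString(), so each message allocates a fresh set of strings even when the text is identical to the previous message.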
My client receives thousands of these messages per hour and is a long-running process, which is a problem because the process often grows to consume more than 2 GB of memory from these String instances (I used windbg to see where all the memory went). The code runs on an Azure VM with only 3.5 GB of memory, and I see no reason why my program should need more than a few hundred MB of RAM. I often sit and watch my process's memory consumption over time: it gradually grows to about 2 GB, then suddenly drops to about 100 MB when the GC completes a collection, and then grows again. The time between GC cycles varies unpredictably.
Since many of these strings are identical (for example, keys such as Response and Status, as well as well-known values such as OK and Fail), I can reduce memory usage with string interning, by passing the result of sb.ToString() through String.Intern.
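A minimal sketch of what I mean by interning the parsed text (the `InternDemo` class and `ToInterned` helper are hypothetical names used only for illustration):

```csharp
using System;
using System.Text;

static class InternDemo
{
    // Returns the pooled instance whose text equals the builder's contents.
    // string.Intern adds the string to the intern pool if it is not there yet.
    public static string ToInterned(StringBuilder sb)
    {
        return string.Intern(sb.ToString());
    }
}
```

With this, two separately parsed "Status" keys end up as the same object, so object.ReferenceEquals on the two results returns true and only one copy stays alive.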
The problem is that I see room for further optimization. First, sb.ToString() still allocates a new string instance just so that it can be passed to String.Intern. Second, interned strings live for the lifetime of the AppDomain, and unfortunately some keys will never be reused and will simply waste memory, like Foo123 in my example message.
One solution I was thinking about is to not use string interning at all and instead have a class containing static readonly string fields for the known keys, and otherwise use regular, non-interned strings, which will eventually be GC'd and therefore do not risk filling the intern pool with one-off strings. I would then compare the contents of the StringBuilder against these well-known strings and, on a match, use the shared instance instead of calling sb.ToString(), thereby skipping another string allocation.
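Roughly, that alternative would look like the sketch below. The `KnownStrings` class, its fields, and the `Resolve` method are hypothetical names, and the list of known strings is only a sample taken from the message above.

```csharp
using System;
using System.Text;

static class KnownStrings
{
    public static readonly string Response = "Response";
    public static readonly string Status = "Status";
    public static readonly string Ok = "OK";
    public static readonly string Fail = "Fail";

    private static readonly string[] All = { Response, Status, Ok, Fail };

    // Returns the shared instance when the builder's contents match a known
    // string; otherwise allocates an ordinary string that the GC can collect.
    public static string Resolve(StringBuilder sb)
    {
        foreach (string known in All)
        {
            if (ContentEquals(sb, known)) return known;
        }
        return sb.ToString();
    }

    // Compares character by character so that no temporary string is created.
    private static bool ContentEquals(StringBuilder sb, string s)
    {
        if (sb.Length != s.Length) return false;
        for (int i = 0; i < s.Length; i++)
        {
            if (sb[i] != s[i]) return false;
        }
        return true;
    }
}
```

A linear scan is fine for a handful of short keys; with a much larger set of known strings, some kind of lookup keyed on length or first character would probably be needed. The string literals behind the fields are themselves interned by the runtime, but their number is fixed, so they do not grow the pool over time.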
However, if I choose to intern every string, the intern pool will keep growing, and unfortunately .NET does not have a .Chlorinate() method for the intern pool. Is there a way to remove single-use strings from the intern pool if I continue with the String.Intern approach, or am I better off using my own static readonly string instances?