What is needed to implement a custom string type in f # for interning strings. I have to read large csv files in memory. Given that most columns are categorical, the values are repeated, and it makes sense to create a new row the first time it occurs, and refer to it only on subsequent entries to save memory.
In C #, I do this by creating a global internal pool (concurrent dict) and before setting the value, find the dictionary if it already exists. if it exists, just point to the line already in the dictionary. if not, add it to the dictionary and set the value for the line just added to the dictionary.
New to f # and interestingly, this is the best way to do this in f #. will use the new row type in entries named tuples, etc., and it will have to work with parallel processes.
Edit:
String.Internuses the input pool. I understand that it is not very effective for large pools and is not garbage collection, that is, all / all interned strings will remain in the experience for the pool for the entire life of the application. Imagine an application in which you read a file, perform some operations, and write data. Using the Intern Pool solution is likely to work. Now imagine that you have to do the same thing 100 times, and the lines in each file have little in common. If the memory is allocated on the heap, after processing each file, we can force the garbage collector to clear unnecessary lines.
I should have mentioned that I could not figure out how to make the C # approach in F # (other than implementing a C # type and using it in F #)
Is the outline of the memories a little different from what I'm looking for? We do not cache the calculated results - we guarantee that each string object is created no more than once, and all subsequent creations of the same string are only links to the original. Using a dictionary for this is one way and using String.Intern is another.
Sorry if you are missing something obvious here.