How to serialize a large collection

I work with a system that has lists and dictionaries with over five million elements, where each element is typically a flat DTO with up to 90 primitive properties. The collections are stored on disk using protobuf-net for persistence and subsequent processing.

Unsurprisingly, we hit the Large Object Heap (LOH) during processing and serialization.

We can avoid the LOH during processing by using ConcurrentBag and the like, but we still run into the problem during serialization.

Currently, the items in the collection are split into batches of 1,000 and serialized sequentially into memory streams; each resulting byte array is placed on a concurrent queue for subsequent writing to the file stream.
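For concreteness, here is a minimal sketch of that pipeline as I understand it, with a hypothetical ItemDto standing in for the real ~90-property type and a BlockingCollection playing the role of the concurrent queue; the real code will of course differ:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using ProtoBuf;

[ProtoContract]
public class ItemDto            // hypothetical stand-in for the real flat DTO
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public double Value { get; set; }
    // ... further primitive members, up to ~90 in the real type
}

public static class BatchedSerializer
{
    // Serialize items in batches of 1,000 into MemoryStreams, queue the resulting
    // byte arrays, and drain the queue into a single FileStream on another thread.
    public static void Save(IEnumerable<ItemDto> items, string path)
    {
        var queue = new BlockingCollection<byte[]>(16);   // bounded to keep memory in check

        var writer = Task.Factory.StartNew(() =>
        {
            using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
            {
                foreach (var chunk in queue.GetConsumingEnumerable())
                    file.Write(chunk, 0, chunk.Length);
            }
        }, TaskCreationOptions.LongRunning);

        foreach (var batch in Batch(items, 1000))
        {
            using (var ms = new MemoryStream())
            {
                Serializer.Serialize(ms, batch);
                // note: a 1,000-item batch of ~90-field DTOs can easily exceed the
                // 85 KB threshold, so these byte arrays still land on the LOH
                queue.Add(ms.ToArray());
            }
        }

        queue.CompleteAdding();
        writer.Wait();
    }

    private static IEnumerable<List<ItemDto>> Batch(IEnumerable<ItemDto> source, int size)
    {
        var bucket = new List<ItemDto>(size);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size) { yield return bucket; bucket = new List<ItemDto>(size); }
        }
        if (bucket.Count > 0) yield return bucket;
    }
}
```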

Although I understand what this is trying to achieve, it seems overly complicated. It feels like protobuf itself should offer something for huge collections that doesn't touch the LOH.

I hope I've made a rookie mistake and there is some setting I've overlooked. Otherwise I'll have to go and write my own binary reader/writer.
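For reference, protobuf-net does have a streaming mode that may be the setting in question: items can be written and read one at a time with a length prefix, so no per-batch buffer or huge byte array is ever built. A minimal sketch, again assuming a hypothetical ItemDto in place of the real DTO:

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class ItemDto
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public double Value { get; set; }
}

public static class StreamingSerializer
{
    // Write each item as its own length-prefixed message straight to the file,
    // so no intermediate MemoryStream or large byte[] is allocated.
    public static void Save(IEnumerable<ItemDto> items, string path)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            foreach (var item in items)
                Serializer.SerializeWithLengthPrefix(file, item, PrefixStyle.Base128, 1);
        }
    }

    // Read the items back lazily, one at a time, without materializing the whole collection.
    public static IEnumerable<ItemDto> Load(string path)
    {
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            foreach (var item in Serializer.DeserializeItems<ItemDto>(file, PrefixStyle.Base128, 1))
                yield return item;
        }
    }
}
```

DeserializeItems returns an IEnumerable&lt;T&gt;, so the reading side can be processed in a streaming fashion as well.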

I should point out that we are on .NET 4.0 and hope to move to 4.5 soon, but I'm aware that this alone won't solve the problem, despite the improvements to the GC.

Any help is appreciated.

1 answer

Write the data straight to disk and don't use a MemoryStream.

Read it back using a StreamReader so you don't need to hold a large amount of data in memory. If you do need to load all the data at the same time for processing, do it on SQL Server by loading it into a temporary table.

Memory is not the place to store big data.
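If the SQL Server route is taken, one way to get the data into a temporary table is SqlBulkCopy; a rough sketch, where the connection string, the #Items temp table, and its columns are illustrative and would need to match the real schema:

```csharp
using System.Data;
using System.Data.SqlClient;

public static class SqlStaging
{
    // Bulk-copy the items into a temporary table so the heavy processing
    // happens on SQL Server rather than in process memory.
    public static void StageItems(DataTable items, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            using (var command = new SqlCommand(
                "CREATE TABLE #Items (Id INT, Value FLOAT)", connection))   // illustrative columns
            {
                command.ExecuteNonQuery();
            }

            using (var bulk = new SqlBulkCopy(connection))
            {
                bulk.DestinationTableName = "#Items";
                bulk.BatchSize = 10000;          // send rows in modest batches
                bulk.WriteToServer(items);       // DataTable with matching Id/Value columns
            }

            // ...run the set-based processing against #Items here, on the same connection...
        }
    }
}
```

For millions of rows, the WriteToServer(IDataReader) overload can stream rows to the server without first building a full DataTable in memory.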


Source: https://habr.com/ru/post/953781/

