Fast read/write of a complex .NET object graph

I have my own data structure written in C# (the structure is quite complex). I need to serialize and deserialize it. The serialized file on disk can at times be quite large (about 1 GB), but it can also be small (depending on the number of records saved). I have the following requirements:

  1. Serialization and deserialization must be very fast.
  2. I need to partially deserialize a large file (i.e. access only the relevant records), because deserializing the entire file from disk would use too much memory.
  3. It must be thread-safe, as multiple processes may read from and write to the file concurrently.

I know it sounds like I need a database, but I can't use one for several reasons. I addressed requirement 1 by implementing ISerializable, which made things much faster than the built-in .NET binary/XML serializers, but still not fast enough. On requirement 2 I am completely stuck.

Does anyone have any ideas on how to do this? I imagine anyone who has had to design their own large file format has dealt with similar problems.

Regards, Sam

4 answers

Is it a data tree or a full graph, i.e. are there any circular references? If not, protobuf-net is a high-performance binary tree serializer. It supports streaming enumerations of items (so you can skip records rather than buffering everything), but for efficient random access to an arbitrary record I expect you would need some kind of index.

Concurrent reading/writing of a single file is VERY hard; in particular, for writes you may need to move far more data around on disk than you expect... reads are also tricky and will need synchronization. It would be easier to use separate files...
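To illustrate the synchronization point, here is a minimal sketch of cross-process coordination: readers open the file with a permissive FileShare, while writers serialize access through a named Mutex. The mutex name and file path are illustrative assumptions, not part of the original question.

```csharp
using System;
using System.IO;
using System.Threading;

static class SharedFileAccess
{
    // Hypothetical name; any agreed-upon name shared by all processes works.
    static readonly Mutex writeMutex = new Mutex(false, "MyApp.DataFile.WriteLock");

    public static void AppendRecord(string path, byte[] record)
    {
        writeMutex.WaitOne();               // one writer at a time, across processes
        try
        {
            using (var fs = new FileStream(path, FileMode.Append,
                                           FileAccess.Write, FileShare.Read))
            {
                fs.Write(record, 0, record.Length);
            }
        }
        finally { writeMutex.ReleaseMutex(); }
    }

    public static byte[] ReadAt(string path, long offset, int length)
    {
        // Readers skip the mutex; FileShare.ReadWrite lets them coexist
        // with the single active writer.
        using (var fs = new FileStream(path, FileMode.Open,
                                       FileAccess.Read, FileShare.ReadWrite))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            byte[] buf = new byte[length];
            int read = 0;
            while (read < length)
                read += fs.Read(buf, read, length - read);
            return buf;
        }
    }
}
```

Note this only serializes writers against each other; a reader can still observe a partially written record unless records are written atomically (e.g. appended whole, with the length prefix written last).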


Here is an example of skipping early items; I could add a helper method, but the TryDeserializeWithLengthPrefix method will do the job... the key point is that between serializing and deserializing we create only one additional object.

    using System;
    using System.IO;
    using System.Threading;
    using ProtoBuf;

    [ProtoContract]
    class Foo
    {
        static int count;
        public static int ObjectCount { get { return count; } }

        public Foo()
        {
            // track how many objects have been created...
            Interlocked.Increment(ref count);
        }

        [ProtoMember(1)]
        public int Id { get; set; }

        [ProtoMember(2)]
        public double Bar { get; set; }
    }

    static class Program
    {
        static void Main()
        {
            MemoryStream ms = new MemoryStream();
            Random rand = new Random();
            for (int i = 1; i <= 5000; i++)
            {
                Foo foo = new Foo { Bar = rand.NextDouble(), Id = i };
                Serializer.SerializeWithLengthPrefix(ms, foo, PrefixStyle.Base128, 1);
            }
            ms.Position = 0;

            // skip to record 1000, materializing only that one object
            int index = 0;
            object obj;
            Console.WriteLine(Foo.ObjectCount);
            Serializer.NonGeneric.TryDeserializeWithLengthPrefix(
                ms, PrefixStyle.Base128,
                tag => ++index == 1000 ? typeof(Foo) : null,
                out obj);
            Console.WriteLine(Foo.ObjectCount);
            Console.WriteLine(((Foo)obj).Id);
        }
    }

I have not worked in exactly the scenario you describe. However, I discussed a similar problem in the past, and here is the outcome of that discussion (although, I admit, I have never seen an implementation). Also, I am afraid there is no simple, direct solution.

Assumptions:

i. The data to be written is sorted.

Solution:

i. Partition the data store into multiple files, assigning each file a range of the sorted values: e.g. write keys 1-10000 to file 1, keys 10001-20000 to file 2, and so on.

ii. When you read or write data, you know the range up front, so this satisfies point 2.

iii. It will also address point 3, provided the likelihood of two or more processes requesting the same range is low.
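The range-to-file mapping above can be sketched in a few lines. The shard size and file-name pattern here are illustrative assumptions:

```csharp
using System;

static class Sharder
{
    // Hypothetical layout: each shard file holds RangeSize consecutive keys.
    const int RangeSize = 10000;

    // Map a sorted key (starting at 1) to the shard file that stores it.
    public static string FileForKey(int key)
    {
        int shard = (key - 1) / RangeSize + 1;
        return "data_" + shard + ".bin";
    }
}
```

With this, a lookup for key 10001 goes straight to "data_2.bin" and only that file needs to be opened and scanned, and two processes working on different ranges never touch the same file.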

To provide a more accurate solution, we need more information about what you are trying to achieve.


I think we will need more information about how the file actually looks ...

Can't you just read sizeof(yourstruct)-sized chunks from the file and process them individually, instead of reading all the records into memory?
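If the records really are fixed-size structs, random access becomes trivial: seek to index * recordSize and read just that one record. The Record layout below is a hypothetical example, not the asker's actual structure:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// A hypothetical fixed-size record; a known size lets you seek straight
// to record N instead of deserializing everything before it.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Record
{
    public int Id;
    public double Value;
}

static class FixedRecordFile
{
    public static readonly int RecordSize = Marshal.SizeOf(typeof(Record));

    public static void Append(string path, Record r)
    {
        byte[] buf = new byte[RecordSize];
        GCHandle h = GCHandle.Alloc(buf, GCHandleType.Pinned);
        try { Marshal.StructureToPtr(r, h.AddrOfPinnedObject(), false); }
        finally { h.Free(); }
        using (var fs = new FileStream(path, FileMode.Append, FileAccess.Write))
            fs.Write(buf, 0, RecordSize);
    }

    public static Record ReadAt(string path, long index)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(index * RecordSize, SeekOrigin.Begin);   // jump directly to record N
            byte[] buf = new byte[RecordSize];
            fs.Read(buf, 0, RecordSize);
            GCHandle h = GCHandle.Alloc(buf, GCHandleType.Pinned);
            try
            {
                return (Record)Marshal.PtrToStructure(
                    h.AddrOfPinnedObject(), typeof(Record));
            }
            finally { h.Free(); }
        }
    }
}
```

This only works for blittable, fixed-size records; for a variable-size object graph you are back to length prefixes or an index, as the accepted answer notes.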


For partial (or split) deserialization (which I was looking into myself, e.g. for the dynamic and static parts of a game level), I think you will have to write your own serialization mechanism.


Source: https://habr.com/ru/post/1285743/
