Algorithms requiring 4 GB or 5 GB numbers - is this possible?

This problem is a real challenge!

Background

I am working on an arithmetic project that involves numbers far larger than normal. I'm new to this, and I planned for a worst-case scenario of numbers 4 GB in size (I even hoped to extend that to 5 GB, since I have seen files larger than 4 GB before, in particular *.iso disc images).

General question

The specific algorithm(s) I will apply to the calculation do not matter at the moment, but loading and processing such large amounts of data - the numbers themselves - does.

  • System.IO.File.ReadAllBytes(String) can only read files up to 2 GB, so this is my first problem. How do I load, or at least get memory access to, files twice that size, if not more?
  • I wrote my own class that treats a "stream", or an array of bytes, as one large number, and added a few operator methods to perform hexadecimal arithmetic, until I read about the System.Numerics.BigInteger structure. But since there is no BigInteger.MaxValue, and since I can load at most 2 GB of data at a time, I don't know what the potential of BigInteger is, even compared to the class I wrote, called Number (which has the minimum capacity I want; see the sketch after this list). There were also problems with available memory and performance, although I don't care about speed as much as successfully completing this experiment.
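
For reference, a minimal sketch of what BigInteger does with a raw byte array (the buffer size here is just an illustration; BigInteger itself has no fixed MaxValue, though the array feeding it is still subject to .NET's object-size limits):

 using System;
 using System.Numerics;

 class BigIntegerSketch
 {
     static void Main()
     {
         // Build a 1 MiB little-endian magnitude; BigInteger grows to fit whatever it is given.
         byte[] littleEndian = new byte[1024 * 1024];
         for (int i = 0; i < littleEndian.Length - 1; i++)
             littleEndian[i] = 0xFF;
         littleEndian[littleEndian.Length - 1] = 0x00; // zero top byte keeps the value positive

         BigInteger n = new BigInteger(littleEndian);
         Console.WriteLine(BigInteger.Log(n, 2)); // roughly 8.4 million bits
     }
 }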

Summary

  • How do I load 4-5 gigabytes of data?
  • How do I store and process the data after loading it? Stick with BigInteger, or finish my own Number class?
  • How do I handle such large amounts of memory at runtime without running out? I want to process the 4-5 GB of data as a number like any other, rather than as an array of bytes - doing arithmetic such as division and multiplication.

P.S. I can't disclose too much information about this project because of a non-disclosure agreement. ;)

For those who would like to see an example operator from my Number class for a byte array (C#):

 public static Number operator +(Number n1, Number n2)
 {
     // GB5_ARRAY is a cap constant for 5 GB - 5368709120L
     byte[] data = new byte[GB5_ARRAY];
     byte rem = 0x00, bA, bB, rm, dt;
     // Iterate through all bytes until the second to last;
     // the last byte is the remainder, if any.
     // I tested this algorithm on smaller arrays provided by the BitConverter class,
     // then I made a few tweaks to satisfy the larger arrays and the Number object.
     for (long iDx = 0; iDx <= GB5_ARRAY - 1; iDx++)
     {
         // bData is a byte[] with GB5_ARRAY number of bytes.
         // Perform a check - solves for unequal (or jagged) arrays.
         if (iDx < GB5_ARRAY - 1)
         {
             bA = n1.bData[iDx];
             bB = n2.bData[iDx];
         }
         else
         {
             bA = 0x00;
             bB = 0x00;
         }
         Add(bA, bB, rem, out dt, out rm);
         // Set data and prepare for the next interval.
         rem = rm;
         data[iDx] = dt;
     }
     return new Number(data);
 }

 private static void Add(byte a, byte b, byte r, out byte result, out byte remainder)
 {
     int i = a + b + r;
     result = (byte)(i % 256);               // find the byte amount through modulus arithmetic
     remainder = (byte)((i - result) / 256); // find the remainder (carry)
 }
3 answers

Usually you process large files with a streaming API: either raw binary (Stream) or some protocol reader on top of it (XmlReader, StreamReader, etc.). In some cases this can also be done with memory-mapped files. The key point is that you look at only a small part of the file at a time (a moderate-sized data buffer, a logical "row" or "node", etc., depending on the scenario).
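
A minimal sketch of that pattern (the file name is a placeholder for a 4-5 GB file); only one buffer's worth of the file is ever in memory:

 using System;
 using System.IO;

 class ChunkedRead
 {
     static void Main()
     {
         byte[] buffer = new byte[81920]; // moderate-sized window into the file
         long total = 0;
         using (FileStream fs = new FileStream("huge_number.bin", FileMode.Open, FileAccess.Read))
         {
             int read;
             while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
             {
                 // Process buffer[0..read) here; then the window moves on.
                 total += read;
             }
         }
         Console.WriteLine("Streamed {0} bytes without ever holding the file whole.", total);
     }
 }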

What is odd, though, is your desire to treat this data directly as one huge number. Frankly, I don't know how we can help with that without additional information, but if you are dealing with an actual number of this size, I think you will struggle unless the binary layout makes it convenient. And "performing arithmetic such as division and multiplication" is meaningless on raw data; it only makes sense on parsed data with well-defined operations.

Also: note that in .NET 4.5 you can flip a configuration switch to increase the maximum size of arrays, going past the 2 GB per-object limit. There is still a limit, but it is somewhat higher. Unfortunately, the maximum number of elements is unchanged, so if you are using a byte[] this will not help; but if you use a SomeCompositeStruct[] you can get more usable space. See gcAllowVeryLargeObjects.
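
The switch goes in the application's config file; a sketch of the relevant fragment:

 <configuration>
   <runtime>
     <gcAllowVeryLargeObjects enabled="true" />
   </runtime>
 </configuration>

Even with this enabled, the index limit in any single dimension stays at roughly 2^31 elements, which is why a byte[] still cannot reach 4-5 GB.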


FileStream is your starting point.

If you do not have enough memory (you need roughly at least 4 times the maximum size of your number), you will have to use the hard drive. Instead of keeping all the data in memory, load part of the data, perform some calculations on it, and write the results back to the hard drive.
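
A minimal sketch of that load-compute-write loop, assuming the two operands live in hypothetical little-endian files a.bin and b.bin; only one block per operand is in memory at a time:

 using System;
 using System.IO;

 class DiskAdder
 {
     static void Main()
     {
         const int BlockSize = 1 << 20; // 1 MiB window per operand
         using (FileStream a = File.OpenRead("a.bin"))
         using (FileStream b = File.OpenRead("b.bin"))
         using (FileStream sum = File.Create("sum.bin"))
         {
             byte[] bufA = new byte[BlockSize];
             byte[] bufB = new byte[BlockSize];
             int carry = 0;
             long remaining = Math.Max(a.Length, b.Length);
             while (remaining > 0)
             {
                 // A robust version would loop until each block is full;
                 // FileStream.Read may return fewer bytes than asked for.
                 int readA = a.Read(bufA, 0, BlockSize);
                 int readB = b.Read(bufB, 0, BlockSize);
                 int n = Math.Max(readA, readB);
                 for (int i = 0; i < n; i++)
                 {
                     int t = (i < readA ? bufA[i] : 0)
                           + (i < readB ? bufB[i] : 0) + carry;
                     bufA[i] = (byte)(t & 0xFF); // reuse bufA as the output block
                     carry = t >> 8;
                 }
                 sum.Write(bufA, 0, n);
                 remaining -= n;
             }
             if (carry != 0)
                 sum.WriteByte((byte)carry); // final carry extends the result by one byte
         }
     }
 }

The same block-at-a-time shape carries over to the other operations: the carry (or borrow) is the only state that crosses block boundaries, so memory use stays constant regardless of how large the numbers on disk grow.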


Source: https://habr.com/ru/post/1433523/

