I have a large file about 400 GB in size. It is produced daily by an external closed system. This is a binary file with the following format:
byte[8]byte[4]byte[n]
where n is the int32 value stored in byte[4].
There are no delimiters in this file; to read the whole file you just repeat until EOF, with each "element" represented as byte[8]byte[4]byte[n].
The file looks like:
byte[8]byte[4]byte[n]byte[8]byte[4]byte[n]...EOF
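A minimal sketch of reading this layout sequentially, for illustration only. It assumes both header fields are little-endian and that byte[8] is a signed 64-bit value; the name RecordReader is hypothetical and not part of any existing code.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RecordReader
{
    private const int HeaderSize = 12; // byte[8] + byte[4]

    // Yields one (Ticks, Payload) pair per record until the stream ends.
    // Assumption: header fields are little-endian.
    public static IEnumerable<(long Ticks, byte[] Payload)> ReadRecords(Stream input)
    {
        using var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true);
        while (true)
        {
            byte[] header = reader.ReadBytes(HeaderSize);
            if (header.Length == 0)
                yield break; // clean EOF between records
            if (header.Length < HeaderSize)
                throw new EndOfStreamException("Truncated record header.");

            long ticks = BinaryPrimitives.ReadInt64LittleEndian(header.AsSpan(0, 8));
            int n = BinaryPrimitives.ReadInt32LittleEndian(header.AsSpan(8, 4));
            if (n < 0)
                throw new InvalidDataException("Negative payload length.");

            byte[] payload = reader.ReadBytes(n);
            if (payload.Length < n)
                throw new EndOfStreamException("Truncated record payload.");

            yield return (ticks, payload);
        }
    }
}
```

For a file of this size the stream passed in would normally be a FileStream opened with a large buffer, so that the many small header reads do not each hit the disk.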
byte[8] is a 64-bit number representing a period of time expressed in .NET Ticks. I need to sort this file, but I can't seem to figure out the fastest way to do it.
Currently, I load the Ticks and the start and end positions of each byte[n] into a struct, reading through to the end of the file. After that, I sort the list in memory by the Ticks property, then open a BinaryReader, seek to each position in Ticks order, read the byte[n] value, and write it to an external file.
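The approach described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual code in use: IndexEntry and SeekSorter are hypothetical names, the header fields are assumed to be little-endian, the index stores the offset and length of the whole record rather than separate start and end positions, and the 12-byte header is copied along with the payload so the output keeps the same layout.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;

public readonly struct IndexEntry
{
    public IndexEntry(long ticks, long offset, int length)
    {
        Ticks = ticks; Offset = offset; Length = length;
    }
    public long Ticks { get; }
    public long Offset { get; } // start of the record, header included
    public int Length { get; }  // 12-byte header + payload
}

public static class SeekSorter
{
    private const int HeaderSize = 12;

    // Pass 1: read only the headers and skip over each payload.
    public static List<IndexEntry> BuildIndex(Stream input)
    {
        var index = new List<IndexEntry>();
        var header = new byte[HeaderSize];
        input.Position = 0;
        while (input.Position < input.Length)
        {
            long offset = input.Position;
            ReadExactly(input, header, HeaderSize);
            long ticks = BinaryPrimitives.ReadInt64LittleEndian(header.AsSpan(0, 8));
            int n = BinaryPrimitives.ReadInt32LittleEndian(header.AsSpan(8, 4));
            if (n < 0) throw new InvalidDataException("Negative payload length.");
            index.Add(new IndexEntry(ticks, offset, HeaderSize + n));
            input.Seek(n, SeekOrigin.Current);
        }
        return index;
    }

    // Pass 2: sort by Ticks, then seek to each record and copy it out.
    // List.Sort is not stable, so records with equal Ticks may be reordered.
    public static void WriteSorted(Stream input, Stream output, List<IndexEntry> index)
    {
        index.Sort((a, b) => a.Ticks.CompareTo(b.Ticks));
        var buffer = new byte[HeaderSize];
        foreach (IndexEntry entry in index)
        {
            if (buffer.Length < entry.Length) buffer = new byte[entry.Length];
            input.Position = entry.Offset; // one random seek per record
            ReadExactly(input, buffer, entry.Length);
            output.Write(buffer, 0, entry.Length);
        }
    }

    private static void ReadExactly(Stream s, byte[] buffer, int count)
    {
        int read = 0;
        while (read < count)
        {
            int k = s.Read(buffer, read, count - read);
            if (k == 0) throw new EndOfStreamException("Truncated record.");
            read += k;
        }
    }
}
```

The second pass performs one random seek per record, so on spinning disks its cost is dominated by seek time rather than by the amount of data copied.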
At the end of the process I end up with a sorted binary file, but it takes FOREVER. I am using C# .NET and a pretty beefy server, but disk I/O seems to be the bottleneck.
Server specs:
- 2x 2.6 GHz Intel Xeon (Hex-Core with HT) (24 threads)
- RAM 32 GB
- 500 GB RAID 1 + 0
- 2TB RAID 5
I have looked all over the Internet and can only find examples where a "huge" file is 1 GB (which makes me smile).
Does anyone have any advice?