TStringList.LoadFromFile - Exceptions with Large Text Files

I am running Delphi RAD Studio XE2.

I have some very large files, each containing a large number of lines. The lines themselves are small: just three tab-separated values. I want to load a file into a TStringList using TStringList.LoadFromFile, but this raises an exception with large files.

For files of 2 million lines (approximately 1 GB), I get an EIntOverflow exception. For larger files (e.g. 20 million lines and approximately 10 GB), I get an ERangeCheck exception.

I have 32 GB of RAM, and I'm just trying to load this file and use it quickly. What is going on here, and what other options do I have? Could I use a file stream with a large buffer to load this file into a TStringList? If so, could you please provide an example?

1 answer

When Delphi switched to Unicode in Delphi 2009, the TStrings.LoadFromStream() method (which TStrings.LoadFromFile() calls internally) became very inefficient for large streams/files.

Internally, LoadFromStream() reads the entire file into memory as a TBytes, then converts that to a UnicodeString using TEncoding.GetString() (which decodes the bytes into a TCharArray, copies that into the final UnicodeString, and then frees the array), then parses the UnicodeString (while the TBytes is still in memory), adding substrings to the list as needed.
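
To make the copies easier to see, here is a minimal sketch that imitates the whole-file pattern described above. It is not the RTL source: LoadWholeFile is a hypothetical helper, and it assumes the file is in the default ANSI encoding.

```pascal
program WholeFileSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper, not the RTL source: it only imitates the
// whole-file loading pattern described in the answer.
procedure LoadWholeFile(const FileName: string; List: TStrings);
var
  Stream: TFileStream;
  Buffer: TBytes;
  Content: string;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Copy 1: the raw file bytes in one contiguous block.
    SetLength(Buffer, Stream.Size);
    if Length(Buffer) > 0 then
      Stream.ReadBuffer(Buffer[0], Length(Buffer));
  finally
    Stream.Free;
  end;

  // Copies 2 and 3: per the description above, GetString decodes into
  // a temporary character array and copies that into the result string.
  // Assumption: the file uses the default ANSI encoding.
  Content := TEncoding.Default.GetString(Buffer);

  // Copy 4: assigning Text parses Content and allocates one substring
  // per line, while Buffer and Content are both still allocated.
  List.Text := Content;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    LoadWholeFile(ParamStr(1), Lines);
    Writeln(Lines.Count, ' lines loaded');
  finally
    Lines.Free;
  end;
end.
```

Every step here needs one large contiguous allocation, which is why this pattern scales badly with file size.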

So, just before LoadFromStream() exits, there are four copies of the file data in memory: three copies taking up, at worst, filesize * 3 bytes of memory (where each copy uses its own contiguous memory block, plus some memory manager overhead), and one copy for the parsed substrings! Granted, the first three copies are freed when LoadFromStream() actually exits. But this explains why you get memory errors before reaching that point: LoadFromStream() tries to use 3-4 GB of memory to load a 1 GB file, and the RTL memory manager cannot handle that.

If you want to load the contents of a large file into a TStringList, you are better off using TStreamReader instead of LoadFromFile(). TStreamReader uses buffered file I/O to read the file in small chunks. Simply call its ReadLine() method in a loop, Add()'ing each line to the TStringList. For example:

    //MyStringList.LoadFromFile(filename);
    Reader := TStreamReader.Create(filename, true);
    try
      MyStringList.BeginUpdate;
      try
        MyStringList.Clear;
        while not Reader.EndOfStream do
          MyStringList.Add(Reader.ReadLine);
      finally
        MyStringList.EndUpdate;
      end;
    finally
      Reader.Free;
    end;
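
Since the question also asks about using a large buffer, here is a hedged sketch of the same loop as a complete console program. It uses the TStreamReader constructor overload that takes an encoding and a buffer size. LoadLines, ReadBufferSize, and the 1 MB value are illustrative assumptions, as is the choice of the default ANSI encoding.

```pascal
program LoadLargeFile;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  // Assumption: a 1 MB read buffer. The value is illustrative only.
  ReadBufferSize = 1024 * 1024;

// Hypothetical helper that fills List one line at a time, so the file
// is never held in memory as a single block.
procedure LoadLines(const FileName: string; List: TStrings);
var
  Reader: TStreamReader;
begin
  // Assumption: default ANSI encoding, with BOM detection enabled so a
  // Unicode byte order mark is still honoured if one is present.
  Reader := TStreamReader.Create(FileName, TEncoding.Default, True,
    ReadBufferSize);
  try
    List.BeginUpdate;
    try
      List.Clear;
      while not Reader.EndOfStream do
        List.Add(Reader.ReadLine);
    finally
      List.EndUpdate;
    end;
  finally
    Reader.Free;
  end;
end;

var
  Lines: TStringList;
begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: LoadLargeFile <filename>');
    Exit;
  end;

  Lines := TStringList.Create;
  try
    LoadLines(ParamStr(1), Lines);
    Writeln(Lines.Count, ' lines loaded');
  finally
    Lines.Free;
  end;
end.
```

Run it as LoadLargeFile followed by the path of the file to load. The strings still have to fit in the process address space, so for files of several gigabytes this presumably also needs a 64-bit build.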

Maybe some day, LoadFromStream() will be rewritten to use TStreamReader internally, like this.


Source: https://habr.com/ru/post/1207503/

