Import and process data from a CSV file in Delphi

I was given a task to complete before an interview. I completed it and the solution works; however, I was marked down and did not get the interview because I used TADODataset. I imported a CSV file into the dataset. The data had to be processed in a certain way, so I filtered and sorted the dataset to get the rows in the order I wanted, and then did the logical processing in a while loop. The feedback I received said this was bad, because it would be very slow for large files.

My main question is this: if loading the data into an in-memory dataset is too slow for large files, what would be the best way to access the information in the CSV file? Should I use string lists or something like that?

+6
2 answers

It really depends on how "large" the file is and on the resources (in this case RAM) available for the task.

"The feedback received reported that this is bad, as for large files it will be very slow."

CSV files are usually used to move data around without worrying too much (if at all) about import/export, because the format is extremely simple. In most cases I have come across, the files range from about 1 MB to about 10 MB, but that does not mean others will not dump far more data into CSV.

Suppose you have an 80 MB CSV file. That is a file you want to process in chunks; otherwise (depending on your processing) you can eat up hundreds of MB of RAM. In that case I would do:

while dataToProcess do
begin
  // step 1: read <X> lines from the file, where <X> is the maximum number of
  //         lines you read in one go; if fewer lines are left (i.e. you are
  //         down to 50 lines and X is 100), read just those
  // step 2: process the information
  // step 3: generate output, database inserts, etc.
end;

In the above case you are not loading 80 MB of data into RAM, only a few hundred KB, plus whatever you use for the processing itself, e.g. linked lists, dynamically built insert queries (batch inserts), etc.
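As a rough illustration of the loop above, here is a minimal sketch, assuming Delphi XE2 or later and a UTF-8 file with one record per line (no quoted fields containing line breaks). The names ProcessChunk and ProcessCsvInChunks and the chunk size of 100 are hypothetical and only there for illustration; ProcessChunk stands in for whatever steps 2 and 3 need to do.

```pascal
program ChunkedCsvSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical placeholder for steps 2 and 3: parse the rows, apply the
// processing logic, generate output or batch the database inserts.
procedure ProcessChunk(Lines: TStrings);
begin
  Writeln(Format('processing %d lines', [Lines.Count]));
end;

// Reads FileName in chunks of at most ChunkSize lines and returns the
// number of chunks that were handed to ProcessChunk.
function ProcessCsvInChunks(const FileName: string; ChunkSize: Integer): Integer;
var
  Reader: TStreamReader;
  Chunk: TStringList;
begin
  Result := 0;
  Reader := nil;
  Chunk := TStringList.Create;
  try
    Reader := TStreamReader.Create(FileName, TEncoding.UTF8);
    while not Reader.EndOfStream do
    begin
      // step 1: collect lines until the chunk is full
      Chunk.Add(Reader.ReadLine);
      if Chunk.Count >= ChunkSize then
      begin
        ProcessChunk(Chunk);
        Inc(Result);
        Chunk.Clear; // only one chunk is ever held in memory
      end;
    end;
    // the last chunk is usually shorter than ChunkSize
    if Chunk.Count > 0 then
    begin
      ProcessChunk(Chunk);
      Inc(Result);
    end;
  finally
    Reader.Free;
    Chunk.Free;
  end;
end;

begin
  if ParamCount >= 1 then
    Writeln(ProcessCsvInChunks(ParamStr(1), 100), ' chunks processed');
end.
```

Memory use here is bounded by the chunk size rather than the file size. Note that this only helps when each chunk can be processed independently; if the rows must first be put in a global order, the chunks have to be sorted and merged as well.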

"... however, I was noted and did not receive an interview due to the use of TADODataset.

I'm not surprised. They probably wanted to see whether you could design an algorithm and come up with a simple solution on the spot, rather than rely on ready-made components.

They probably expected you to use dynamic arrays and write one (or more) sorting algorithms yourself.

"Should I use string lists or something like that?"

The answer could be the same. Again, I think they wanted to see how you "work".

+3

The interviewer was right.

The correct, scalable and fast solution for any file of medium size and up is to use an "external sort".

"External sorting" is a two-step process. The first step is to split the file into smaller, manageable files, each of which is sorted. The second step is to merge these files into one sorted file, which can then be processed line by line.

It is extremely efficient on any CSV file with more than 200,000 lines. The amount of memory the process runs in can be controlled, so the risk of running out of memory can be eliminated.

I have implemented many such sorting processes, and in Delphi I would recommend a combination of the TStringList, TList, and TQueue classes.
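The two steps can be sketched roughly as follows. This is not the answerer's implementation, only a minimal sketch under stated assumptions: Delphi XE2 or later, a UTF-8 file with one record per line and no header row, and ordering by plain ordinal comparison of the whole line (a real task would compare on a parsed key column). The names SplitIntoSortedRuns, MergeRuns, CompareLines and the MaxLinesPerRun limit are hypothetical. The merge uses a simple linear scan over the current line of each run instead of a TQueue or a heap, which is adequate while the number of runs is small.

```pascal
program ExternalSortSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

const
  MaxLinesPerRun = 50000; // assumed memory budget: lines held in RAM at once

// Ordinal comparison, used both for sorting the runs and for merging them.
function CompareLines(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := CompareStr(List[Index1], List[Index2]);
end;

// Step 1: split the input into sorted temporary files ("runs").
procedure SplitIntoSortedRuns(const InputFile: string; RunFiles: TStrings);
var
  Reader: TStreamReader;
  Run: TStringList;

  procedure FlushRun;
  var
    RunName: string;
  begin
    if Run.Count = 0 then Exit;
    Run.CustomSort(CompareLines);
    RunName := Format('%s.run%d', [InputFile, RunFiles.Count]);
    Run.SaveToFile(RunName, TEncoding.UTF8);
    RunFiles.Add(RunName);
    Run.Clear;
  end;

begin
  Reader := nil;
  Run := TStringList.Create;
  try
    Reader := TStreamReader.Create(InputFile, TEncoding.UTF8);
    while not Reader.EndOfStream do
    begin
      Run.Add(Reader.ReadLine);
      if Run.Count >= MaxLinesPerRun then FlushRun;
    end;
    FlushRun; // last, partially filled run
  finally
    Reader.Free;
    Run.Free;
  end;
end;

// Step 2: merge the sorted runs into a single sorted output file.
procedure MergeRuns(RunFiles: TStrings; const OutputFile: string);
var
  Readers: TObjectList<TStreamReader>;
  Heads: TList<string>; // current (smallest unread) line of each run
  Writer: TStreamWriter;
  I, Best: Integer;
begin
  Readers := nil; Heads := nil; Writer := nil;
  try
    Readers := TObjectList<TStreamReader>.Create(True);
    Heads := TList<string>.Create;
    Writer := TStreamWriter.Create(OutputFile, False, TEncoding.UTF8);
    for I := 0 to RunFiles.Count - 1 do
    begin
      Readers.Add(TStreamReader.Create(RunFiles[I], TEncoding.UTF8));
      Heads.Add(Readers.Last.ReadLine); // a run is never empty
    end;
    while Readers.Count > 0 do
    begin
      Best := 0;
      for I := 1 to Heads.Count - 1 do
        if CompareStr(Heads[I], Heads[Best]) < 0 then Best := I;
      Writer.WriteLine(Heads[Best]);
      if Readers[Best].EndOfStream then
      begin
        Readers.Delete(Best); // owned list: frees the reader, closes the file
        Heads.Delete(Best);
      end
      else
        Heads[Best] := Readers[Best].ReadLine;
    end;
  finally
    Writer.Free;
    Heads.Free;
    Readers.Free;
  end;
end;

var
  Runs: TStringList;
  I: Integer;
begin
  if ParamCount < 2 then
  begin
    Writeln('usage: ExternalSortSketch <input.csv> <sorted.csv>');
    Exit;
  end;
  Runs := TStringList.Create;
  try
    SplitIntoSortedRuns(ParamStr(1), Runs);
    MergeRuns(Runs, ParamStr(2));
    for I := 0 to Runs.Count - 1 do
      DeleteFile(Runs[I]); // remove the temporary run files
  finally
    Runs.Free;
  end;
end.
```

Peak memory is governed by MaxLinesPerRun during the split and by one buffered line per run during the merge, which is what makes the memory footprint controllable regardless of the input size.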

Good luck

0

Source: https://habr.com/ru/post/909069/

