I have a folder with 3,000 CSV files, ranging in size from 1 KB to 100 KB, for a total of 171 MB. Each line in these files is 43 characters long.
I am trying to write a program to parse these files as quickly as possible.
At first I tried my own implementation, but I was not happy with the results. Then I found LumenWorks.Framework.IO.Csv on Stack Overflow. It makes some bold claims:
To give more approximate figures: with a 45 MB CSV file containing 145 fields and 50,000 records, the reader processed about 30 MB/s, so all in all it took 1.5 seconds. Machine specifications were a P4 3.0 GHz with 1024 MB of RAM.
I am nowhere near those numbers: my process takes 10 minutes. Is this because it is not one large stream but many small files, so there is per-file overhead? Is there anything else I could do?
It does not feel like the LumenWorks implementation was any faster than mine (I did not benchmark it), not to mention that it handles quotes, escaping, comments, and multi-line fields, none of which I need. My files are just plain comma-separated integers; the sketch below shows roughly the kind of simple parser I mean.
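To illustrate, here is a minimal sketch of the kind of naive per-file parser I am describing (the folder path, the *.csv pattern, and the summing of values are just placeholders, not my actual code):

    using System;
    using System.IO;

    class CsvFolderParser
    {
        static void Main(string[] args)
        {
            // Hypothetical folder path -- substitute the real directory.
            string folder = args.Length > 0 ? args[0] : @"C:\data\csv";
            long total = 0;

            foreach (string path in Directory.GetFiles(folder, "*.csv"))
            {
                // Read each small file line by line; every line holds
                // comma-separated integers and nothing else.
                foreach (string line in File.ReadLines(path))
                {
                    string[] fields = line.Split(',');
                    foreach (string field in fields)
                    {
                        total += int.Parse(field);
                    }
                }
            }

            Console.WriteLine("Parsed sum: " + total);
        }
    }

The point is that the per-line parsing is trivial; my question is whether opening and reading 3,000 small files is what actually dominates the 10 minutes.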
Regards