I am trying to compare two large datasets from an SQL query. Currently, the SQL query is executed externally, and the results of each dataset are saved to their own csv file. My small C# console application loads the two text/csv files, compares them, and saves the differences to a text file.
It is a very simple application: it loads all the data from the first file into an ArrayList and does a .compare() on the ArrayList as each line is read from the second csv file. It then saves the records that do not match.
The application works, but I would like to improve its performance. I believe I could improve it significantly by taking advantage of the fact that both files are sorted, but I don't know of a data type in C# that keeps its order and lets me select a specific position. There's the basic array, but I don't know how many items will be in each list, and I could have over a million records. Is there a data type I should be looking at?
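
For illustration, the sorted-file idea could look roughly like the sketch below. It reads both inputs one line at a time and advances whichever side is "behind", so neither file has to be held in memory. The names `SortedDiff` and `Compare` and the `<` / `>` prefixes are made up for this example, and it assumes both files are sorted by ordinal string comparison, which may not match the collation the SQL query used.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SortedDiff
{
    // Walks two sorted line sources in step and returns the lines that
    // appear in only one of them: "< line" for the first source only,
    // "> line" for the second source only.
    // Assumption: both inputs are sorted by ordinal string comparison.
    public static List<string> Compare(TextReader first, TextReader second)
    {
        var differences = new List<string>();
        string a = first.ReadLine();
        string b = second.ReadLine();

        while (a != null && b != null)
        {
            int order = string.CompareOrdinal(a, b);
            if (order == 0)
            {
                // Same line in both files: advance both sides.
                a = first.ReadLine();
                b = second.ReadLine();
            }
            else if (order < 0)
            {
                // a sorts before b, so a cannot occur later in the second file.
                differences.Add("< " + a);
                a = first.ReadLine();
            }
            else
            {
                // b sorts before a, so b cannot occur later in the first file.
                differences.Add("> " + b);
                b = second.ReadLine();
            }
        }

        // Whatever remains on either side has no counterpart.
        for (; a != null; a = first.ReadLine()) differences.Add("< " + a);
        for (; b != null; b = second.ReadLine()) differences.Add("> " + b);

        return differences;
    }
}
```

With real files, the two readers could be `StreamReader` instances opened on the csv paths. If the data does need to sit in memory, `List<string>` grows as items are added and supports access by index, unlike a fixed-size array.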