As others have said, the easiest way is to simply read the entire file into memory and sort it with something like qsort.
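A minimal sketch of that in-memory approach, assuming the lines have already been read into an array of C strings. The names `cmp_lines` and `sort_unique` are made up for illustration; only `qsort` and `strcmp` come from the standard library.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort: the array holds char* elements, so each argument
   is really a pointer to a char*. */
static int cmp_lines(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sorts lines[0..n) in place, then compacts the array so every distinct
   string appears exactly once at the front. Returns the number of unique
   strings. Dropped duplicates are not freed here; the caller still owns
   whatever memory the strings live in. */
static size_t sort_unique(char **lines, size_t n)
{
    if (n == 0)
        return 0;

    qsort(lines, n, sizeof lines[0], cmp_lines);

    size_t out = 1;                       /* lines[0] is always kept */
    for (size_t i = 1; i < n; i++) {
        /* After sorting, duplicates are adjacent, so comparing with the
           last kept string is enough. */
        if (strcmp(lines[i], lines[out - 1]) != 0)
            lines[out++] = lines[i];
    }
    return out;
}
```

After the call, the first `sort_unique(lines, n)` entries of `lines` are the unique strings in sorted order.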
If you cannot load that much data into memory at once, another option is to process the file in several passes. On the first pass, read through the file and keep only the lines starting with A. Sort them and find the unique strings. On the next pass, load all lines starting with B, sort them, and find the unique lines. Repeat this process for each alphanumeric character a line can begin with. With this technique you only need to hold part of the file in memory at a time, and no duplicates are missed: identical lines always start with the same character, so they always end up in the same pass.
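The multi-pass idea could look roughly like the sketch below. It is only an illustration under a few assumptions: lines are shorter than `MAX_LINE` bytes (longer ones would be split by `fgets`), the input stream is seekable, and the helper names `unique_pass` and `unique_lines` are invented here. Instead of looping over alphanumeric characters only, it runs one pass per possible first byte, so lines starting with punctuation and empty lines are covered too.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 4096   /* assumed upper bound on line length */

static int cmp_strings(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* One pass: load only the lines whose first byte equals `first`, sort them,
   and write each distinct line once to `out`. first == 0 selects empty
   lines. Returns the number of lines written, or -1 if allocation fails. */
static long unique_pass(FILE *in, FILE *out, int first)
{
    char buf[MAX_LINE];
    char **lines = NULL;
    size_t n = 0, cap = 0;
    long written = -1;                    /* stays -1 on allocation failure */

    rewind(in);
    while (fgets(buf, sizeof buf, in)) {
        buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
        if ((unsigned char)buf[0] != first)
            continue;                     /* belongs to another pass */
        if (n == cap) {
            size_t ncap = cap ? cap * 2 : 64;
            char **tmp = realloc(lines, ncap * sizeof *tmp);
            if (!tmp)
                goto done;
            lines = tmp;
            cap = ncap;
        }
        lines[n] = malloc(strlen(buf) + 1);
        if (!lines[n])
            goto done;
        strcpy(lines[n++], buf);
    }

    if (n > 0)
        qsort(lines, n, sizeof *lines, cmp_strings);

    written = 0;
    for (size_t i = 0; i < n; i++) {
        /* Sorted input: a line is new if it differs from its predecessor. */
        if (i == 0 || strcmp(lines[i], lines[i - 1]) != 0) {
            fprintf(out, "%s\n", lines[i]);
            written++;
        }
    }

done:
    for (size_t i = 0; i < n; i++)
        free(lines[i]);
    free(lines);
    return written;
}

/* Runs one pass per possible first byte and returns the total number of
   unique lines written, or -1 on error. */
static long unique_lines(FILE *in, FILE *out)
{
    long total = 0;
    for (int c = 0; c < 256; c++) {
        long k = unique_pass(in, out, c);
        if (k < 0)
            return -1;
        total += k;
    }
    return total;
}
```

The trade-off is that the file is read once per pass, so this swaps memory for I/O. If the lines are spread unevenly over first characters, a single pass can still be large; in that case you would need to split on a longer prefix.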