Uniqueness of lines in a large file

Question


In C, I want to process a file containing 10⁸ strings, each 16 alphanumeric characters long, and determine whether each of them is unique in the file. How can I do this?


c file-io

venkat Aug 13 '10 at 21:19


5 answers

bta · Answer 1 · 2010-08-13T21:39:54+0000

As other people have said, the easiest way is to simply load the entire file into memory and use something like qsort to sort it. After sorting, identical lines are adjacent, so one linear scan finds every duplicate.
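A minimal sketch of the sort-then-scan approach, assuming every line is exactly 16 characters and has already been read (for example with fgets) into an array of fixed-size records. The names Record, KEY_LEN and count_singletons are illustrative, not from the original answer. Each Record is 17 bytes, so 10⁸ of them need about 1.7 × 10⁹ bytes (roughly 1.7 GB), which is why the next paragraph's multi-pass variant can matter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 16  /* assumption: every line is exactly 16 characters */

typedef struct {
    char key[KEY_LEN + 1];  /* NUL-terminated copy of one line */
} Record;

/* qsort comparator: byte-wise string order */
static int cmp_record(const void *a, const void *b) {
    return strcmp(((const Record *)a)->key, ((const Record *)b)->key);
}

/* Sorts recs in place, then scans runs of equal keys.
   Returns how many keys occur exactly once. */
size_t count_singletons(Record *recs, size_t n) {
    size_t singles = 0;

    assert(recs != NULL);
    qsort(recs, n, sizeof *recs, cmp_record);
    for (size_t i = 0; i < n;) {
        size_t j = i + 1;
        while (j < n && strcmp(recs[j].key, recs[i].key) == 0)
            j++;                    /* extend the run of identical keys */
        if (j - i == 1)
            singles++;              /* run of length 1: key is unique */
        i = j;
    }
    return singles;
}
```

To report which original lines are unique rather than just how many, one option is to store each line's position in the Record alongside the key before sorting.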

If you cannot load that much data into memory at once, another option is to process it in several passes. On the first pass, read the file and keep only the lines starting with A. Sort them and find the unique lines. On the next pass, keep only the lines starting with B, sort them, and find the unique lines. Repeat this for each alphanumeric character a line can begin with. With this technique you only need to hold part of the file in memory at a time, and no line can be classified incorrectly, because identical lines always share the same first character and therefore end up in the same pass.
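A sketch of the multi-pass idea, assuming each line is exactly 16 characters and starts with one of 0-9, A-Z or a-z (62 passes). The function names are hypothetical. Each pass rewinds and re-reads the whole file, trading I/O for memory: if 10⁸ keys were spread evenly over 62 leading characters, one pass would hold about 1.6 × 10⁶ keys, roughly 26 MB of key data (the doubling buffer may reserve somewhat more).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 16  /* assumption: each line holds exactly 16 characters */

/* qsort comparator over raw fixed-width keys (no NUL terminator) */
static int cmp_key(const void *a, const void *b) {
    return memcmp(a, b, KEY_LEN);
}

/* One pass: load only the keys that start with `first`, sort them, and
   count keys that occur exactly once. Returns (size_t)-1 if out of memory. */
static size_t singles_for_prefix(FILE *fp, char first) {
    char line[KEY_LEN + 8];           /* key + '\n' + NUL, with slack */
    char *buf = NULL;
    size_t n = 0, cap = 0, singles = 0;

    rewind(fp);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (line[0] != first || strlen(line) < KEY_LEN)
            continue;                 /* other pass's key, or too short */
        if (n == cap) {               /* grow the buffer geometrically */
            size_t new_cap = cap ? cap * 2 : 1024;
            char *tmp = realloc(buf, new_cap * KEY_LEN);
            if (tmp == NULL) {
                free(buf);
                return (size_t)-1;
            }
            buf = tmp;
            cap = new_cap;
        }
        memcpy(buf + n * KEY_LEN, line, KEY_LEN);
        n++;
    }
    if (n > 0)
        qsort(buf, n, KEY_LEN, cmp_key);
    /* after sorting, equal keys form contiguous runs */
    for (size_t i = 0; i < n;) {
        size_t j = i + 1;
        while (j < n && memcmp(buf + j * KEY_LEN, buf + i * KEY_LEN, KEY_LEN) == 0)
            j++;
        if (j - i == 1)
            singles++;
        i = j;
    }
    free(buf);
    return singles;
}

/* One pass per possible leading character; sums the per-pass results.
   Returns the number of lines whose content appears exactly once. */
size_t count_unique_multipass(FILE *fp) {
    /* assumption: keys use only the characters 0-9, A-Z, a-z */
    static const char alphabet[] =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    size_t total = 0;

    assert(fp != NULL);
    for (const char *c = alphabet; *c != '\0'; c++) {
        size_t s = singles_for_prefix(fp, *c);
        if (s == (size_t)-1)
            return (size_t)-1;
        total += s;
    }
    return total;
}
```

Splitting on the first two characters instead would cut memory per pass further at the cost of many more passes; grouping several leading characters per pass is the opposite trade-off.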

Jerry Coffin · Answer 2 · 2010-08-13T21:29:55+0000

, ~ 16 , - ( - ) .

, C, (- ), .

Mike · Answer 3 · 2010-08-13T21:48:32+0000

( ) , . , , .

Codo · Answer 4 · 2010-08-13T21:33:23+0000

.

, qsort C , , , , .

user411313 · Answer 5 · 2010-08-13T21:37:49+0000

Use a library that provides hashing/map functions; for example, see link text.
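The library Answer 5 linked to was not preserved, so the following is not that library. It is a minimal hand-rolled stand-in showing the hash-table approach: an open-addressing table with linear probing and the 64-bit FNV-1a hash, assuming fixed 16-character keys. KeyTable, table_init, table_add, table_count and table_free are hypothetical names. (POSIX also offers a basic hash table through hcreate/hsearch in <search.h>.)

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 16  /* assumption: fixed 16-character keys */

typedef struct {
    char key[KEY_LEN];
    uint32_t count;   /* 0 marks an empty slot */
} Slot;

typedef struct {
    Slot *slots;
    size_t mask;      /* capacity - 1; capacity is a power of two */
} KeyTable;

/* 64-bit FNV-1a over the fixed-width key */
static uint64_t fnv1a(const char *k) {
    uint64_t h = 14695981039346656037ULL;
    for (int i = 0; i < KEY_LEN; i++) {
        h ^= (unsigned char)k[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Capacity is at least twice `expected`, keeping the load factor <= 0.5.
   The caller must not insert more than `expected` distinct keys. */
int table_init(KeyTable *t, size_t expected) {
    size_t cap = 16;
    while (cap < expected * 2)
        cap *= 2;
    t->slots = calloc(cap, sizeof *t->slots);
    t->mask = cap - 1;
    return t->slots != NULL;
}

/* Inserts key or bumps its count; returns the updated count. */
uint32_t table_add(KeyTable *t, const char *key) {
    assert(t->slots != NULL);
    size_t i = (size_t)(fnv1a(key) & t->mask);
    while (t->slots[i].count != 0 &&
           memcmp(t->slots[i].key, key, KEY_LEN) != 0)
        i = (i + 1) & t->mask;        /* linear probing */
    if (t->slots[i].count == 0)
        memcpy(t->slots[i].key, key, KEY_LEN);
    return ++t->slots[i].count;
}

/* Returns how many times key was added (0 if never). */
uint32_t table_count(const KeyTable *t, const char *key) {
    size_t i = (size_t)(fnv1a(key) & t->mask);
    while (t->slots[i].count != 0) {
        if (memcmp(t->slots[i].key, key, KEY_LEN) == 0)
            return t->slots[i].count;
        i = (i + 1) & t->mask;
    }
    return 0;
}

void table_free(KeyTable *t) {
    free(t->slots);
    t->slots = NULL;
}
```

Used on the file, this takes two passes: add every line, then read the file again and report a line as unique when table_count returns 1. Under this sizing rule, 10⁸ keys give a capacity of 2²⁸ slots of 20 bytes each, about 5.4 GB, so on a real machine it would likely need to be combined with the first-character partitioning from Answer 1.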
