Fastest method to run a binary search in a file in C?

For example, let's say I want to find a specific word or number in a file. The content is in sort order (obvious). Since I want to run a binary search in a file, it seems like a waste of time to copy the entire file to an array and then run a binary search ... I effectively made it a linear time algorithm, I have to spend O (n) time on copying the darn file before I can start my search.

Is there a faster way to do this? Maybe something like lseek that works with strings instead of bytes?

If this is not the case, would I rather just do a linear search (assuming I perform a search once throughout my program)?

+3
source share
7 answers

You cannot search by string. This is pretty obvious when you think about it.

But you can sort the binary search in a text file.

What are you doing:

  • Put the file to get the length or search to the end and get the position.
  • Memory map file.
    (This is best, I think, but you can use lseek and read if you need to.)
  • Look for the middle of the file, minus the average line length. Guess.
  • Scan ahead for a new line if you are not at position 0.
  • Read your line and compare it.
  • Repeat for 1/4 or 3/4th, 1/8th, 1/16th, etc.
+6
source

, , " -", .. , , O - . , .

, -:

  • (), .
  • , ,
  • ( ) , ( btree).
+4

, , , ( ) . , - , , , .

, #n. , , 100 ( 100 ) - 200 . , .

+2

, , , / , , , , O (log n).

+1

"lseek", "". , .

, , , , . , , .

, , (, ), .

+1

, , , .

, . , . , .

, , mmap . , strchr . , , . , , , .

, , - Boyer-Moore. . - .

, , , , .

: - readline() fgets(), malloc() . malloc() , , .

0

, , , , . ersatz . , , , - -.

As you noticed, if you are going to read it, you can also linearly search for it when you go. So do it, use the modified Boyer-Moore search when you read it and you will do everything well.

0
source

Source: https://habr.com/ru/post/1722760/


All Articles