Finding a string in a pre-processed large text file

I have a data file that contains 100,000 rows. Each row contains only two fields, a key and a value, separated by a comma, and all keys are unique. I want to look up the value for a given key in this file. Loading it into a map is out of the question, since that would consume too much memory (the code will run on an embedded device), and I do not want to use a database. What I am doing so far is pre-processing the file on my PC, that is, sorting the lines, and then running a binary search on the pre-processed file, as shown below:

public long findKeyOffset(RandomAccessFile raf, String key)
            throws IOException {
        int blockSize = 8192;
        long fileSize = raf.length();
        long min = 0;
        long max = fileSize / blockSize;
        long mid;
        String line;
        while (max - min > 1) {
            mid = min + (max - min) / 2;
            raf.seek(mid * blockSize);
            if (mid > 0)
                line = raf.readLine(); // probably a partial line
            line = raf.readLine();
            if (line == null) { // no complete line starts after this point
                max = mid;
                continue;
            }
            String[] parts = line.split(",");
            if (key.compareTo(parts[0]) > 0) {
                min = mid;
            } else {
                max = mid;
            }
        }
        // find the right line
        min = min * blockSize;
        raf.seek(min);
        if (min > 0)
            line = raf.readLine();
        while (true) {
            min = raf.getFilePointer();
            line = raf.readLine();
            if (line == null)
                break;
            String[] parts = line.split(",");
            if (parts[0].compareTo(key) >= 0) // first key that is not smaller than the one we want
                break;
        }
        raf.seek(min);
        return min;
    }

I think there are better solutions than this. Can someone point me in the right direction?

+4
3

, ( ).

: .

, , .

, -, .

O (1) .


, , , . : 3 , , , 3 . (aka O (3) aka ) .

+3

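Here is an approach that I think is simple and efficient:

  • Let n be the number of lines in the original data file, the one to be pre-processed.
  • Let k < n be a number (how to choose it is discussed below).
  • Split the file into k files with roughly the same number of lines each (about n/k lines per file). Call them F1... Fk. If you prefer to keep the original file intact, treat F1... Fk as ranges of lines within it, cutting it into segments.
  • Create a new file P with k lines, where line i holds the first key of Fi.
  • To look up a key, first binary-search P in O(log k) to find which file/segment (F1... Fk) to go to, then search within that file/segment.
  • If k is large enough, the size of Fi (n/k) is small enough to load it into a HashMap and fetch the value in O(1). If that is still not practical, binary-search Fi in O(log(n/k)).

The total search cost is O(log k) + O(log(n/k)), which is still O(log n) like your original solution, but each search touches a much smaller file.

I would pick a k that is large enough to let you load a single file/segment Fi into a HashMap, but not so large that P takes up too much space on the device. The most balanced choice is k = sqrt(n), which makes the search run in O(log(sqrt(n))), though that may give a fairly large P. If you can find a k that lets you load both P and Fi into a HashMap for O(1) lookups, that would be the best solution.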
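The scheme in this answer, a small index P over the segments F1... Fk, might look roughly like the Java sketch below. It is only an illustration under several assumptions: the class name `SparseIndexLookup` and its methods are hypothetical, the keys are plain ASCII, the file is sorted in the order `String.compareTo` uses, and the segments are byte ranges inside the one sorted file rather than separate files. Instead of a HashMap or a second binary search, it simply scans the chosen segment line by line, which is cheap when segments are short.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
public class SparseIndexLookup {
    // P: the first key of each segment, kept in memory.
    private final List<String> firstKeys = new ArrayList<>();
    // Byte offset in the file where each segment starts.
    private final List<Long> offsets = new ArrayList<>();

    // Builds P with one sequential pass over the sorted file.
    // (The index could just as well be built on the PC and shipped with the data.)
    public SparseIndexLookup(RandomAccessFile raf, int linesPerSegment) throws IOException {
        raf.seek(0);
        long pos = 0;
        long lineNo = 0;
        String line;
        while ((line = raf.readLine()) != null) {
            if (lineNo % linesPerSegment == 0) {
                int comma = line.indexOf(',');
                firstKeys.add(comma < 0 ? line : line.substring(0, comma));
                offsets.add(pos);
            }
            pos = raf.getFilePointer();
            lineNo++;
        }
    }

    // Returns the value stored for key, or null if the key is not in the file.
    public String find(RandomAccessFile raf, String key) throws IOException {
        // Binary search in P for the last segment whose first key is <= key.
        int lo = 0, hi = firstKeys.size() - 1, seg = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys.get(mid).compareTo(key) <= 0) {
                seg = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (seg < 0) {
            return null; // key sorts before the first line of the file
        }
        // One seek, then a short sequential scan limited to that segment.
        long end = seg + 1 < offsets.size() ? offsets.get(seg + 1) : raf.length();
        raf.seek(offsets.get(seg));
        String line;
        while (raf.getFilePointer() < end && (line = raf.readLine()) != null) {
            int comma = line.indexOf(',');
            String k = comma < 0 ? line : line.substring(0, comma);
            int cmp = k.compareTo(key);
            if (cmp == 0) {
                return comma < 0 ? "" : line.substring(comma + 1);
            }
            if (cmp > 0) {
                break; // already past the place where the key would be
            }
        }
        return null;
    }
}
```

With n = 100,000 lines and, say, 256 lines per segment, P holds about 391 keys, so each lookup costs one in-memory binary search, one seek, and at most 256 line reads.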

+2

How about this?

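#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main(int argc, char *argv[])
{
    if (argc < 3)
    {
        cerr << "usage: " << argv[0] << " <sorted file> <key>" << endl;
        return 1;
    }
    // open in binary mode, positioned at the end, so tellg() gives the file size
    ifstream f(argv[1], ios::binary | ios::ate);
    if (!f.is_open())
        return 1;
    string key(argv[2]), value, s;
    bool found = false;

    // binary search over byte offsets; the lines must be sorted by key
    // in plain byte order (the order string::compare uses)
    streamoff min = 0, max = f.tellg(), mid = 0;
    while (min < max)
    {
        mid = min + (max - min) / 2;
        f.clear(); // reset eof/fail state left by the previous probe
        if (mid > 0)
        {
            // skip the rest of the line that contains byte mid-1,
            // so the next read starts at the first line beginning at or after mid
            f.seekg(mid - 1);
            getline(f, s);
        }
        else
        {
            f.seekg(0);
        }
        if (!getline(f, s)) // no line starts at or after mid
        {
            max = mid;
            continue;
        }
        string::size_type comma = s.find(',');
        int comp = key.compare(s.substr(0, comma));
        if (comp < 0)
        {
            max = mid;
        }
        else if (comp > 0)
        {
            min = mid + 1;
        }
        else
        {
            found = true;
            if (comma != string::npos)
                value = s.substr(comma + 1);
            break;
        }
    }
    cout << "key " << key;
    if (found)
    {
        cout << " found! value = " << value << endl;
    }
    else
    {
        cout << " not found..." << endl;
    }
    return 0;
}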

Source: https://habr.com/ru/post/1686410/

