Java memory based binary search

I am currently trying to find the fastest way to find a 2GB binary in java. This is different from my usual problems, as this file is already displayed on the Linux file system with mmap.

The question is a binary, and I need to find it for a fixed four-byte string; AXL0

Usually, in small files, I simply buffer it, convert it to a string, and then re-express it. However, since this file is already mapped onto the card and quite large, the idea of ​​re-buffering seems wrong, and converting it to a 2 GB line seems even more erroneous ...

After some reading, I came across Java packages NIOalong with FileChannelsand MappedByteBuffers, but I'm not quite sure how to configure them.

I just need to scan the file, from zero to the last byte in the file, and find each instance of the four byte lines.

If anyone could offer any advice or input, I would really appreciate it.

Thank.

+4
source share
1 answer

Looking at the problem in an abstract way, you can do nothing better than linear search.

It follows that it hardly matters which API you use to actually perform the search, for simplicity I would just go with a buffered InputStream, which can be implemented by the agnostic of the actual data source and has no built-in limitation; it works on files larger than 2 GB .

(: ), ( -, , , SSD, , - ).

: KISS ,

public class ScanForByteCombo {

    public static List<Long> scanFor(InputStream is, int needle) throws IOException {
        List<Long> foundOffsets = new ArrayList<>();
        InputStream bs = new BufferedInputStream(is, 0x10000);
        int data = 0;
        int b;
        long offset = 0;
        while ((b = bs.read()) != -1) {
            data = (data << 8) | b;
            if (data == needle) 
                foundOffsets.add(offset);
            ++offset;
        }
        return foundOffsets;
    }

    public static void main(String[] argv) {

        int needle = ('A' << 24) | ('X' << 16) | ('F' << 8) | '0';

        long start = System.currentTimeMillis();
        try (InputStream is = new FileInputStream("your file")) {
            List<Long> found = scanFor(is, needle);
            System.out.println(found);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("scan took " + (System.currentTimeMillis() - start) + "ms. Acceptable?");
    }

}

, , , , , .

+3

Source: https://habr.com/ru/post/1583783/


All Articles