Reading scattered data from multiple files in Java

I am working on a reader / writer for DNG / TIFF files. Because there are several options for working with files in general (FileInputStream, FileChannel, RandomAccessFile), I wonder which strategy would best fit my needs.

A DNG / TIFF file is composed of:

  • some (5-20) small blocks (from several tens to a hundred bytes)
  • very few (1-3) large contiguous blocks of image data (up to 100 MiB)
  • several (possibly 20-50) very small blocks (4-16 bytes)

The total file size varies from 15 MiB (compressed 14-bit raw data) to about 100 MiB (uncompressed floating point data). The number of files to process is 50-400.

There are two usage patterns:

  • Read all metadata from all files (all except image data)
  • Read all image data from all files

I am currently using FileChannel and calling map() to get a MappedByteBuffer covering the entire file. This seems pretty wasteful if I am only interested in reading the metadata. Another problem is freeing the mapped memory: when I pass slices of the mapped buffer around for parsing etc., the underlying MappedByteBuffer will not be garbage collected.

I have now decided to copy the smaller chunks out of the FileChannel using its various read() methods and to map only the large raw-data regions. The downside is that reading a single value seems extremely cumbersome, because there is nothing like readShort():

short readShort(long offset) throws IOException, InterruptedException {
    return read(offset, Short.BYTES).getShort();
}

ByteBuffer read(long offset, long byteCount) throws IOException, InterruptedException {
    ByteBuffer buffer = ByteBuffer.allocate(Math.toIntExact(byteCount));
    buffer.order(GenericTiffFileReader.this.byteOrder);
    GenericTiffFileReader.this.readInto(buffer, offset);
    return buffer;
}

private void readInto(ByteBuffer buffer, long startOffset)
        throws IOException, InterruptedException {

    long offset = startOffset;
    while (buffer.hasRemaining()) {
        int bytesRead = this.channel.read(buffer, offset);
        switch (bytesRead) {
        case 0:
            Thread.sleep(10);
            break;
        case -1:
            throw new EOFException("unexpected end of file");
        default:
            offset += bytesRead;
        }
    }
    buffer.flip();
}
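One way to make single-value reads through a FileChannel less cumbersome might be to reuse one small scratch buffer instead of allocating a new ByteBuffer per value. The following is only a sketch under that assumption; the class name ScratchReader and its methods are hypothetical and not part of the reader shown above, and the helper is not thread-safe because the scratch buffer is shared.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

/** Hypothetical helper: reads single values through one reusable scratch buffer. */
public final class ScratchReader {
    private final FileChannel channel;
    // Large enough for any primitive value; reused for every read.
    private final ByteBuffer scratch = ByteBuffer.allocate(Long.BYTES);

    public ScratchReader(FileChannel channel, ByteOrder order) {
        this.channel = channel;
        this.scratch.order(order);
    }

    public short readShort(long offset) throws IOException {
        return fill(offset, Short.BYTES).getShort();
    }

    public int readInt(long offset) throws IOException {
        return fill(offset, Integer.BYTES).getInt();
    }

    /** Fills the scratch buffer with byteCount bytes starting at the given file offset. */
    private ByteBuffer fill(long offset, int byteCount) throws IOException {
        scratch.clear();
        scratch.limit(byteCount);
        long position = offset;
        while (scratch.hasRemaining()) {
            // Positional read: does not change the channel's own position.
            int n = channel.read(scratch, position);
            if (n < 0) {
                throw new EOFException("unexpected end of file");
            }
            position += n;
        }
        scratch.flip();
        return scratch;
    }
}
```

The byte order is set once on the scratch buffer, so callers would pass the order found in the TIFF header when constructing the helper.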

RandomAccessFile does provide methods such as readShort() and readFully(), but it always reads multi-byte values in big-endian order.
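Because the readShort() and readInt() methods of RandomAccessFile follow the DataInput contract (high byte first), a little-endian TIFF would need the bytes swapped after each read. A minimal sketch of that workaround, assuming a hypothetical utility class named EndianReads:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;

/** Hypothetical utility: byte-order-aware reads on top of RandomAccessFile. */
public final class EndianReads {
    private EndianReads() {}

    /** Reads a 16-bit value at the given offset in the requested byte order. */
    public static short readShort(RandomAccessFile raf, long offset, ByteOrder order)
            throws IOException {
        raf.seek(offset);
        short bigEndian = raf.readShort(); // always high byte first
        return order == ByteOrder.BIG_ENDIAN ? bigEndian : Short.reverseBytes(bigEndian);
    }

    /** Reads a 32-bit value at the given offset in the requested byte order. */
    public static int readInt(RandomAccessFile raf, long offset, ByteOrder order)
            throws IOException {
        raf.seek(offset);
        int bigEndian = raf.readInt(); // always high byte first
        return order == ByteOrder.BIG_ENDIAN ? bigEndian : Integer.reverseBytes(bigEndian);
    }
}
```

Each call costs a seek plus several single-byte reads inside RandomAccessFile, so this is convenient rather than fast.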

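So, which approach is preferable for reading the small scattered chunks? Or should I simply map the entire file, even if it is 100 MiB?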

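I ran some rough benchmarks under the following conditions:

  • All read caches were dropped first: echo 3 > /proc/sys/vm/drop_caches
  • 8 runs, each reading 8 bytes 1000 times per file (about 20 files, from 20 MiB to 1 GiB).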

Method 1, FileChannel and ByteBuffers:

private static long method1(Path file, long dummyUsage) throws IOException, Error {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {

        for (int i = 0; i < 1000; i++) {
            ByteBuffer dst = ByteBuffer.allocate(8);

            if (channel.position(i * 10000).read(dst) != dst.capacity())
                throw new Error("partial read");
            dst.flip();
            dummyUsage += dst.order(ByteOrder.LITTLE_ENDIAN).getInt();
            dummyUsage += dst.order(ByteOrder.BIG_ENDIAN).getInt();
        }
    }
    return dummyUsage;
}

Results per run:

1. 3422 ms
2. 56 ms
3. 24 ms
4. 24 ms
5. 27 ms
6. 25 ms
7. 23 ms
8. 23 ms

Method 2, MappedByteBuffer covering the entire file:

private static long method2(Path file, long dummyUsage) throws IOException {

    final MappedByteBuffer buffer;
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
        buffer = channel.map(MapMode.READ_ONLY, 0L, Files.size(file));
    }
    for (int i = 0; i < 1000; i++) {
        dummyUsage += buffer.order(ByteOrder.LITTLE_ENDIAN).getInt(i * 10000);
        dummyUsage += buffer.order(ByteOrder.BIG_ENDIAN).getInt(i * 10000 + 4);
    }
    return dummyUsage;
}

Results per run:

1. 749 ms
2. 21 ms
3. 17 ms
4. 16 ms
5. 18 ms
6. 13 ms
7. 15 ms
8. 17 ms

Method 3, RandomAccessFile:

private static long method3(Path file, long dummyUsage) throws IOException {

    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
        for (int i = 0; i < 1000; i++) {

            raf.seek(i * 10000);
            dummyUsage += Integer.reverseBytes(raf.readInt());
            raf.seek(i * 10000 + 4);
            dummyUsage += raf.readInt();
        }
    }
    return dummyUsage;
}

Results per run:

1. 3479 ms
2. 104 ms
3. 81 ms
4. 84 ms
5. 78 ms
6. 81 ms
7. 81 ms
8. 81 ms

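Conclusion: the MappedByteBuffer method comes at a higher memory cost (340 vs. 140), but it is by far the fastest on the first run and remains the fastest on all subsequent runs. It also offers a convenient interface for byte order and absolute-offset reads. RandomAccessFile performs worst.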

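To answer my own question: a MappedByteBuffer covering the entire file appears to be the best fit for this kind of scattered access.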
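For illustration, a whole-file mapping with absolute reads could look roughly like the sketch below. The class MappedValues and its method names are hypothetical. The sketch relies on two documented properties of the NIO API: a mapping remains valid after its channel is closed, and a single mapping cannot exceed Integer.MAX_VALUE bytes, so files larger than about 2 GiB would need several mappings.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical utility: maps a whole file read-only and reads values by offset. */
public final class MappedValues {
    private MappedValues() {}

    /** Maps the entire file read-only and sets the byte order used by later reads. */
    public static MappedByteBuffer mapReadOnly(Path file, ByteOrder order) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0L, channel.size());
            buffer.order(order);
            return buffer; // stays valid after the channel is closed
        }
    }

    /** Reads a 16-bit value at an absolute offset and widens it to an unsigned int. */
    public static int unsignedShort(MappedByteBuffer buffer, int offset) {
        return buffer.getShort(offset) & 0xFFFF;
    }
}
```

Absolute getters such as getShort(int) and getInt(int) leave the buffer position untouched, which suits jumping between scattered metadata blocks.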


Source: https://habr.com/ru/post/1670338/