Traditional I/O versus memory-mapped files

I am trying to illustrate the performance difference between traditional I/O and memory-mapped files in Java for my students. I found an example somewhere on the Internet, but not everything in it is clear to me; I don't even think all the steps are necessary. I have read a lot about this here and there, but I'm not sure about the correct implementation of either approach.

The code I'm trying to understand is:

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Date;

public class FileCopy {

    public static void main(String args[]) {
        if (args.length < 1) {
            System.out.println(" Wrong usage!");
            System.out.println(" Correct usage is : java FileCopy <large file with full path>");
            System.exit(0);
        }

        String inFileName = args[0];
        File inFile = new File(inFileName);
        if (inFile.exists() != true) {
            System.out.println(inFileName + " does not exist!");
            System.exit(0);
        }

        try {
            new FileCopy().memoryMappedCopy(inFileName, inFileName + ".new");
            new FileCopy().customBufferedCopy(inFileName, inFileName + ".new1");
        } catch (FileNotFoundException fne) {
            fne.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void memoryMappedCopy(String fromFile, String toFile) throws Exception {
        long timeIn = new Date().getTime();
        // map the input file into memory
        RandomAccessFile rafIn = new RandomAccessFile(fromFile, "rw");
        FileChannel fcIn = rafIn.getChannel();
        ByteBuffer byteBuffIn = fcIn.map(FileChannel.MapMode.READ_WRITE, 0, (int) fcIn.size());
        fcIn.read(byteBuffIn); // reads the file into the mapped buffer
        byteBuffIn.flip();
        // map the output file and copy the input buffer into it
        RandomAccessFile rafOut = new RandomAccessFile(toFile, "rw");
        FileChannel fcOut = rafOut.getChannel();
        ByteBuffer writeMap = fcOut.map(FileChannel.MapMode.READ_WRITE, 0, (int) fcIn.size());
        writeMap.put(byteBuffIn);
        long timeOut = new Date().getTime();
        System.out.println("Memory mapped copy Time for a file of size :" + (int) fcIn.size() + " is " + (timeOut - timeIn));
        fcOut.close();
        fcIn.close();
    }

    static final int CHUNK_SIZE = 100000;
    static final char[] inChars = new char[CHUNK_SIZE];

    public static void customBufferedCopy(String fromFile, String toFile) throws IOException {
        long timeIn = new Date().getTime();
        Reader in = new FileReader(fromFile);
        Writer out = new FileWriter(toFile);
        while (true) {
            synchronized (inChars) {
                int amountRead = in.read(inChars);
                if (amountRead == -1) {
                    break;
                }
                out.write(inChars, 0, amountRead);
            }
        }
        long timeOut = new Date().getTime();
        System.out.println("Custom buffered copy Time for a file of size :" + (int) new File(fromFile).length() + " is " + (timeOut - timeIn));
        in.close();
        out.close();
    }
}

When exactly is RandomAccessFile necessary? Here it is used for reading and writing in memoryMappedCopy; is it really needed just to copy a file, or is it part of the memory mapping?

In customBufferedCopy, why is synchronized used?

I also found another example that is supposed to compare the performance of the two:

import java.io.*;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

public class MappedIO {
    private static int numOfInts = 4000000;
    private static int numOfUbuffInts = 200000;

    private abstract static class Tester {
        private String name;

        public Tester(String name) {
            this.name = name;
        }

        public long runTest() {
            System.out.print(name + ": ");
            try {
                long startTime = System.currentTimeMillis();
                test();
                long endTime = System.currentTimeMillis();
                return (endTime - startTime);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        public abstract void test() throws IOException;
    }

    private static Tester[] tests = {
        new Tester("Stream Write") {
            public void test() throws IOException {
                DataOutputStream dos = new DataOutputStream(
                    new BufferedOutputStream(
                        new FileOutputStream(new File("temp.tmp"))));
                for (int i = 0; i < numOfInts; i++)
                    dos.writeInt(i);
                dos.close();
            }
        },
        new Tester("Mapped Write") {
            public void test() throws IOException {
                FileChannel fc = new RandomAccessFile("temp.tmp", "rw").getChannel();
                IntBuffer ib = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size()).asIntBuffer();
                for (int i = 0; i < numOfInts; i++)
                    ib.put(i);
                fc.close();
            }
        },
        new Tester("Stream Read") {
            public void test() throws IOException {
                DataInputStream dis = new DataInputStream(
                    new BufferedInputStream(new FileInputStream("temp.tmp")));
                for (int i = 0; i < numOfInts; i++)
                    dis.readInt();
                dis.close();
            }
        },
        new Tester("Mapped Read") {
            public void test() throws IOException {
                FileChannel fc = new FileInputStream(new File("temp.tmp")).getChannel();
                IntBuffer ib = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size()).asIntBuffer();
                while (ib.hasRemaining())
                    ib.get();
                fc.close();
            }
        },
        new Tester("Stream Read/Write") {
            public void test() throws IOException {
                RandomAccessFile raf = new RandomAccessFile(new File("temp.tmp"), "rw");
                raf.writeInt(1);
                for (int i = 0; i < numOfUbuffInts; i++) {
                    raf.seek(raf.length() - 4);
                    raf.writeInt(raf.readInt());
                }
                raf.close();
            }
        },
        new Tester("Mapped Read/Write") {
            public void test() throws IOException {
                FileChannel fc = new RandomAccessFile(new File("temp.tmp"), "rw").getChannel();
                IntBuffer ib = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size()).asIntBuffer();
                ib.put(0);
                for (int i = 1; i < numOfUbuffInts; i++)
                    ib.put(ib.get(i - 1));
                fc.close();
            }
        }
    };

    public static void main(String[] args) {
        for (int i = 0; i < tests.length; i++)
            System.out.println(tests[i].runTest());
    }
}

I more or less understand what is happening. The results I get are as follows:

 Stream Write: 653
 Mapped Write: 51
 Stream Read: 651
 Mapped Read: 40
 Stream Read/Write: 14481
 Mapped Read/Write: 6

What makes Stream Read/Write so incredibly slow? And as a read/write test, it seems pointless to me to keep reading and writing the same int over and over (if I understand correctly what is happening in Stream Read/Write). Wouldn't it be better to read ints from a previously written file and just read and write ints in the same place? Is there a better way to illustrate this?

Many things puzzle me a little, and I just can't see the whole picture.

3 answers

What I see in the Stream Read/Write benchmark alone:

  • It is not really streaming I/O, but seeking to a specific location in the file. It is unbuffered, so all I/O has to go to disk (the other tests use buffered streams, so they actually read and write in large blocks, and the ints are then read from or written to the in-memory buffer).
  • It seeks to the end minus 4 bytes, so it reads the last int and then writes a new int after it. The file keeps growing in length by one int per iteration. This does not really add much to the cost, though (but it does show that the author of this benchmark either misunderstood something or was not careful).

This explains the very high cost of this particular test.

You asked:

Wouldn't it be better to read ints from a previously written file and just read and write ints in the same place?

That is what I think the author intended with the last two benchmarks, but it is not what they actually got. With RandomAccessFile, to read and write the same place in the file you need to do a seek before the read and before the write:

 raf.seek(raf.length() - 4);
 int val = raf.readInt();
 raf.seek(raf.length() - 4);
 raf.writeInt(val);

This demonstrates one advantage of memory-mapped I/O: you can simply use the same memory address to access the same bits of the file, rather than having to do an additional seek before each call.
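For contrast, here is a minimal sketch (mine, not from the original code; it assumes the same java.io and java.nio imports as the MappedIO example) of the same in-place update done through a mapped buffer, with no seeks at all:

 FileChannel fc = new RandomAccessFile("temp.tmp", "rw").getChannel();
 IntBuffer ib = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size()).asIntBuffer();
 int idx = (int) (fc.size() / 4) - 1; // index of the last int in the file
 int val = ib.get(idx);               // read in place
 ib.put(idx, val);                    // write back in place, no seek needed
 fc.close();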

By the way, the CHUNK_SIZE in your first test class may be a problem, since 100000 is not a multiple of the file system block size. It is often useful to use multiples of 1024, and 8192 has been shown to be a good sweet spot for most applications (which is why Java's BufferedInputStream and BufferedOutputStream use that value for their default buffer sizes). The OS has to read an extra block to satisfy any read request that does not fall on block boundaries. Subsequent reads of the stream then re-read part of the same block, possibly some full blocks, and then an extra partial block again. Memory-mapped I/O always reads and writes physically in blocks, since the actual I/O operations are handled by the OS memory manager, which works in units of its page size, and page size is always optimized for mapping files.

In this example, the memory-mapping test reads everything into a memory buffer and then writes it all back out. These two tests are really not well written if the goal is to compare the two approaches. memoryMappedCopy should read and write in the same block size as customBufferedCopy.
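As a rough illustration (my sketch, not the original author's code; fromFile and toFile are assumed to be the method parameters, and it needs java.nio.MappedByteBuffer in addition to the imports above), a mapped copy that writes in the same 8 KB chunks might look like this:

 FileChannel fcIn = new RandomAccessFile(fromFile, "r").getChannel();
 FileChannel fcOut = new RandomAccessFile(toFile, "rw").getChannel();
 MappedByteBuffer in = fcIn.map(FileChannel.MapMode.READ_ONLY, 0, fcIn.size());
 while (in.hasRemaining()) {
     // write at most 8192 bytes per call, matching the buffered copy's block size
     in.limit(Math.min(in.position() + 8192, in.capacity()));
     fcOut.write(in); // advances in.position() to in.limit()
 }
 fcOut.close();
 fcIn.close();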

EDIT: There may be even more errors in these test classes. Prompted by your comment on the other answer, I took another look at the first class.

The customBufferedCopy method is static and uses a static buffer. For this kind of test, the buffer should be defined inside the method. Then it would not need synchronized (although synchronized is not needed in this context, or for these tests, anyway). This static method is also called as if it were an instance method, which is bad programming practice (i.e. use FileCopy.customBufferedCopy(...) instead of new FileCopy().customBufferedCopy(...)).

If you actually ran these tests from multiple threads, that shared buffer would be contended, and the benchmark would no longer be measuring just file I/O, so it would not be fair to compare the results of the two test methods.
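Putting those points together, a cleaned-up version might look like this (a sketch of how I would write it, using Java 7 try-with-resources; it assumes the same java.io imports as the original class):

 public static void customBufferedCopy(String fromFile, String toFile) throws IOException {
     char[] buffer = new char[8192]; // local buffer: nothing shared, so no synchronized needed
     try (Reader in = new FileReader(fromFile);
          Writer out = new FileWriter(toFile)) {
         int amountRead;
         while ((amountRead = in.read(buffer)) != -1) {
             out.write(buffer, 0, amountRead);
         }
     }
 }

 // and call it as a static method, not through an instance:
 // FileCopy.customBufferedCopy(inFileName, inFileName + ".new1");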


1) These sound like questions your students should be asking you, and not vice versa?

2) The reason two methods are used is to demonstrate different ways of copying a file. I would venture to guess that the first method (RandomAccessFile) creates a version of the file in RAM and then copies it to a new version on disk, and that the second method (customBufferedCopy) reads directly from the disk.

3) I'm not sure, but I think synchronized is used to ensure that multiple instances of the same class do not write to the file at the same time.

4) As for the last question, I have to go, so I hope someone else can help you with it.

Seriously though, these sound exactly like the questions a teacher should be asking his students. If you cannot investigate such simple things yourself, what kind of example are you setting for your students? </bombastic>


Thanks for looking at this. I will get to the first examples later; for now my professor has asked me to rewrite the two tests (the Stream and Mapped Read/Write ones).
They generate random ints; first the int at the index given by the generated int is read and compared with the generated int, and if it is not equal, the generated int is written at that index. He thought this would make a better test that exercises RandomAccessFile more. Does that make sense?

However, I am having some problems. First of all, I don't know how to use a buffer with the read/write stream when I use RandomAccessFile. I found a lot about byte[] buffers, but I'm not sure how to use them correctly.
My code so far for this test:

  new Tester("Stream Read/Write") { public void test() throws IOException { RandomAccessFile raf = new RandomAccessFile(new File("temp.tmp"), "rw"); raf.seek(numOfUbuffInts*4); raf.writeInt(numOfUbuffInts); for (int i = 0; i < numOfUbuffInts; i++) { int getal = (int) (1 + Math.random() * numOfUbuffInts); raf.seek(getal*4); if (raf.readInt() != getal) { raf.seek(getal*4); raf.writeInt(getal); } } raf.close(); } }, 

So this is still not buffered.
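One way I can imagine adding buffering (a sketch of my own, not tested against the assignment: the test name and the block arithmetic are made up, and it needs java.nio.ByteBuffer imported) is to read the whole 8 KB block that contains the target int through the FileChannel, check the int inside it, and write the block back on a mismatch:

 new Tester("Buffered Random Read/Write") { // hypothetical name
     public void test() throws IOException {
         FileChannel fc = new RandomAccessFile(new File("temp.tmp"), "rw").getChannel();
         ByteBuffer block = ByteBuffer.allocate(8192); // one block-sized buffer
         for (int i = 0; i < numOfUbuffInts; i++) {
             int getal = (int) (1 + Math.random() * numOfUbuffInts);
             long blockStart = (getal * 4L / 8192) * 8192; // start of the block holding this int
             block.clear();
             int bytesRead = fc.read(block, blockStart);   // positional read of one block
             int offset = (int) (getal * 4L - blockStart); // the int's offset inside the block
             if (block.getInt(offset) != getal) {
                 block.putInt(offset, getal);
                 block.position(0).limit(bytesRead);
                 fc.write(block, blockStart);              // write the whole block back
             }
         }
         fc.close();
     }
 },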

The second test I did is the following:

  new Tester("Mapped Read/Write") { public void test() throws IOException { RandomAccessFile raf = new RandomAccessFile(new File("temp.tmp"), "rw"); raf.seek(numOfUbuffInts*4); raf.writeInt(numOfUbuffInts); FileChannel fc = raf.getChannel(); IntBuffer ib = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size()).asIntBuffer(); for(int i = 1; i < numOfUbuffInts; i++) { int getal = (int) (1 + Math.random() * numOfUbuffInts); if (ib.get(getal) != getal) { ib.put(getal, getal); } } fc.close(); } } 

For small values of numOfUbuffInts it seems fast; for large values (20,000,000+) it takes ages. I have just been trying things out, but I'm not sure I'm on the right track.

