java.util.zip: ZipInputStream vs ZipFile

I have some general questions regarding the java.util.zip library. We mainly import and export many small components. Previously, these components were imported and exported using one large file, for example:

    <component-type-a id="1"/>
    <component-type-a id="2"/>
    <component-type-a id="N"/>
    <component-type-b id="1"/>
    <component-type-b id="2"/>
    <component-type-b id="N"/>

Note that the order of the components during import matters.

Now each component should live in its own file, so that it can be versioned externally, QA-ed, and so on. We decided that the output of our export should be a zip file containing all these files, and that the input of our import should be a similar zip file. We do not want to extract the zip on our file system, and we do not want to open a separate stream for each of the small files. My current questions are:
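A minimal sketch of the export side, assuming each component is already serialized to an XML string and that the map keys (for example a hypothetical `component-type-a/1.xml`) become the entry names. `ComponentExport` and `export` are illustrative names, not part of any existing API. Entries are written one after another in the map's iteration order, so a `LinkedHashMap` lets the caller control that order.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ComponentExport {

    /**
     * Writes each component as its own zip entry, in the map's iteration order
     * (pass a LinkedHashMap to control that order). Keys are entry names,
     * values are the component XML. Closing the ZipOutputStream also closes out.
     */
    public static void export(Map<String, String> components, OutputStream out) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, String> component : components.entrySet()) {
                zos.putNextEntry(new ZipEntry(component.getKey()));
                zos.write(component.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } // close() finishes the archive by writing the central directory at the end
    }
}
```

Usage would be something like `ComponentExport.export(components, new FileOutputStream("export.zip"))`, where the file name is only an example.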

Q1. Can ZipInputStream guarantee that the zip entries (the small files) will be read in the same order in which our export, which uses ZipOutputStream, inserted them? I assume that reading looks something like:

    ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
    ZipEntry entry;
    while ((entry = zis.getNextEntry()) != null) {
        // read the current entry from zis until read() returns -1
    }
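A self-contained version of that loop, as a sketch. `ComponentImport` and `readInOrder` are illustrative names, and the sketch assumes entries are UTF-8 text and that `readAllBytes()` (Java 9+) is available. ZipInputStream walks the local entries in the physical order in which they appear in the stream, and ZipOutputStream writes them in `putNextEntry` order, so collecting into a `LinkedHashMap` preserves the export order.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ComponentImport {

    /** Reads entries in the order they appear in the stream, i.e. the order they were written. */
    public static Map<String, String> readInOrder(InputStream in) throws IOException {
        Map<String, String> components = new LinkedHashMap<>(); // keeps insertion order
        try (ZipInputStream zis = new ZipInputStream(new BufferedInputStream(in))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // read() on a ZipInputStream returns -1 at the end of the current entry,
                // so readAllBytes() returns exactly this entry's data.
                byte[] data = zis.readAllBytes();
                components.put(entry.getName(), new String(data, StandardCharsets.UTF_8));
            }
        }
        return components;
    }
}
```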

I know that the zip central directory is stored at the end of the file, but the entries themselves are nevertheless stored sequentially inside it. I also know that relying on the order is an ugly idea, but I just want to keep all the facts in mind.

Q2. If I use ZipFile (which I prefer), what is the performance impact of calling getInputStream() hundreds of times? Will it be much slower than the ZipInputStream solution? The zip file is opened only once, and ZipFile is backed by a RandomAccessFile; is this correct? I assume that reading looks something like:

    ZipFile zipfile = new ZipFile(argv[0]);
    Enumeration<? extends ZipEntry> e = zipfile.entries(); // TODO: ensure the order of the entries
    while (e.hasMoreElements()) {
        ZipEntry entry = e.nextElement();
        InputStream is = zipfile.getInputStream(entry);
    }
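One way to address the TODO without relying on archive layout: the `ZipFile.entries()` Javadoc does not specify an order, so this sketch sorts the entries by name, assuming a hypothetical naming convention with a zero-padded sequence prefix such as `0001-component-type-a-1.xml`. The archive is opened once and each entry is read through its own `getInputStream()` call. `ZipFileImport` and `readSortedByName` are illustrative names.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipFileImport {

    /**
     * Opens the archive once, sorts entries by name (assumed zero-padded prefix
     * convention) and reads each entry through getInputStream().
     */
    public static Map<String, String> readSortedByName(File file) throws IOException {
        Map<String, String> components = new LinkedHashMap<>(); // keeps the sorted order
        try (ZipFile zip = new ZipFile(file)) {
            List<ZipEntry> entries = new ArrayList<>();
            Enumeration<? extends ZipEntry> e = zip.entries();
            while (e.hasMoreElements()) {
                entries.add(e.nextElement());
            }
            // Make the import order explicit instead of depending on archive layout.
            entries.sort(Comparator.comparing(ZipEntry::getName));
            for (ZipEntry entry : entries) {
                if (entry.isDirectory()) {
                    continue;
                }
                try (InputStream is = zip.getInputStream(entry)) {
                    components.put(entry.getName(), new String(is.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        }
        return components;
    }
}
```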

Q3. Are input streams obtained from the same ZipFile thread-safe (for example, can I read different entries through different streams at the same time)? Is there any performance penalty?

Thank you for your responses!

+5
3 answers

Q1: Yes, the entries will be read in the same order in which they were added.

Q2: Note that, due to the structure of zip archives and of compression, neither solution is truly streaming; both perform some level of buffering. If you check the JDK sources, the implementations share most of their code. There is no real random access within the content, although the index lets you locate the chunks that correspond to entries. Therefore, I don't think there should be significant performance differences; moreover, the OS will cache disk blocks anyway. You can verify this with a simple test case.
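A rough sketch of such a test case: it reads every entry of the same archive once through `ZipFile.getInputStream()` and once through a single `ZipInputStream`, returning the number of decompressed bytes so the two paths can be checked against each other. `ZipReadComparison` and its methods are illustrative names. This is not a rigorous benchmark (a harness such as JMH handles warm-up and measurement properly); the loop in `main` simply repeats a few rounds so later rounds run with a warmed-up JIT and OS cache.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

public class ZipReadComparison {

    /** Reads the stream to its end and returns the number of bytes read. */
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    /** One ZipFile, one getInputStream() call per entry. */
    public static long readWithZipFile(File file) throws IOException {
        long total = 0;
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                try (InputStream is = zip.getInputStream(entries.nextElement())) {
                    total += drain(is);
                }
            }
        }
        return total;
    }

    /** One ZipInputStream read sequentially from start to end. */
    public static long readWithZipInputStream(File file) throws IOException {
        long total = 0;
        try (ZipInputStream zis = new ZipInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (zis.getNextEntry() != null) {
                total += drain(zis); // read() returns -1 at the end of the current entry
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = readWithZipFile(file);
            long t1 = System.nanoTime();
            long b = readWithZipInputStream(file);
            long t2 = System.nanoTime();
            System.out.printf("ZipFile: %d bytes in %.1f ms, ZipInputStream: %d bytes in %.1f ms%n",
                    a, (t1 - t0) / 1e6, b, (t2 - t1) / 1e6);
        }
    }
}
```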

Q3: I would not count on it; most likely they are not. If you really think that parallel access would help (mainly because decompression is CPU-bound, so it might), I would try reading the entire file into memory, exposing it via ByteArrayInputStream, and building several independent readers.
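A sketch of the "several independent readers" idea, under these assumptions: the whole archive fits in memory, entries are UTF-8 text, and entries are split round-robin by index between readers. Each task builds its own `ByteArrayInputStream` and `ZipInputStream`; only the immutable `byte[]` is shared between threads. `ParallelZipReaders` and `readInParallel` are illustrative names. One caveat: in OpenJDK's implementation, moving to the next entry reads through (and therefore inflates) the rest of the current one, so every reader still decompresses the whole archive and the parallel gain is mostly limited to the per-entry processing work.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ParallelZipReaders {

    /** Reads all entries using `readers` independent ZipInputStreams over one in-memory archive. */
    public static Map<String, String> readInParallel(byte[] zipBytes, int readers) throws Exception {
        Map<String, String> result = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (int r = 0; r < readers; r++) {
                final int readerId = r;
                futures.add(pool.submit(() -> {
                    // Each task owns its stream objects; no stream is shared between threads.
                    try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
                        ZipEntry entry;
                        int index = 0;
                        while ((entry = zis.getNextEntry()) != null) {
                            // Reader r handles entries r, r + readers, r + 2 * readers, ...
                            if (index++ % readers == readerId) {
                                byte[] data = zis.readAllBytes();
                                result.put(entry.getName(), new String(data, StandardCharsets.UTF_8));
                            }
                        }
                    }
                    return null;
                }));
            }
            for (Future<Void> f : futures) {
                f.get(); // rethrows any reader failure as an ExecutionException
            }
        } finally {
            pool.shutdown();
        }
        return result;
    }
}
```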

+3

I measured that simply listing the files with ZipInputStream is 8 times slower than with ZipFile.

    long t = System.nanoTime();
    ZipFile zip = new ZipFile(jarFile);
    Enumeration<? extends ZipEntry> entries = zip.entries();
    while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        String filename = entry.getName();
        if (!filename.startsWith(JAR_TEXTURE_PATH))
            continue;
        textureFiles.add(filename);
    }
    zip.close();
    System.out.println((System.nanoTime() - t) / 1e9);

and

    long t = System.nanoTime();
    ZipInputStream zip = new ZipInputStream(new FileInputStream(jarFile));
    ZipEntry entry;
    while ((entry = zip.getNextEntry()) != null) {
        String filename = entry.getName();
        if (!filename.startsWith(JAR_TEXTURE_PATH))
            continue;
        textureFiles.add(filename);
    }
    zip.close();
    System.out.println((System.nanoTime() - t) / 1e9);

(Do not run them in the same class; put them in two different classes and run them separately.)

+1

Regarding Q3, the experience of JENKINS-14362 suggests that zlib is not thread-safe even when working on unrelated streams, i.e. that it has some improperly shared static state. Not proven, just a warning.

0

Source: https://habr.com/ru/post/892047/

