How to write a FileTypeDetector for zip archives?

For this package, one of my next steps is to write a series of FileTypeDetectors so that the Files.probeContentType() method is smarter than the default one (the default file type detectors rely on file name extensions only).

As the javadoc of that method mentions, it relies on FileTypeDetector instances declared in a META-INF/services file.
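Because the detectors are consulted one after another, their order matters. The sketch below lists the installed detectors in the order ServiceLoader yields them. It assumes that, as in OpenJDK, Files.probeContentType() loads detectors through ServiceLoader with the system class loader and returns the first non-null answer, and that providers listed in a single services file come back in file order. The class name ListDetectors is hypothetical.

```java
import java.nio.file.spi.FileTypeDetector;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public final class ListDetectors {
    // Class names of the FileTypeDetector providers visible to the given
    // class loader, in the order ServiceLoader instantiates them.
    static List<String> installedDetectorNames(final ClassLoader loader) {
        final List<String> names = new ArrayList<>();
        for (final FileTypeDetector d : ServiceLoader.load(FileTypeDetector.class, loader))
            names.add(d.getClass().getName());
        return names;
    }

    public static void main(final String... args) {
        // Assumption: this mirrors the loader the JDK itself uses
        for (final String name : installedDetectorNames(ClassLoader.getSystemClassLoader()))
            System.out.println(name);
    }
}
```

With the services file shown below on the classpath, the PNG detector would be printed before the zip detector; swapping the two lines swaps the output.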

I first tried a simple detector which recognizes PNG files by their header:

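    package com.github.fge.filesystem.ftd;

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.spi.FileTypeDetector;
    import java.util.Arrays;

    public final class PngFileTypeDetector
        extends FileTypeDetector
    {
        // The 8-byte signature every PNG file starts with
        private static final byte[] PNG_HEADER = {
            (byte) 0x89, (byte) 0x50, (byte) 0x4E, (byte) 0x47,
            (byte) 0x0D, (byte) 0x0A, (byte) 0x1A, (byte) 0x0A
        };

        private static final int PNG_HEADER_SIZE = PNG_HEADER.length;

        @Override
        public String probeContentType(final Path path)
            throws IOException
        {
            final byte[] buf = new byte[PNG_HEADER_SIZE];

            try (
                final InputStream in = Files.newInputStream(path);
            ) {
                // read() may return fewer bytes than requested, so loop
                int off = 0;
                while (off < PNG_HEADER_SIZE) {
                    final int n = in.read(buf, off, PNG_HEADER_SIZE - off);
                    if (n == -1)
                        return null; // file is shorter than the header
                    off += n;
                }
            }

            return Arrays.equals(buf, PNG_HEADER) ? "image/png" : null;
        }
    }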

It works. Now, after a quick look at the API, I thought this would be a good way to detect whether a file is a zip:

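    package com.github.fge.filesystem.ftd;

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.spi.FileTypeDetector;
    import java.util.zip.ZipException;
    import java.util.zip.ZipInputStream;

    public final class ZipFileTypeDetector
        extends FileTypeDetector
    {
        @Override
        public String probeContentType(final Path path)
            throws IOException
        {
            // Rely on what the JDK has to offer...
            try (
                final InputStream in = Files.newInputStream(path);
                final ZipInputStream z = new ZipInputStream(in);
            ) {
                z.getNextEntry();
                return "application/zip";
            } catch (ZipException ignored) {
                return null;
            }
        }
    }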

The contents of META-INF/services/java.nio.file.spi.FileTypeDetector were as follows:

    com.github.fge.filesystem.ftd.PngFileTypeDetector
    com.github.fge.filesystem.ftd.ZipFileTypeDetector

With the current tests this worked: for zip I created an empty zip file, and for PNG I used this image.

Full test:

    // Assumed: assertThat() comes from AssertJ
    import static org.assertj.core.api.Assertions.assertThat;

    import com.github.marschall.memoryfilesystem.MemoryFileSystemBuilder;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.FileSystem;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.testng.annotations.AfterMethod;
    import org.testng.annotations.BeforeMethod;
    import org.testng.annotations.DataProvider;
    import org.testng.annotations.Test;

    public final class FileTypeDetectorTest
    {
        private FileSystem fs;
        private Path path;

        @BeforeMethod
        public void initfs()
            throws IOException
        {
            fs = MemoryFileSystemBuilder.newLinux().build("testfs");
            path = fs.getPath("/foo");
        }

        @DataProvider
        public Iterator<Object[]> samples()
        {
            final List<Object[]> list = new ArrayList<>();

            String resourcePath;
            String mimeType;

            resourcePath = "/ftd/sample.png";
            mimeType = "image/png";
            list.add(new Object[] { resourcePath, mimeType });

            resourcePath = "/ftd/sample.zip";
            mimeType = "application/zip";
            list.add(new Object[] { resourcePath, mimeType });

            return list.iterator();
        }

        @Test(dataProvider = "samples")
        public void fileTypeDetectionTest(final String resourcePath,
            final String mimeType)
            throws IOException
        {
            @SuppressWarnings("IOResourceOpenedButNotSafelyClosed")
            final InputStream in
                = FileTypeDetectorTest.class.getResourceAsStream(resourcePath);

            if (in == null)
                throw new IOException(resourcePath + " not found in classpath");

            try (
                final InputStream inref = in;
            ) {
                Files.copy(inref, path);
            }

            assertThat(Files.probeContentType(path)).isEqualTo(mimeType);
        }

        @AfterMethod
        public void closefs()
            throws IOException
        {
            fs.close();
        }
    }

But...

If I invert the order of the implementations in the services file, so that the file now reads:

    com.github.fge.filesystem.ftd.ZipFileTypeDetector
    com.github.fge.filesystem.ftd.PngFileTypeDetector

then the PNG file is detected as a zip file!

After some debugging, I noticed that:

  • opening the PNG as a ZipInputStream did not fail...
  • ... and .getNextEntry() returned null!

I would have expected at least .getNextEntry() to throw a ZipException.
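The observation can be reproduced without a filesystem by feeding bytes straight into a ZipInputStream. This is a sketch; the class name ZipNullDemo is hypothetical. On OpenJDK, getNextEntry() returns null rather than throwing when the stream is too short to hold a local file header or does not start with one. An archive with no entries contains no local file header at all, so it behaves exactly like the PNG bytes, which is also why the empty-zip test passed even though the detector never really checks anything.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public final class ZipNullDemo {
    // True if ZipInputStream finds a first entry in the given bytes
    static boolean hasFirstEntry(final byte[] data) throws IOException {
        try (ZipInputStream z = new ZipInputStream(new ByteArrayInputStream(data))) {
            return z.getNextEntry() != null;
        }
    }

    // An archive with no entries, as written by ZipOutputStream (Java 7+)
    static byte[] emptyZip() throws IOException {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream z = new ZipOutputStream(out)) {
            z.finish();
        }
        return out.toByteArray();
    }

    public static void main(final String... args) throws IOException {
        final byte[] pngHeader = {
            (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
        };
        // Neither call throws; both simply report "no entry"
        System.out.println(hasFirstEntry(pngHeader));
        System.out.println(hasFirstEntry(emptyZip()));
    }
}
```

In other words, a null from getNextEntry() cannot distinguish "not a zip" from "a zip with no entries".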

Why? And how can I reliably determine whether a file is a zip file?

Please note: this is for Paths; therefore anything relying on File is unusable.

Answer:

Why didn't it throw?

Well, the Javadoc for getNextEntry() says that a ZipException or an IOException is thrown

  • if a ZIP file error has occurred
  • if an I/O error has occurred

respectively.

Based on this remarkably useful information (cough), we cannot assume that it will throw an exception when it encounters an invalid entry.

How can I reliably determine whether a file is a zip file?

The ZIP file format specification, which originated with PKZip, is here. While it is all a good read :), take a look at section 4, and 4.3.16 in particular. It specifies the "End of central directory record", which all ZIP files have (even empty ones).
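A minimal sketch of that idea, under the assumption that "is a zip" means "ends with a well-formed End of central directory record". The record is at least 22 bytes long, starts with the signature 0x06054b50 (the bytes 50 4B 05 06), has the comment length at offset 20, and is followed only by that comment, which is at most 65535 bytes. So the sketch reads the last 22 + 65535 bytes and scans them backwards. It uses Files.newByteChannel(), which works for any Path, including in-memory filesystems. The class name EocdZipFileTypeDetector is hypothetical, and the check does not validate the central directory itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;

public final class EocdZipFileTypeDetector extends FileTypeDetector {
    // "PK\5\6" read as a little-endian int
    private static final int EOCD_SIGNATURE = 0x06054b50;
    // Fixed part of the End of central directory record, without the comment
    private static final int EOCD_MIN_SIZE = 22;
    // The comment length field is an unsigned 16-bit value
    private static final int MAX_COMMENT_SIZE = 0xFFFF;

    @Override
    public String probeContentType(final Path path) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(path)) {
            final long size = ch.size();
            if (size < EOCD_MIN_SIZE)
                return null;

            // The record must lie within the last 22 + 65535 bytes
            final int tailSize = (int) Math.min(size, EOCD_MIN_SIZE + MAX_COMMENT_SIZE);
            final ByteBuffer tail = ByteBuffer.allocate(tailSize).order(ByteOrder.LITTLE_ENDIAN);
            ch.position(size - tailSize);
            while (tail.hasRemaining()) {
                if (ch.read(tail) == -1)
                    return null; // file shrank while we were reading it
            }
            tail.flip();

            // Scan backwards; accept a candidate only if its declared
            // comment length makes the record end exactly at end of file
            for (int i = tailSize - EOCD_MIN_SIZE; i >= 0; i--) {
                if (tail.getInt(i) != EOCD_SIGNATURE)
                    continue;
                final int commentSize = tail.getShort(i + 20) & 0xFFFF;
                if (i + EOCD_MIN_SIZE + commentSize == tailSize)
                    return "application/zip";
            }
        }
        return null;
    }
}
```

Unlike the ZipInputStream approach, this recognizes the empty archive from the question and rejects the PNG no matter where the detector sits in the services file, since an 8-byte-header PNG does not end with such a record.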


Source: https://habr.com/ru/post/986000/

