Iterable gzip deflate / inflate in Java

Is there a library for gzip-deflating in terms of ByteBuffers hidden on the Internet? Something that lets us push raw data in and pull deflated data out? We searched for one, but found only libraries that deal with InputStreams and OutputStreams.

We are tasked with creating gzip filters that deflate a stream of ByteBuffers in a pipeline architecture. It is a pull architecture, in which the last element pulls data from the earlier elements. Our gzip filter deals with ByteBuffers; there is no single Stream object.

We toyed with adapting the stream of data as a kind of InputStream and then using GZIPOutputStream to satisfy our requirements, but the amount of adapter code is annoying, to say the least.

Post-accept edit: for the record, our architecture is similar to that of GStreamer and the like.

3 answers

Many thanks to Mark Adler for suggesting an approach that is much better than my original answer.

    package stack;

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.zip.CRC32;
    import java.util.zip.Deflater;

    public class BufferDeflate2 {
        /** The standard 10 byte GZIP header */
        private static final byte[] GZIP_HEADER = new byte[] {
                0x1f, (byte) 0x8b, Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0 };

        /** CRC-32 of uncompressed data. */
        private final CRC32 crc = new CRC32();

        /** Deflater to deflate data (nowrap = true, so it emits raw deflate data) */
        private final Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true);

        /** Output buffer building area */
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        /** Internal transfer space */
        private final byte[] transfer = new byte[1000];

        /** The flush mode to use at the end of each buffer */
        private final int flushMode;

        /**
         * New buffer deflater
         *
         * @param syncFlush
         *            if true, all data in buffer can be immediately decompressed
         *            from output buffer
         */
        public BufferDeflate2(boolean syncFlush) {
            flushMode = syncFlush ? Deflater.SYNC_FLUSH : Deflater.NO_FLUSH;
            buffer.write(GZIP_HEADER, 0, GZIP_HEADER.length);
        }

        /**
         * Deflate the buffer
         *
         * @param in
         *            the buffer to deflate
         * @return deflated representation of the buffer
         */
        public ByteBuffer deflate(ByteBuffer in) {
            // convert buffer to bytes
            byte[] inBytes;
            int off;
            int len = in.remaining();
            if (in.hasArray()) {
                // use the backing array directly; arrayOffset() matters for sliced buffers
                inBytes = in.array();
                off = in.arrayOffset() + in.position();
                in.position(in.limit()); // mark the input as consumed
            } else {
                off = 0;
                inBytes = new byte[len];
                in.get(inBytes);
            }

            // update CRC and deflater
            crc.update(inBytes, off, len);
            deflater.setInput(inBytes, off, len);

            // keep draining while the transfer space came back full, so that a
            // SYNC_FLUSH is written out completely before this method returns
            int r;
            do {
                r = deflater.deflate(transfer, 0, transfer.length, flushMode);
                buffer.write(transfer, 0, r);
            } while (r == transfer.length || !deflater.needsInput());

            byte[] outBytes = buffer.toByteArray();
            buffer.reset();
            return ByteBuffer.wrap(outBytes);
        }

        /**
         * Write the final buffer. This writes any remaining compressed data and
         * the GZIP trailer.
         *
         * @return the final buffer
         */
        public ByteBuffer doFinal() {
            // finish deflating
            deflater.finish();

            // write all remaining data
            while (!deflater.finished()) {
                int r = deflater.deflate(transfer, 0, transfer.length);
                buffer.write(transfer, 0, r);
            }

            // write GZIP trailer: CRC-32, then uncompressed length modulo 2^32
            writeInt((int) crc.getValue());
            writeInt((int) deflater.getBytesRead());

            // final output
            byte[] outBytes = buffer.toByteArray();

            // reset all state and queue a fresh header, so a reused instance
            // starts a new, valid GZIP member
            deflater.reset();
            crc.reset();
            buffer.reset();
            buffer.write(GZIP_HEADER, 0, GZIP_HEADER.length);
            return ByteBuffer.wrap(outBytes);
        }

        /**
         * Write a 32 bit value in little-endian order
         *
         * @param v
         *            the value to write
         */
        private void writeInt(int v) {
            buffer.write(v & 0xff);
            buffer.write((v >> 8) & 0xff);
            buffer.write((v >> 16) & 0xff);
            buffer.write((v >> 24) & 0xff);
        }

        /**
         * For testing. Pass in the name of a file to GZIP compress
         *
         * @param args
         * @throws IOException
         */
        public static void main(String[] args) throws IOException {
            File inFile = new File(args[0]);
            File outFile = new File(args[0] + ".test.gz");
            // try-with-resources closes both channels even if an exception is thrown
            try (FileChannel inChan = new FileInputStream(inFile).getChannel();
                 FileChannel outChan = new FileOutputStream(outFile).getChannel()) {
                BufferDeflate2 def = new BufferDeflate2(false);
                ByteBuffer buf = ByteBuffer.allocate(500);
                while (true) {
                    buf.clear();
                    int r = inChan.read(buf);
                    if (r == -1) break;
                    buf.flip();
                    ByteBuffer compBuf = def.deflate(buf);
                    outChan.write(compBuf);
                }
                ByteBuffer compBuf = def.doFinal();
                outChan.write(compBuf);
            }
        }
    }


I don't understand the "hidden on the Internet" part, but zlib does gzip-format compression and decompression in memory. The java.util.zip API provides some access to zlib, although it is limited. Because of the interface limitations, you cannot ask zlib to produce and consume gzip streams directly. However, you can use the nowrap option to produce and consume raw deflate data. It is then easy to roll your own gzip header and trailer using the CRC32 class in java.util.zip. You prepend a fixed 10-byte header and append the four-byte CRC followed by the four-byte uncompressed length (modulo 2^32), both in little-endian order, and you are good to go.
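
As a rough illustration of the consuming side of this approach, the sketch below unwraps a gzip member by hand: it skips the fixed 10-byte header, inflates the raw deflate data with a nowrap Inflater, and checks the little-endian CRC-32 and length in the trailer. The class name GzipUnwrapSketch and the method gunzip are hypothetical, and the code assumes a single gzip member whose header carries no optional fields (FLG byte of 0).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

/** Hypothetical helper, for illustration only. */
public class GzipUnwrapSketch {

    /** Inflates one gzip member; assumes the plain 10-byte header (FLG == 0). */
    public static byte[] gunzip(ByteBuffer gz) throws DataFormatException {
        ByteBuffer in = gz.duplicate();
        // magic bytes, compression method 8 (deflate), no optional header fields
        if (in.remaining() < 18 || in.get() != 0x1f || in.get() != (byte) 0x8b
                || in.get() != 8 || in.get() != 0) {
            throw new DataFormatException("not a plain gzip member");
        }
        in.position(in.position() + 6); // skip MTIME, XFL and OS
        byte[] rest = new byte[in.remaining()];
        in.get(rest); // raw deflate data followed by the 8-byte trailer

        Inflater inflater = new Inflater(true); // nowrap: raw deflate data
        CRC32 crc = new CRC32();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] transfer = new byte[1024];
        try {
            inflater.setInput(rest);
            while (!inflater.finished()) {
                int n = inflater.inflate(transfer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated deflate data");
                }
                crc.update(transfer, 0, n);
                out.write(transfer, 0, n);
            }
            // whatever the inflater did not consume starts with the trailer
            int unused = inflater.getRemaining();
            if (unused < 8) {
                throw new DataFormatException("missing gzip trailer");
            }
            ByteBuffer trailer = ByteBuffer.wrap(rest, rest.length - unused, 8)
                    .order(ByteOrder.LITTLE_ENDIAN);
            if (trailer.getInt() != (int) crc.getValue()
                    || trailer.getInt() != (int) inflater.getBytesWritten()) {
                throw new DataFormatException("CRC or length mismatch");
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException, DataFormatException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write("hello, buffers".getBytes(StandardCharsets.UTF_8));
        }
        byte[] plain = gunzip(ByteBuffer.wrap(sink.toByteArray()));
        System.out.println(new String(plain, StandardCharsets.UTF_8));
    }
}
```

Handing the inflater the trailer bytes along with the deflate data and then asking getRemaining() where the data ended avoids having to know the compressed length in advance.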


Handling ByteBuffers is not complicated. See my sample code below. You need to know how the buffers were created. The possible options are:

  • Each buffer is compressed independently. I assume this is not your case. If it were, you would simply convert the buffer to a byte array and wrap it in a ByteArrayInputStream inside a GZIPInputStream.
  • Each buffer was ended with a SYNC_FLUSH by the writer, and thus contains a complete block of data within the stream. All data written to the buffer by the writer can be read by the reader immediately.
  • Each buffer is just part of a gzip stream. There is no guarantee that the reader can read anything from the buffer.
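
For the first case, where every buffer is a complete gzip stream of its own, a minimal sketch might look like the following. The class name IndependentBuffers and its two methods are hypothetical names chosen for illustration; only the java.util.zip and java.nio classes are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: each ByteBuffer is compressed as its own gzip stream. */
public class IndependentBuffers {

    /** Compresses the remaining bytes of the buffer into a standalone gzip stream. */
    public static ByteBuffer gzip(ByteBuffer plain) throws IOException {
        byte[] bytes = new byte[plain.remaining()];
        plain.get(bytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(bytes);
        }
        return ByteBuffer.wrap(out.toByteArray());
    }

    /** Decompresses a buffer that holds exactly one complete gzip stream. */
    public static ByteBuffer gunzip(ByteBuffer compressed) throws IOException {
        byte[] bytes = new byte[compressed.remaining()];
        compressed.get(bytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            byte[] transfer = new byte[1024];
            int n;
            while ((n = gz.read(transfer)) != -1) {
                out.write(transfer, 0, n);
            }
        }
        return ByteBuffer.wrap(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer original = ByteBuffer.wrap("one buffer, one stream".getBytes(StandardCharsets.UTF_8));
        ByteBuffer restored = gunzip(gzip(original));
        System.out.println(StandardCharsets.UTF_8.decode(restored));
    }
}
```

This costs a header and trailer per buffer and gives up compression across buffer boundaries, which is why it only suits the independent case.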

Data produced by GZIP must be processed in order, so the ByteBuffers have to be processed in the same order in which they were created.

Code example:

    package stack;

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.ByteBuffer;
    import java.nio.channels.Channels;
    import java.nio.channels.Pipe;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.zip.GZIPInputStream;

    public class BufferDeflate {
        static AtomicInteger idSrc = new AtomicInteger(1);

        /** End-of-stream marker; a BlockingQueue cannot hold null elements */
        private static final ByteBuffer END = ByteBuffer.allocate(0);

        /** Queue for transferring buffers */
        final BlockingQueue<ByteBuffer> buffers = new LinkedBlockingQueue<ByteBuffer>();

        /** The entry point for deflated buffers */
        final Pipe.SinkChannel bufSink;

        /** The source for the inflater */
        final Pipe.SourceChannel infSource;

        /** The destination for the inflater */
        final Pipe.SinkChannel infSink;

        /** The source for the outside world (non-blocking, readable) */
        public final Pipe.SourceChannel source;

        /** Moves queued buffers into the pipe that feeds the inflater */
        class Relayer extends Thread {
            public Relayer(int id) {
                super("BufferRelayer" + id);
            }

            public void run() {
                try {
                    while (true) {
                        ByteBuffer buf = buffers.take();
                        if (buf == END) {
                            bufSink.close();
                            break;
                        }
                        // a single write may not drain the buffer
                        while (buf.hasRemaining()) {
                            bufSink.write(buf);
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

        /** Reads the GZIP stream from one pipe and writes inflated data to the other */
        class Inflater extends Thread {
            public Inflater(int id) {
                super("BufferInflater" + id);
            }

            public void run() {
                try (GZIPInputStream gzip = new GZIPInputStream(Channels.newInputStream(infSource));
                     OutputStream out = Channels.newOutputStream(infSink)) {
                    // copy in blocks rather than one byte at a time
                    byte[] transfer = new byte[4096];
                    int n;
                    while ((n = gzip.read(transfer)) != -1) {
                        out.write(transfer, 0, n);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

        /**
         * New buffer inflater
         */
        public BufferDeflate() throws IOException {
            Pipe pipe = Pipe.open();
            bufSink = pipe.sink();
            infSource = pipe.source();

            pipe = Pipe.open();
            infSink = pipe.sink();
            source = pipe.source();
            source.configureBlocking(false);

            int id = idSrc.incrementAndGet();

            Thread thread = new Relayer(id);
            thread.setDaemon(true);
            thread.start();

            thread = new Inflater(id);
            thread.setDaemon(true);
            thread.start();
        }

        /**
         * Add the buffer to the stream. A null buffer closes the stream
         *
         * @param buf
         *            the buffer to add
         * @throws IOException
         */
        public void add(ByteBuffer buf) throws IOException {
            // map null to the END marker, because the queue rejects null
            buffers.offer(buf == null ? END : buf);
        }
    }

Just pass the buffers to the add method and read from the public source channel. The amount of data that can be read from GZIP after a given number of bytes has been processed cannot be predicted. I therefore made the source channel non-blocking, so you can safely read from it in the same thread in which you add the byte buffers.
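
To illustrate why the readable amount is hard to predict, the following sketch (not part of the code above) deflates a few chunks with a raw Deflater and, after each chunk, counts how many plain bytes an Inflater can recover from the output produced so far. The names FlushDemo and readablePerChunk are hypothetical, and the fixed 8192-byte scratch arrays are an assumption that only holds for small demo chunks.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical demo of how the flush mode affects what a reader can see. */
public class FlushDemo {

    /**
     * Deflates each chunk with the given flush mode and returns, per chunk,
     * how many plain bytes could be inflated right after that chunk's output.
     */
    public static int[] readablePerChunk(byte[][] chunks, int flushMode)
            throws DataFormatException {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        Inflater inflater = new Inflater(true);
        byte[] compressed = new byte[8192]; // assumed large enough for the demo
        byte[] plain = new byte[8192];
        int[] readable = new int[chunks.length];
        try {
            for (int i = 0; i < chunks.length; i++) {
                deflater.setInput(chunks[i]);
                int c = deflater.deflate(compressed, 0, compressed.length, flushMode);
                // hand the reader exactly what the writer produced for this chunk
                inflater.setInput(compressed, 0, c);
                int n;
                while ((n = inflater.inflate(plain)) > 0) {
                    readable[i] += n;
                }
            }
        } finally {
            deflater.end();
            inflater.end();
        }
        return readable;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[][] chunks = {
            "first chunk of data".getBytes(StandardCharsets.UTF_8),
            "second chunk".getBytes(StandardCharsets.UTF_8),
            "third and last chunk".getBytes(StandardCharsets.UTF_8)
        };
        System.out.println("SYNC_FLUSH: "
                + Arrays.toString(readablePerChunk(chunks, Deflater.SYNC_FLUSH)));
        System.out.println("NO_FLUSH:   "
                + Arrays.toString(readablePerChunk(chunks, Deflater.NO_FLUSH)));
    }
}
```

With SYNC_FLUSH every chunk is fully recoverable as soon as it arrives. With NO_FLUSH the deflater is free to hold data back, so the counts depend on its internal buffering and are typically smaller.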


Source: https://habr.com/ru/post/1446089/

