CloudFlare zlib?

@lassoan wanted a zlib variant that allows random access. mgzip is a clever Python implementation, and its method could be adapted to other languages. It takes advantage of two features of the gzip format: the header allows an optional extra field, and multiple compressed streams (members) can be concatenated into a single file. The benefits of this approach are:

  1. mgzip allows parallel compression.
  2. mgzip is able to decompress files it created in parallel.
  3. files created by mgzip can be decompressed by any gzip-compatible tool (though only serially).
  4. one can quickly skip over compressed chunks of a file created by mgzip, allowing random access.
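
To make the idea concrete, here is a minimal sketch using only Python's standard `gzip` module, not mgzip's own API. The helper names `compress_chunked` and `read_chunk` are hypothetical, and the chunk offsets are kept in a plain Python list as a simplifying assumption (a real implementation would need to store the chunk sizes in the file itself, for example in the gzip extra field).

```python
import gzip

def compress_chunked(data: bytes, chunk_size: int):
    """Compress data as independent gzip members.

    Returns the concatenated bytes plus a list of member start offsets
    (with one final entry marking the end of the last member).
    """
    members = []
    offsets = [0]
    for start in range(0, len(data), chunk_size):
        member = gzip.compress(data[start:start + chunk_size])
        members.append(member)
        offsets.append(offsets[-1] + len(member))
    return b"".join(members), offsets

def read_chunk(blob: bytes, offsets, i: int) -> bytes:
    """Decompress only the i-th member, skipping all the others."""
    return gzip.decompress(blob[offsets[i]:offsets[i + 1]])

CHUNK = 64 * 1024
data = bytes(range(256)) * 4096           # 1 MiB of sample data
blob, offsets = compress_chunked(data, CHUNK)

# Any gzip reader sees one valid multi-member stream.
whole = gzip.decompress(blob)

# Random access: decode chunk 3 without touching chunks 0 to 2.
third = read_chunk(blob, offsets, 3)
```

Because every member is compressed independently, the loop in `compress_chunked` could also be handed to a pool of workers, which is where the parallel compression and decompression come from.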

There is a slight tradeoff in choosing how many chunks to break a file into. More chunks mean slightly poorer compression, because each chunk is compressed independently, but finer-grained random access.
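
A rough way to see this tradeoff is to compress the same input at several chunk sizes and compare the totals. The sketch below again uses only the standard `gzip` module rather than mgzip; `chunked_size` is a hypothetical helper and the sample text is arbitrary, so the exact numbers will differ for real data.

```python
import gzip

def chunked_size(data: bytes, chunk_size: int) -> int:
    """Total compressed size when data is split into independent gzip members."""
    return sum(
        len(gzip.compress(data[start:start + chunk_size]))
        for start in range(0, len(data), chunk_size)
    )

# Arbitrary, highly repetitive sample input (about 1.1 MB).
data = b"The quick brown fox jumps over the lazy dog. " * 25000

sizes = {
    chunk_size: chunked_size(data, chunk_size)
    for chunk_size in (1024, 64 * 1024, len(data))
}

for chunk_size, size in sizes.items():
    print(f"chunk size {chunk_size:>8} bytes -> {size} compressed bytes")
```

Each member carries its own gzip header and trailer and starts with an empty dictionary, so the smallest chunk size should give the largest total here.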

I created the Python script e_test_mgzip.py for my compression benchmark.
