Adding support for opening *any* archived single file

Design:

Some time ago, I shared a potential design for reading archived files on the ParaView Discourse.

Well, this idea never went very far, as it required a complex implementation in a third-party library that only partially existed at the time.

But this is a topic that comes up often, so I wanted to find a potential solution. Something has actually changed in the last few years: streaming support has been added to VTK and implemented in many readers.

While streaming support is in itself a good thing and lets VTK be used in many contexts other than reading data from disk, it also greatly improves what we can do in terms of interoperability with other software and within VTK itself.

Indeed, we could leverage stream support and provide other kinds of streams, such as compressed data streams. We could then imagine doing this:

Single GZip file:

vtkNew<vtkGZipCompressedResourceStream> stream;
stream->Open("/path/to/file.obj.gz");

vtkNew<vtkOBJReader> reader;
reader->SetStream(stream);
reader->Read();

Tarball:

vtkNew<vtkTGZCompressedResourceStream> stream;
stream->Open("/path/to/archive.tgz", "file.obj");

vtkNew<vtkOBJReader> reader;
reader->SetStream(stream);
reader->Read();

ZIP archive:

vtkNew<vtkZIPCompressedResourceStream> stream;
stream->Open("/path/to/archive.zip", "file.obj");

vtkNew<vtkOBJReader> reader;
reader->SetStream(stream);
reader->Read();

For ZIP and TGZ, the second argument lets us select a file from the archive.

Details:

As always, the devil is in the details.

  1. Reading the whole file in memory

Some readers' stream implementations require the whole data to be available in memory (as in void*, size_t). Unless such an implementation is rewritten to support reading from a proper stream, the whole file will necessarily be copied, uncompressed, into memory before actually being read.

It will work, but it will definitely be memory-intensive and will show limitations for large files.

  2. Seek support

Most readers implementing stream support use the Seek method to move around the stream, which can be very useful, and at the moment this is supported by all types of streams. However, it is possible that certain compression algorithms won’t let us properly implement Seek. We would then need to ensure that all readers support a Seek-less code path, which may not be trivial to implement.

  3. Slice reading

Most readers, as we would expect, read the file part by part. However, some compression formats may not support that and may require decompressing the whole file before any data can be read.
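To make points 2 and 3 more concrete, here is a minimal sketch of how Seek could be emulated on top of a source that can only be read forward, in slices. All names in it (ForwardOnlySource, SeekEmulatingStream) are hypothetical stand-ins, not VTK or libzip classes, and the "decompressor" is just an in-memory string. Forward seeks read and discard bytes through a small scratch buffer; backward seeks restart from the beginning. As far as I know this is similar in spirit to what zlib's gzseek does when reading, with the same caveat that it can be very slow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for a decompressor: it can only produce bytes in
// order (Read) or restart from the first byte (Rewind).
class ForwardOnlySource
{
public:
  explicit ForwardOnlySource(std::string data) : Data(std::move(data)) {}

  // Copy up to `size` bytes into `out` and return how many were produced.
  std::size_t Read(char* out, std::size_t size)
  {
    assert(out != nullptr || size == 0);
    const std::size_t n = std::min(size, this->Data.size() - this->Pos);
    if (n > 0)
    {
      std::memcpy(out, this->Data.data() + this->Pos, n);
    }
    this->Pos += n;
    return n;
  }

  // Restart "decompression" from the beginning.
  void Rewind() { this->Pos = 0; }

private:
  std::string Data;
  std::size_t Pos = 0;
};

// Adds an absolute Seek on top of a forward-only source.
class SeekEmulatingStream
{
public:
  explicit SeekEmulatingStream(ForwardOnlySource source) : Source(std::move(source)) {}

  std::size_t Read(char* out, std::size_t size)
  {
    const std::size_t n = this->Source.Read(out, size);
    this->Offset += n;
    return n;
  }

  // Move to absolute offset `target`. Returns the offset actually reached,
  // which is smaller than `target` if the stream ends first.
  std::size_t Seek(std::size_t target)
  {
    if (target < this->Offset)
    {
      // Backward seek: the only option is to start over.
      this->Source.Rewind();
      this->Offset = 0;
    }
    char scratch[4096];
    while (this->Offset < target)
    {
      // Forward seek: read and discard one slice at a time.
      const std::size_t want = std::min(sizeof(scratch), target - this->Offset);
      if (this->Read(scratch, want) == 0)
      {
        break; // end of stream
      }
    }
    return this->Offset;
  }

  std::size_t Tell() const { return this->Offset; }

private:
  ForwardOnlySource Source;
  std::size_t Offset = 0;
};
```

With this approach, memory use stays bounded by the scratch buffer, but a reader that seeks backward often would decompress the same data many times, so it is a fallback rather than a replacement for real Seek support.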

Implementation:

Using a third-party library to read the compressed files is obviously a must. zlib would be the classic choice, but libzip seems well placed because it supports many decompression algorithms and allows reading a slice and seeking.

Of course, other, more custom compression formats may require other implementations and third-party libraries.

What are your thoughts on this?

For archives, I think it would be good to support them in the vtkURILoader, as this would allow resolving and loading files indirectly. That could be useful for importers (I don’t know if any importer supports the vtkURILoader at this time) and for readers like the vtkGLTFReader:

vtkNew<vtkURILoader> loader;
loader->SetBaseFileName("gltf.zip");

auto stream = loader->Load("dataset.json"); // load main file manually from zip

vtkNew<vtkGLTFReader> reader;
reader->SetURILoader(loader); // other relative files will be resolved and loaded from the zip
reader->SetStream(stream);

To support them in the URI loader, there are different approaches: either require the base file name to be a ZIP archive (as the last path component), or do something higher level by resolving ZIP archives inside full paths, but that would be tricky.
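As a rough illustration of the second, higher-level approach, the sketch below splits a full path at the first component that ends with an archive extension. ArchivePath and SplitArchivePath are hypothetical names, not part of vtkURILoader. It also shows why this is tricky: the check is purely textual, so a regular directory that happens to be named something.zip would be treated as an archive, and the extension comparison is case-sensitive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: where the archive is, and what to load from it.
struct ArchivePath
{
  std::string Archive; // path of the archive on disk
  std::string Entry;   // path of the file inside the archive, may be empty
};

// Look for the first '/'-separated component of `path` that ends with `ext`.
// On success, fill `out` and return true; otherwise return false.
// This only inspects the string, it never touches the file system.
bool SplitArchivePath(
  const std::string& path, ArchivePath& out, const std::string& ext = ".zip")
{
  assert(!ext.empty());
  std::size_t start = 0;
  while (start <= path.size())
  {
    // [start, end) is the current path component.
    std::size_t end = path.find('/', start);
    if (end == std::string::npos)
    {
      end = path.size();
    }
    const std::size_t len = end - start;
    // Require at least one character before the extension.
    if (len > ext.size() && path.compare(end - ext.size(), ext.size(), ext) == 0)
    {
      out.Archive = path.substr(0, end);
      out.Entry = end < path.size() ? path.substr(end + 1) : std::string();
      return true;
    }
    start = end + 1;
  }
  return false;
}
```

For example, "/data/gltf.zip/dataset.json" would be split into the archive "/data/gltf.zip" and the entry "dataset.json", while "/data/model.obj" would be left alone. A real implementation would probably need to check on disk that the archive component is a regular file.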


About Seek support, note that the libzip documentation indicates that compressed archives do not support seeking, and uncompressed ZIP archives are uncommon.

Indeed, that would be interesting in the context of glTF and we should add URI loader support to other readers/importers as well.

the libzip indicates that compressed archives do not support seeking

Unfortunate. Where did you find that info?

In the link you gave:

The zip_fseek() function seeks to the specified offset relative to whence, just like fseek(3).
zip_fseek only works on uncompressed (stored), unencrypted data. When called on compressed or encrypted data it will return an error.

Next time I will try reading :)

Anyway, it indeed shows that the Seek issue I highlighted may be critical, unless we find another third-party library that can actually Seek.