Design:
Some time ago, I shared a potential design for reading archived files on the ParaView Discourse.
That idea never went very far, as it required a complex implementation in a third-party library that only partially existed at the time.
But since this is a topic that comes up often, I wanted to find a potential solution. Something has actually changed in the last few years: streaming support has been added to VTK and implemented in many readers:
- https://gitlab.kitware.com/vtk/vtk/-/blob/release/Documentation/release/9.6/add-image-reader2-stream-support.md?ref_type=heads
- https://gitlab.kitware.com/vtk/vtk/-/blob/release/Documentation/release/9.6/add-stream-support.md?ref_type=heads
- https://gitlab.kitware.com/vtk/vtk/-/blob/master/Documentation/release/dev/add-stream-support.md?ref_type=heads
While streaming support is a good thing in itself and lets VTK be used in many contexts other than reading data from disk, it also greatly improves what we can do in terms of interoperability with other software and within VTK itself.
Indeed, we could leverage stream support and provide other kinds of streams from different sources, such as compressed data streams. We could then imagine doing this:
Single GZip file:
vtkNew<vtkGZipCompressedResourceStream> stream;
stream->Open("/path/to/file.obj.gz");
vtkNew<vtkOBJReader> reader;
reader->SetStream(stream);
reader->Read();
Tarball:
vtkNew<vtkTGZCompressedResourceStream> stream;
stream->Open("/path/to/archive.tgz", "file.obj");
vtkNew<vtkOBJReader> reader;
reader->SetStream(stream);
reader->Read();
ZIP archive:
vtkNew<vtkZIPCompressedResourceStream> stream;
stream->Open("/path/to/archive.zip", "file.obj");
vtkNew<vtkOBJReader> reader;
reader->SetStream(stream);
reader->Read();
For ZIP and TGZ, the second argument lets us select a file from the archive.
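To illustrate why this design is attractive, here is a toy sketch in plain C++ (no VTK involved). It assumes a hypothetical ByteStream interface, and run-length decoding stands in for real decompression; none of these names are part of VTK. The point is only that a reader written against an abstract stream does not need to know whether the bytes come from memory or from a decoder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal byte-stream interface (illustrative, not the VTK API).
struct ByteStream
{
  virtual ~ByteStream() = default;
  // Copy up to `size` bytes into `out`; return how many were copied (0 at end of stream).
  virtual std::size_t Read(char* out, std::size_t size) = 0;
};

// Stream over bytes that are already in memory.
class MemoryStream : public ByteStream
{
public:
  explicit MemoryStream(std::string data) : Data(std::move(data)) {}

  std::size_t Read(char* out, std::size_t size) override
  {
    assert(out != nullptr);
    const std::size_t n = std::min(size, this->Data.size() - this->Pos);
    std::memcpy(out, this->Data.data() + this->Pos, n);
    this->Pos += n;
    return n;
  }

private:
  std::string Data;
  std::size_t Pos = 0;
};

// Toy "decompressing" stream: expands (count, byte) runs on the fly.
// Run-length decoding is only a stand-in for a real decompressor.
class RunLengthStream : public ByteStream
{
public:
  using Run = std::pair<std::size_t, char>;

  explicit RunLengthStream(std::vector<Run> runs) : Runs(std::move(runs)) {}

  std::size_t Read(char* out, std::size_t size) override
  {
    assert(out != nullptr);
    std::size_t written = 0;
    while (written < size && this->Index < this->Runs.size())
    {
      if (this->Emitted == this->Runs[this->Index].first)
      {
        // Current run is exhausted, move on to the next one.
        ++this->Index;
        this->Emitted = 0;
        continue;
      }
      out[written++] = this->Runs[this->Index].second;
      ++this->Emitted;
    }
    return written;
  }

private:
  std::vector<Run> Runs;
  std::size_t Index = 0;   // run currently being expanded
  std::size_t Emitted = 0; // bytes already produced from that run
};

// Toy "reader": counts newline characters, pulling small chunks from any stream.
std::size_t CountLines(ByteStream& stream)
{
  char chunk[8];
  std::size_t lines = 0;
  std::size_t got = 0;
  while ((got = stream.Read(chunk, sizeof(chunk))) > 0)
  {
    lines += static_cast<std::size_t>(std::count(chunk, chunk + got, '\n'));
  }
  return lines;
}
```

CountLines can be handed either a MemoryStream or a RunLengthStream without any change, which is the same property the proposed compressed streams would give to existing readers.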
Details:
As always, the devil is in the details.
- Reading the whole file in memory
Some readers' stream implementations require the whole data to be available in memory (as in void*, size_t). Unless such an implementation is rewritten to support reading from a proper stream, the whole file will necessarily be copied, uncompressed, into memory before actually being read.
It will work, but it will definitely be memory intensive and show limitations for large files.
- Seek support
Most readers implementing stream support use the Seek method to move around the stream, which can be very useful, and at the moment this is supported by all types of streams. However, it is possible that certain compression algorithms won't let us implement Seek properly, which would require ensuring that all readers support a Seek-less code path, and that may not be trivial to implement.
- Slice reading
Most readers, as we would expect, read the file part by part. Some compression formats may not support that and may require decompressing the whole file before any data can be read.
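One possible fallback for the Seek concern, sketched below in plain C++: when the underlying decoder can only move forward, an absolute Seek can still be emulated by reading and discarding bytes, and by restarting from the beginning for backward seeks. ForwardSource and SeekEmulatingStream are hypothetical names for illustration, and a plain string stands in for the decompressor output. It works, but every backward seek pays for decoding everything before the target again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical forward-only source: think of a decompressor that can only
// produce the next bytes. A plain string stands in for the decoded data.
class ForwardSource
{
public:
  explicit ForwardSource(std::string decoded) : Decoded(std::move(decoded)) {}

  // Produce up to `size` bytes; return how many were produced (0 at end).
  std::size_t Read(char* out, std::size_t size)
  {
    assert(out != nullptr);
    const std::size_t n = std::min(size, this->Decoded.size() - this->Pos);
    std::memcpy(out, this->Decoded.data() + this->Pos, n);
    this->Pos += n;
    this->Total += n;
    return n;
  }

  // Restart decoding from the very beginning.
  void Rewind() { this->Pos = 0; }

  // Total number of bytes produced so far, rewinds included (a cost measure).
  std::size_t TotalDecoded() const { return this->Total; }

private:
  std::string Decoded;
  std::size_t Pos = 0;
  std::size_t Total = 0;
};

// Adapter that emulates an absolute Seek on top of a forward-only source.
class SeekEmulatingStream
{
public:
  explicit SeekEmulatingStream(ForwardSource& source) : Source(source) {}

  std::size_t Read(char* out, std::size_t size)
  {
    const std::size_t n = this->Source.Read(out, size);
    this->Pos += n;
    return n;
  }

  std::size_t Tell() const { return this->Pos; }

  // Move to absolute position `target`; return false if it lies past the end.
  bool Seek(std::size_t target)
  {
    if (target < this->Pos)
    {
      // Backward seek: no way to step back, so start over.
      this->Source.Rewind();
      this->Pos = 0;
    }
    char scratch[64]; // fixed-size buffer: data is discarded slice by slice
    while (this->Pos < target)
    {
      const std::size_t want = std::min(sizeof(scratch), target - this->Pos);
      const std::size_t got = this->Source.Read(scratch, want);
      if (got == 0)
      {
        return false;
      }
      this->Pos += got;
    }
    return true;
  }

private:
  ForwardSource& Source;
  std::size_t Pos = 0;
};
```

With a 16-byte source, seeking to offset 10, reading 2 bytes, then seeking back to offset 4 and reading 2 more makes TotalDecoded() report 18 bytes, more than the whole payload. Readers that seek back and forth a lot would amplify that cost, which is why a Seek-less reading path may still be worth having.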
Implementation:
Using a third-party library to read the compressed files is obviously a must. zlib would be the classic choice, but libzip seems well placed because it supports many decompression algorithms and allows reading a slice and seeking.
Of course, other, more custom compression formats may require other implementations and third-party libraries.
What are your thoughts on this?