Would compressed VTK arrays be a good idea?

Hi all,

I am currently studying the feasibility of including implicit arrays (i.e. arrays that do not allocate their data in memory but instead evaluate an implicit mapping/function behind the scenes) into VTK. These new types of arrays would leverage the existing vtkGenericDataArray API. The implicit arrays could, in part, be used to implement in-memory compression of arrays within VTK itself.

The idea is to transform a classic vtkDataArray into a vtkImplicitArray where the data would be suitably compressed and the implicit map would effectively do some decompression on read and compression on write. The trade-off is less memory usage for more compute time on access and write operations.
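
To make this more concrete, here is a minimal sketch, in plain C++, of what the implicit mapping behind the vtkGenericDataArray interface could look like and where a compression backend would slot in. All names are invented for this post; none of this is existing VTK API.

```cpp
#include <cstddef>

// Purely illustrative: the "backend" of an implicit array is just a mapping
// from a flat value index to a value; no per-element storage exists.
struct AffineBackend
{
  double Slope = 1.0;
  double Offset = 0.0;

  // This is what the array's GetValue()/GetTypedComponent() path would call.
  double operator()(std::size_t valueIdx) const
  {
    return this->Offset + this->Slope * static_cast<double>(valueIdx);
  }
};

// A compressed-array backend would have the same shape: operator() would locate
// the block containing valueIdx, decompress it (ideally into a small cache) and
// return the decoded value, while the write path would re-encode the block.
```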

With the help of some colleagues, we came up with the following specifications (in order of importance):

  1. Lower memory usage
  2. Bijective compression/decompression (lossless)
  3. Random access (O(1))
  4. User configurable (i.e. choice of compressor, level of compression, etc.)
  5. Modifiable in compressed state

During some discussions, it became apparent that there are some real pros and cons to this approach (potential memory gains, adding indirection to save memory will impact performance, cost of maintenance of this type of development, what kind of compression to implement, etc.). As such, we wanted to field this question to the VTK community to get more opinions.

Is it worth including this into VTK? Would anyone be interested in using/maintaining this type of structure in the future? Anybody know of previous efforts/discussions in this direction? Does anyone have some strong opinions?

Thanks in advance!

This idea has been discussed on several occasions (hopefully we can dig up some past discussions). There are certainly many merits to this idea, but I am wondering if you’ve thought about how this would fit into array dispatch and related mechanisms. If not done carefully, this could result in significant template bloat, increased compile times, and breakages where methods like GetPointer() are used. As a point of reference, the SOA data arrays are still not working correctly and cause problems in filters, and these have been around for years. We are really good at coming up with great ideas, but our follow-through is sometimes lacking :frowning:

So I guess what I am saying is that the specifications above, while a great start, need to be expanded to address the implementation impacts on VTK, plus a realistic plan to carry the work to a reasonable point of completion.

Hi @will.schroeder,

Thanks for the comment; point heartily accepted.

We are still in the “architecture” stage of this project and, unfortunately, I don’t have very many solid answers to give you. I can tell you that we’ve been thinking about how to implement iterators and ranges for implicit arrays which we think might be difficult but doable.

The goal of this thread is also to bring to light things that we might not think about right off the bat or benefit from other people’s previous experiences before getting into the implementation. Your comment is spot on here in that regard.

So what I can take away from your comment (if I may paraphrase) is:

  • Ensure vtkArrayDispatch (and associated mechanisms) compatibility
  • Be aware of template bloat and keep compile times bounded
  • Work out the structure of the development ahead of time in a systematic way
  • Provide ample testing from the get-go
  • Schedule significant time for extensive benchmarking in the project (not only for performance testing but also as a convincing argument to actually use the structures)

That’s a great start and I’m sure others will have suggestions. It looks like you have your hands full :slight_smile:

Look at the vtkArray/vtkDenseArray/vtkSparseArray implementations. They were developed for Infovis, I think, and kept separate from vtkDataArray. They’re not too closely related to what you’re doing now, but they’re still important as a previous attempt to change the way VTK stores data.

The biggest issue with adding new vtkGenericDataArray types with new memory access schemes is that multiple dispatch invariably causes bloat (a lot of bloat, not just a little bit). So the new arrays would have to be a compile-time configuration option, like the scaled SOA arrays.
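
For reference, the usual vtkArrayDispatch pattern looks like the sketch below (the worker and the summation are made up for illustration). Every array class in the dispatch list gets its own instantiation of the fast loop, which is where the bloat comes from, and anything not in the list drops to the virtual fallback.

```cpp
#include "vtkArrayDispatch.h"
#include "vtkDataArray.h"
#include "vtkDataArrayRange.h"

// Hypothetical worker, invented for illustration.
struct SumWorker
{
  double Sum = 0.0;

  template <typename ArrayT>
  void operator()(ArrayT* array)
  {
    // Fast path: the concrete array type is known here, so value access can be
    // inlined. One copy of this loop is compiled per array type in the list.
    for (auto value : vtk::DataArrayValueRange(array))
    {
      this->Sum += static_cast<double>(value);
    }
  }
};

double ComputeSum(vtkDataArray* array)
{
  SumWorker worker;
  if (!vtkArrayDispatch::Dispatch::Execute(array, worker))
  {
    // Slow path: the array class is not in the dispatch list (for example
    // because it was not enabled at compile time), so fall back to the
    // double-precision virtual API.
    for (vtkIdType tuple = 0; tuple < array->GetNumberOfTuples(); ++tuple)
    {
      for (int comp = 0; comp < array->GetNumberOfComponents(); ++comp)
      {
        worker.Sum += array->GetComponent(tuple, comp);
      }
    }
  }
  return worker.Sum;
}
```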

@jfausty What would be your main goal?

In general, if you want to deal with large data sets, then compression would just make everything unbearably slow: with lossless compression, memory usage is not reduced significantly (even if you achieve a 2-3x reduction, it will not help much), while data access time may increase by orders of magnitude.

Instead, out-of-core processing (a.k.a. streaming, so that you don’t need to keep the entire data set in memory) and level-of-detail techniques (the ability to access a lower-detail representation of the data for quick visualization) would be much more effective for big data processing and visualization. VTK has some support for both techniques.

Hi @lassoan,

Thanks for the feedback.

The goal with the project is to see if we can be more memory efficient using compression techniques while still being competitive in the “access time” department. The goal of this thread is to pool the experiences and opinions of the community to direct the structuring of this development or to find the silver bullet that convinces everyone that it should not be pursued.

I see where you’re coming from. Why allocate resources on this idea when they might be better spent on improving existing functionalities of VTK?

So far, we have reasons to believe that this could be implemented in a reasonable time frame (considering it as a specific application of more generic implicit arrays). Additionally, it might offer more space/time tradeoff possibilities to users looking for more compact memory processing at the cost of higher compute times.

From my (admittedly brief) time researching this subject, I found the zfp project, which is maintained/supported by LLNL and essentially provides compressed in-memory containers for array data. They mainly focus on lossy compression, but they have support for lossless compression as well. This leads me to believe that there are potential gains to this approach, given that the project exists and researchers have devoted what seems like a considerable amount of time to making it work.
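
For concreteness, using zfp’s compressed-array containers looks roughly like the sketch below (not VTK code; the function is made up, and as far as I can tell the array classes themselves are fixed-rate, i.e. lossy, while the stream codec also offers a reversible mode).

```cpp
#include <cstddef>
#include <zfp/array3.hpp> // zfp 1.x; older releases ship this header as "zfparray3.h"

void CompressedFieldExample(std::size_t nx, std::size_t ny, std::size_t nz)
{
  // Store a 3D double field at 8 compressed bits per value instead of 64.
  zfp::array3d field(nx, ny, nz, /*rate=*/8.0);

  field(0, 0, 0) = 3.14;     // write: the containing block is re-encoded lazily
  double v = field(0, 0, 0); // read: the block is decoded into a software cache
  (void)v;
}
```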

However, you seem to have a very strong opinion that this project is not worth pursuing. Indeed, there is a big question mark over whether the processing of compressed arrays would make everything so slow as to be completely useless. I would be very interested in any references, studies, or data you can share (online or offline) that I can take to the parties with a vested interest in this project as arguments that this may be a bad idea. Please do let me know if you can share those with me.

I’ll also keep in mind, if we do move forward with this project, that it would be interesting to do some benchmarking of the compressed arrays vs. LOD or out-of-core approaches to see how they stack up and which paradigm might be more useful in which visualization scenario.

Also lossy, I believe, but NVIDIA’s new NeuralVDB also claims to optimize certain computations on big datasets.

I’m not against adding support for compression, it just does not make me excited. My expertise is mainly in medical image computing. In this field the largest data sets are 3D images. The size of these images can typically be reduced by 30-60% using lossless compression, which rarely makes any practical difference. On the other hand, loading and saving a larger (few GB) compressed image may take 1-2 minutes, instead of 5-15 seconds, which is a very significant difference that users immediately notice.

Some data sets are much more compressible; for example, binary labelmaps storing large, smooth blobs can be run-length encoded quickly and efficiently (10x-100x compression is achievable). Based on our experience with ultrasound and video data, with lossy methods we can reach up to 10-100x compression with barely visible differences. However, in both cases, whether significant compression is feasible depends entirely on the data and the user requirements.

Even if we can compress specific data sets for some applications quickly and efficiently, out-of-core processing (streaming) and LOD rendering (multi-resolution) may still be needed, because:

  1. Most likely even the compressed data will not fit comfortably in memory.
  2. It would be a lot of work to make the graphics hardware, the low-level visualization pipeline, and all relevant filters work directly on compressed data, while using them on smaller/lower-resolution data often requires no extra effort.

Therefore, for now, for our applications, I would be much more interested in hearing about (and maybe contributing to) improvements in VTK’s streaming and LOD rendering infrastructure than in adding infrastructure for compression.

Just to chime in, we deal with tomography of rock samples. We use lossless Blosc LZ4 compression at level 4 in order to save disk space, but we’ve so far not had a need for in-memory compression.

On the other hand, loading and saving a larger (few GB) compressed image may take 1-2 minutes, instead of 5-15 seconds, which is a very significant difference that users immediately notice.

I’m surprised at these numbers. In our case we’ve picked the compression parameters so that compression time is acceptable (~200 MB/s off the top of my head, which I think is roughly the write performance of the SSD in our scanner) and decompression is so fast that it does not become a noticeable overhead on the client computer, even on fast local SSD storage. Are these numbers perhaps for high-level gzip compression?

Then again, our use case is read only for the tomography on the client computer, so maybe different from yours.

Exactly. This happens when we use the zlib-compressed NRRD image file format. A cardiac CT can be 6 GB (512^3 volume, 16 bits/voxel, 20-25 frames), so even at a decompression speed of 50-100 MB/s, decoding adds 1-2 minutes to file reading.

If we need to access only a single frame then compression slows things down even more dramatically. Zlib can only decode from the beginning, so to extract a slice from the middle of a volume we need to decode half of the data set; instead of accessing a frame in about 1 second, we need a minute. If we split the data into multiple blocks and/or use a more modern codec then access speed can be improved, but not all file formats support these options.

There is an abandoned submission (I resubmitted, but it’s been stalled) to use zfp in the XML I/O classes (which support various compression formats).

https://gitlab.kitware.com/vtk/vtk/-/merge_requests/2783

@lassoan Ah I see. We store the tomography in a chunked HDF5 data set, so random access is sufficiently fast since only the needed chunks have to be read and decompressed, at least for our use cases. But it’s true that slice-by-slice access would incur a significant overhead even for us.
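
For reference, the layout looks roughly like the sketch below with the plain HDF5 C API (the dataset name and chunk size are placeholders; we use the Blosc/LZ4 filter plugin in practice, but the built-in deflate filter keeps the sketch self-contained).

```cpp
#include <hdf5.h>

hid_t CreateChunkedVolume(hid_t file, hsize_t nx, hsize_t ny, hsize_t nz)
{
  hsize_t dims[3] = { nz, ny, nx };
  hsize_t chunk[3] = { 64, 64, 64 }; // each 64^3 block is compressed independently

  hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
  H5Pset_chunk(dcpl, 3, chunk);
  H5Pset_deflate(dcpl, 4); // compression level 4

  hid_t space = H5Screate_simple(3, dims, NULL);
  hid_t dset = H5Dcreate(file, "/tomogram", H5T_NATIVE_FLOAT, space,
                         H5P_DEFAULT, dcpl, H5P_DEFAULT);

  H5Sclose(space);
  H5Pclose(dcpl);
  return dset; // reading a slice only touches the chunks it intersects
}
```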

Using more modern codecs in VTK XML files could work well, because there are no significant compatibility concerns (since the XML-based file formats are only used via the readers/writers implemented in VTK). I guess interest in the merge request could be drummed up by doing some profiling to show the speed and/or compression-ratio improvement compared to existing codecs.