Best practice for rapidly changing vtkImageData?

lassoan · July 13, 2020, 1:35pm

In 3D Slicer, we do a lot of 2D+t and 3D+t image sequence visualization in slice views and volume rendering, using the approach that @dgobbi described above (load the entire sequence in host memory, set up the visualization pipeline, and just update the image voxels using memcpy).
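A minimal sketch of the "update the voxels in place" idea, using only the Python standard library so that it runs without VTK. The buffer size, `make_sequence`, and `show_time_point` are illustrative assumptions, not Slicer code. In a real pipeline the destination would be the scalar buffer of the `vtkImageData`, and the image would be marked as modified after the copy.

```python
# Sketch: keep a whole image sequence in host memory and refresh one
# preallocated display buffer in place (the Python analogue of memcpy).
# Sizes and names are illustrative assumptions, not Slicer code.

FRAME_SHAPE = (4, 4, 4)  # tiny stand-in for a real volume such as 256^3
FRAME_BYTES = FRAME_SHAPE[0] * FRAME_SHAPE[1] * FRAME_SHAPE[2]  # 1 byte/voxel


def make_sequence(num_frames):
    """Build a synthetic sequence: frame t is filled with the value t."""
    return [bytes([t % 256]) * FRAME_BYTES for t in range(num_frames)]


def show_time_point(display_buffer, sequence, t):
    """Overwrite the display buffer with frame t without reallocating it."""
    frame = sequence[t]
    if len(frame) != len(display_buffer):
        raise ValueError("frame size does not match the display buffer")
    # Slice assignment through a memoryview copies the bytes in place.
    memoryview(display_buffer)[:] = frame
    # In a VTK pipeline, this is the point where the image would be marked
    # as modified so that downstream filters and mappers re-execute.


sequence = make_sequence(num_frames=10)
display_buffer = bytearray(FRAME_BYTES)  # allocated once, then reused
buffer_id = id(display_buffer)

for t in range(len(sequence)):
    show_time_point(display_buffer, sequence, t)
```

The point of the design is that the pipeline keeps seeing the same buffer object; only its contents change between time points.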

If you choose this technique, make sure you use TBB, because the overhead of creating threads to extract a single slice with the image reslice filter is enormous. The improvement is particularly dramatic on a CPU with many cores and a discrete NVIDIA GPU (probably because creating dozens of processing threads per second confuses NVIDIA’s threaded optimization heuristics).

On a desktop PC, for a 256^3 volume, we can reslice and display dozens of slices at 30fps (our view refresh rate). If we add volume rendering in one view, the rate drops to 26fps; if we also render the volume in a second view, we can update all views at about 23fps.
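To put these frame rates in perspective, they can be converted to time per frame. The sketch below does only that arithmetic. It assumes the per-frame costs simply add up, which is an assumption and not something measured here. Because 30fps is the view refresh cap, the first difference is a lower bound on what the first volume-rendered view costs.

```python
# Frame rates quoted in the post, in frames per second.
RATES_FPS = {
    "slice views only": 30,
    "plus volume rendering in one view": 26,
    "plus volume rendering in two views": 23,
}


def frame_time_ms(fps):
    """Convert a frame rate to milliseconds per frame."""
    return 1000.0 / fps


frame_times = {name: frame_time_ms(fps) for name, fps in RATES_FPS.items()}

# Difference between consecutive configurations (assumes additive costs).
times = list(frame_times.values())
added_cost_ms = [later - earlier for earlier, later in zip(times, times[1:])]

for name, ms in frame_times.items():
    print(f"{name}: {ms:.1f} ms per frame")
for cost in added_cost_ms:
    print(f"added per volume-rendered view: about {cost:.1f} ms")
```

Under that assumption, each volume-rendered view adds roughly 5 ms per frame.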

We chose to use this approach due to its simplicity and flexibility. However, we did some feasibility tests and confirmed that we can get 100+ fps volume rendering by uploading the entire 4D volume to the GPU and using one of these methods:

- “filmstrip” technique: use a single actor, concatenate all 3D volumes into one large 3D volume along a chosen axis, set up clipping planes to show only a single volume, and switch between time points by changing the origin of the actor
- multi-actor technique: add an actor for each 3D volume and switch between time points by changing the visibility of the actors (always show only one actor at a time)
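The bookkeeping behind the two techniques listed above can be sketched without any rendering code. All function names below are hypothetical. The sign of the shift is an assumption, as is the idea that the clipping window stays fixed while the actor moves. How the shift is applied to the actor depends on the rendering setup.

```python
# Sketch of the per-time-point bookkeeping for both techniques.
# Names, sign conventions, and units are illustrative assumptions.


def filmstrip_shift(t, slices_per_volume, slice_spacing):
    """Shift along the stacking axis that brings volume t into a fixed
    clipping window located where volume 0 sits when the shift is zero."""
    if t < 0:
        raise ValueError("time point must be non-negative")
    return -t * slices_per_volume * slice_spacing


def filmstrip_slice_range(t, slices_per_volume):
    """Half-open range of slice indices that volume t occupies in the
    concatenated volume."""
    start = t * slices_per_volume
    return (start, start + slices_per_volume)


def multi_actor_visibility(t, num_time_points):
    """Visibility flag per actor: only the actor of time point t is shown."""
    if not 0 <= t < num_time_points:
        raise ValueError("time point out of range")
    return [i == t for i in range(num_time_points)]


# Example: 256-slice volumes with 0.5 mm slice spacing, showing time point 3.
shift = filmstrip_shift(3, slices_per_volume=256, slice_spacing=0.5)
slice_range = filmstrip_slice_range(3, slices_per_volume=256)
flags = multi_actor_visibility(1, num_time_points=3)
```

In both cases switching time points touches only a transform or a flag, so no voxel data moves once everything is on the GPU.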

When we showed these techniques to clinicians, rendering was so fast that they asked us to please slow it down. This was the first time I ever heard clinicians complain about volume rendering being too fast.

Yes, extraction of a slice is done on the CPU and only the necessary slice is sent over to the GPU. This may be faster than transferring the entire volume to the GPU at each time point. However, if you need volume rendering, then you have to transfer the volume anyway, so it would be faster to reslice on the GPU. We created a set of VTK classes that allow running part of the display pipeline on the GPU, but there was not much interest from the VTK community, so we did not invest much further in this idea.

Profiling this is hard, even in a plain C++ environment. We see a lot of time spent in various threads of the graphics driver and in system calls. There is no obvious bottleneck in VTK.