Loading 3D volume performance

I’m very new to VTK and 3D programming in general, and I’ve been tasked with creating a 3D volume viewer that receives volumes consecutively and displays them as they’re received.

I used the https://examples.vtk.org/site/Cxx/Medical/MedicalDemo4/ example to create a basic app and modified it so that it loads volumes consecutively from a set of .mha files as a test.

I create the pipeline only once at the start of the app and then have a function which only loads the volume:

void VTKWindow::loadFileFast(const QString& filePath)
{
    m_loadedFile = filePath;
    m_loadedFile.replace("file:///", ""); // strip the file:/// URL prefix (e.g. from QML/QUrl)
    qDebug() << "----loadFileFast " << m_loadedFile;

    // Changing the file name marks the reader as modified, so the next
    // Render() re-executes the whole pipeline, including the file read.
    m_reader->SetFileName(m_loadedFile.toStdString().c_str());

    m_window->Render();
}

Eventually this app will be fed volume data at around 2-4 volumes per second, and I need to display the volumes at a similar rate. In my tests, however, loading and rendering a volume takes around 800 ms before it’s displayed on the screen, so I’m only getting a rate of around 1.2 volumes/second.

The volume .mha files are very small (250 KB each) and currently hold only a single 2D ultrasound frame in 3D space.

I would like to know if it’s possible to improve the frame rate of displaying volumes this way.
Is the slowdown due to using the .mha format?
Could the volume data somehow be fed into VTK directly (without going through an .mha file), and would that improve the performance?
Or would it be better to load the volume only once and then somehow update its data?

Any help to point me in the right direction would be much appreciated.

In general, file IO is orders of magnitude slower than in-memory operations, so if you are receiving the volume data in the same program that is displaying it, removing the file IO will help a lot.

You could test the potential volume performance in ParaView. I’m not quite sure of the exact steps, but I think if you create a Wavelet source of the same resolution as the volumes you are interested in and then look at the Animation options, you can animate one of the Wavelet’s parameters so that ParaView has to regenerate the source each frame, which simulates what you are trying to do. You can then see how fast ParaView can do it, and even easily test larger resolutions if that is of interest.

HTH, Aron

I’ll add that if you want to do anything in real-time that involves files, then asynchronous IO and buffering are absolute necessities. VTK’s IO is not asynchronous, but you can make it asynchronous by running the reader in a separate thread (or even a separate process) and using a buffer. That way, the IO and the rendering run concurrently instead of constantly waiting for each other. It’s a complicated thing to do, so I’m not going to provide any full recipes here, but you can look at PlusLib, which has already solved many of these issues for ultrasound. See their example here for reading US data from .mha files.
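
That said, here is a minimal sketch of the reader-thread-plus-buffer idea (this is not how PlusLib does it; the file list, buffer policy, and error handling are all placeholders):

#include <vtkImageData.h>
#include <vtkMetaImageReader.h>
#include <vtkNew.h>
#include <vtkSmartPointer.h>

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Thread-safe FIFO handing finished volumes from the reader thread
// to the render thread.
class VolumeBuffer
{
public:
  void Push(vtkSmartPointer<vtkImageData> img)
  {
    std::lock_guard<std::mutex> lock(this->Mutex);
    this->Queue.push(img); // bound the queue size in real code
    this->Ready.notify_one();
  }
  vtkSmartPointer<vtkImageData> Pop()
  {
    std::unique_lock<std::mutex> lock(this->Mutex);
    this->Ready.wait(lock, [this] { return !this->Queue.empty(); });
    vtkSmartPointer<vtkImageData> img = this->Queue.front();
    this->Queue.pop();
    return img;
  }
private:
  std::mutex Mutex;
  std::condition_variable Ready;
  std::queue<vtkSmartPointer<vtkImageData>> Queue;
};

// Runs on the worker thread: all file IO happens here, off the render thread.
void ReaderLoop(const std::vector<std::string>& files, VolumeBuffer& buffer)
{
  vtkNew<vtkMetaImageReader> reader;
  for (const std::string& file : files)
  {
    reader->SetFileName(file.c_str());
    reader->Update(); // blocking file read
    vtkSmartPointer<vtkImageData> copy = vtkSmartPointer<vtkImageData>::New();
    copy->DeepCopy(reader->GetOutput()); // the reader reuses its output
    buffer.Push(copy);
  }
}

// On the main thread, e.g. driven by a QTimer:
//   std::thread worker(ReaderLoop, std::cref(files), std::ref(buffer));
//   ...
//   m_mapper->SetInputData(buffer.Pop());
//   m_window->Render();

This keeps Render() on the GUI thread (which OpenGL-based rendering generally requires) while the file reads overlap with the rendering.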


This is very useful, I will investigate further.
Thank you guys.

What images are you working with? The file system is not well suited for continuously streaming images in real-time. Typically a network interface or some custom SDK is used.

If you just need to replay recorded images then I would recommend keeping all images in memory. If you keep the images in CPU memory and use a single GPU volume raycast mapper (and keep replacing the input image of the mapper) then you can usually achieve a 10-20 fps update rate; see the sketch below. If you create a volume raycast mapper + actor for each image (and just keep changing the actors’ visibility) then you can achieve a 100+ fps update rate on a discrete GPU.
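
A minimal sketch of the single-mapper approach (the pipeline objects and names are placeholders; the renderer, volume property, and transfer functions are set up once, as in the MedicalDemo4 example):

#include <vtkGPUVolumeRayCastMapper.h>
#include <vtkImageData.h>
#include <vtkRenderWindow.h>

// Pipeline is built once at startup; for each incoming image (already
// in CPU memory), just swap the mapper input and render.
void ShowVolume(vtkGPUVolumeRayCastMapper* mapper,
                vtkRenderWindow* window,
                vtkImageData* image)
{
  mapper->SetInputData(image); // no file IO, no pipeline rebuild
  window->Render();            // the CPU-to-GPU voxel upload happens here
}

The per-actor variant trades GPU memory for speed: each image gets its own mapper + vtkVolume added to the renderer up front, and per frame you only flip VisibilityOn()/VisibilityOff(), so no voxel upload is needed at render time.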

@lassoan To test the performance I was provided with .mha files from the developer who’s working on creating the 3D data, but we have the freedom to stream the data directly in memory to VTK without needing to save it to disk first.
The images are being created in real-time, so it’s not possible to load all the files into memory in advance, but both the image creation part and the display part will eventually run in the same application, so the images can be kept in CPU memory.

I used vtkOpenGLGPUVolumeRayCastMapper, and I plan to use vtkImageData to load the data instead of using vtkMetaImageReader. Is this the way to do it? How can I make sure the images are in CPU memory?

For larger volumes, transferring the voxel data from CPU memory to GPU memory may significantly slow down the rendering, but depending on the data size and your hardware you can reach a few tens of fps.
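
Feeding a vtkImageData directly does avoid the reader (and the file IO) entirely. A minimal sketch, assuming unsigned char voxels; the dimensions, spacing, and source buffer are placeholders:

#include <vtkImageData.h>
#include <vtkSmartPointer.h>

#include <cstring>

// Copy one incoming frame (unsigned char voxels from the acquisition
// code) into a vtkImageData that the mapper can consume directly.
vtkSmartPointer<vtkImageData> MakeVolume(const unsigned char* buffer,
                                         int nx, int ny, int nz)
{
  auto image = vtkSmartPointer<vtkImageData>::New();
  image->SetDimensions(nx, ny, nz);
  image->SetSpacing(1.0, 1.0, 1.0);             // placeholder spacing
  image->AllocateScalars(VTK_UNSIGNED_CHAR, 1); // voxels stay in CPU memory
  std::memcpy(image->GetScalarPointer(), buffer,
              static_cast<size_t>(nx) * ny * nz);
  return image;
}

(A zero-copy alternative is to attach your own array with vtkUnsignedCharArray::SetArray() and image->GetPointData()->SetScalars(), but then you are responsible for keeping the buffer alive while it is rendered.)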

You may consider using the OpenIGTLink standard, which was developed for fast streaming of images, tool positions, and meshes for image-guided therapy. It is well supported in the VTK-based 3D Slicer application, so at first you don’t need to develop anything for image visualization: just implement a very simple imaging component that dumps imaging data into OpenIGTLink messages and use 3D Slicer to visualize the data.
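
For example, the imaging component could push each frame with the OpenIGTLink C++ library, roughly along the lines of the library’s ImagerClient example (the host, port, and image geometry below are placeholders; 18944 is the default OpenIGTLink port that 3D Slicer listens on):

#include "igtlClientSocket.h"
#include "igtlImageMessage.h"

#include <cstring>

int main()
{
  // Connect to the visualization side (e.g. 3D Slicer with OpenIGTLinkIF).
  igtl::ClientSocket::Pointer socket = igtl::ClientSocket::New();
  if (socket->ConnectToServer("localhost", 18944) != 0)
  {
    return 1; // connection failed
  }

  int size[3] = { 256, 256, 1 };           // placeholder frame size
  float spacing[3] = { 1.0f, 1.0f, 1.0f }; // placeholder spacing

  igtl::ImageMessage::Pointer imgMsg = igtl::ImageMessage::New();
  imgMsg->SetDimensions(size);
  imgMsg->SetSpacing(spacing);
  imgMsg->SetScalarType(igtl::ImageMessage::TYPE_UINT8);
  imgMsg->SetDeviceName("ImagerClient");
  imgMsg->AllocateScalars();

  // Fill the message body with one frame's pixels, then send it.
  // std::memcpy(imgMsg->GetScalarPointer(), frameBuffer, frameSizeInBytes);
  imgMsg->Pack();
  socket->Send(imgMsg->GetPackPointer(), imgMsg->GetPackSize());
  return 0;
}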

See for example how 3D Slicer displays 4D (3D+t) ultrasound streamed live from a Philips ultrasound system:

If you reconstruct a 3D volume from live tracked 2D frames then you don’t have to send full reconstructed 3D volumes; you can just stream the 2D frames and 3D Slicer can do the live volume reconstruction (optionally with GPU acceleration):

Sorry for hijacking, but I’m curious about this since I had not heard about OpenIGTLink.

In our case, we are doing asynchronous loading of volumes from disk. The volumes represent sections of drill core (mining industry). They are loaded in response to the user interactively panning up/down the drill core in our UI, so that we only keep the sections currently “in view” rendered, so as not to exhaust system and/or GPU memory.

But even though we do the file loading, and as much as we can of the other preparatory tasks, in separate threads, it has always been a problem that the final bit, the Render call in which data is uploaded from the CPU to the GPU, has to be done in the main thread, blocking UI rendering for a few hundred ms.

Could OpenIGTLink help here? Does it allow uploading data to the GPU in a fully nonblocking fashion?

(EDIT: To clarify, this is a problem because sections of the tomography are unloaded/loaded from/to the GPU while the user is interactively panning up/down the drill core, and the resulting stuttering degrades the UX.)

OpenIGTLink is a communication protocol used over plain TCP or UDP sockets. It does not address the issue of texture transfer from the CPU to the GPU. What we implemented in the video above is pasting arbitrarily oriented slices into a 3D volume using OpenCL.

Uploading the texture to the GPU on a separate thread should be no problem, but it may require some VTK changes. We did not have to implement it for any of our projects, so I don’t have first-hand experience with it.
