How to accelerate Render()?

The pipeline of my project is:


...
windowLevel = vtk.vtkImageMapToWindowLevelColors()
imageActor = vtk.vtkImageActor()
imageActor.GetMapper().SetInputConnection(windowLevel.GetOutputPort())
ren = vtk.vtkRenderer()
ren.AddActor(imageActor)
renWin = vtk.vtkRenderWindow()
...

for t in range(100):
    windowLevel.SetInputData(img)
    windowLevel.Update()
    renWin.Render()

However, I find that the above code is slow.

I tried replacing vtkRenderer with vtkOpenGLRenderer, but the speed did not change.

How can I speed it up? Is the GPU being used?

In addition, I find that the time taken by renWin.Render() is not constant: sometimes it takes 0.05 s, sometimes 0.001 s.

The GPU is used for rendering, but many parts of the pipeline run on the CPU (for example, vtkImageMapToWindowLevelColors). You can profile your application to determine which operations are time-consuming.

Most likely the issue is that you call render each time you receive a mouse/keyboard event. If you implement a VTK-based application with an event loop, it is important to use a mechanism that decouples rendering requests (e.g., in response to a GUI event) from actual render calls. See the implementation in CTK (ctkVTKRenderView.h). Instead of spending time redeveloping such a mechanism (and many other features that you need for an end-user application), you can run your Python scripts in existing visualization application frameworks, such as 3D Slicer (for medical applications) or ParaView (for generic engineering visualization).
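To illustrate what decoupling render requests from render calls can look like, here is a minimal sketch in plain Python. It is not the CTK implementation: the class name `RenderScheduler` and its methods are hypothetical, and `render_callback` is assumed to be something like `renWin.Render`. Event handlers only mark the view as needing a redraw, and a timer or idle handler performs at most one actual render per flush.

```python
class RenderScheduler:
    """Hypothetical helper: coalesces many render requests into one render."""

    def __init__(self, render_callback):
        # render_callback is assumed to be e.g. renWin.Render
        self._render = render_callback
        self._pending = False

    def request_render(self):
        """Cheap call for event handlers: only marks the view as dirty."""
        self._pending = True

    def flush(self):
        """Call from a timer or idle handler; renders only if requested."""
        if not self._pending:
            return False
        self._pending = False
        self._render()
        return True


# Demonstration with a stand-in callback instead of a real render window.
calls = []
scheduler = RenderScheduler(lambda: calls.append("render"))
for _ in range(50):              # 50 events arrive in quick succession
    scheduler.request_render()
first = scheduler.flush()        # performs a single render
second = scheduler.flush()       # nothing pending, so nothing is rendered
print(len(calls), first, second)
```

In a real application, `flush()` would be driven by the GUI toolkit's timer or idle callback rather than called by hand.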

There is no need to use vtkOpenGLRenderer - vtkRenderer instantiates the appropriate rendering class (e.g., vtkOpenGLRenderer).

Thank you for your kind reply. I have measured the time with:

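import time

for i in range(100):  # do not name the loop variable "time": it would shadow the module
    t0 = time.time()
    windowLevel.SetInputData(img)
    windowLevel.Update()
    t1 = time.time()
    renWin.Render()
    t2 = time.time()
    # Pipeline update time, then render time, in seconds
    print('cost time: ', t1-t0, t2-t1)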

Thus, I am sure that `renWin.Render()` takes most of the time.

Moreover, I do not call renWin.Render() when I receive a mouse/keyboard event. Actually, I call Render() frequently because it is a 4D dynamic image. Render() sometimes takes 0.05 s and sometimes 0.001 s. I don’t know why the time is not constant.

Is there any method to speed up Render()?

You need to run a C++ profiler if you want to know where the performance bottleneck is. It is not enough to know that it is somewhere in render().

If you work with 2D+t images, you may need to build VTK with the TBB SMP backend or set the maximum number of threads globally to 1 for better performance. If you replay a sequence of large 3D volumes, then most likely the transfer of volumes from CPU to GPU takes the most time, and you can make replay an order of magnitude faster by creating actors for all time points and just showing/hiding them (show the actor of the current time point and hide all others).
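The show/hide approach could be sketched as follows. This is only an illustration under stated assumptions: `show_time_point` is a hypothetical helper, and `actors` is assumed to be a list with one actor per time point, each already connected to its own image data. It relies only on the `SetVisibility()` method that VTK props (including vtkImageActor) provide; `FakeActor` is a stand-in so the sketch runs without VTK.

```python
def show_time_point(actors, index):
    """Make only actors[index] visible and hide all others.

    `actors` is assumed to hold one actor per time point (e.g. vtkImageActor
    instances); only their SetVisibility() method is used here.
    """
    for i, actor in enumerate(actors):
        actor.SetVisibility(1 if i == index else 0)


class FakeActor:
    """Stand-in for a VTK actor so the sketch runs without VTK installed."""

    def __init__(self):
        self.visible = 1

    def SetVisibility(self, flag):
        self.visible = int(flag)


# 28 time points, as in a 28-frame sequence; switch to time point 3.
actors = [FakeActor() for _ in range(28)]
show_time_point(actors, 3)
visible_indices = [i for i, a in enumerate(actors) if a.visible]
print(visible_indices)
```

With real VTK actors, you would call `renWin.Render()` once after `show_time_point()`. The idea is that each frame's data can stay on the GPU after its first render, so switching time points would no longer need a CPU-to-GPU transfer.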

You may try 4D dynamic image rendering in 3D Slicer. It supports not just images but also recording and replay of time sequences of all data objects (meshes, transforms, point/line/curve/plane/etc. markups, segmentations, tables, …) in multiple slice and 3D views. For example, we use it for volume rendering of 4D cardiac sequences in virtual reality. On a laptop without a discrete GPU we typically get 20-30fps for display of a high-resolution slice or a small volume in 4 views, with update times between 0.03-0.05 seconds (see example below). We don’t see the huge variation of 0.001 to 0.05s that you do, and I suspect that the 0.001s rendering time may actually mean that the rendering was skipped for some reason. Basic profiling shows that most of the time is spent in the OpenGL calls in the graphics driver, and we believe this is due to the image transfer from CPU to GPU. On a desktop with a discrete GPU, we can usually reach the requested display frame rate of 30fps, even if we display a moderate-size 4D sequence with volume rendering.

What kind of dynamic sequences do you need to render? Cardiac CT, cine-MRI? What is the typical volume size and number of time points? What frame rate are you trying to achieve? Do you need to display tools, implants, vector fields, etc.? Do you need to use volume rendering or just slice display? Do you need synchronized replay in multiple views?

It is a 4D flow MRI image. The image size is about 512*512*19*28 (W*H*slices*time points). I think creating actors for all images (19*28=532) may also take a lot of time, but I will try it.

In addition, is the update time of 0.03~0.05 s for one view or for four views?

Rendering this volume should be fine. A 512x512x19 volume is small to medium size. You have just 28 time points, which is also not many. If you want to go beyond a few tens of fps, you can add one actor per time point (28 in total), which should not be a problem at all.

20-30fps in Slicer is typical for synchronized update of all views, with just one shared actor for all time points.

What frame rate are you trying to achieve? Do you need to display time-varying tools, implants, vector fields, etc. or just the image? Do you need to use volume rendering or just slice display? Do you need synchronized replay in multiple views?

I will try adding 28 actors for 28 time points.

20~30 fps is OK for me. But currently it takes about 0.03~0.05 s to update one view, and I also need to update four views synchronously in my project, which takes about 0.12~0.15 s in total. Thus, my frame rate is only about 6~8 fps.

Have you used any trick to update four views? Or do you simply update the four views one after another, like in the following code?

    ...
    windowLevel1.SetInputData(img)
    windowLevel1.Update()
    renWin1.Render()
    ...
    windowLevel2.SetInputData(img)
    windowLevel2.Update()
    renWin2.Render()
    ...
    windowLevel3.SetInputData(img)
    windowLevel3.Update()
    renWin3.Render()
    ...
    windowLevel4.SetInputData(img)
    windowLevel4.Update()
    renWin4.Render()
    ...

In Slicer, low-level, performance-critical tasks like view updates are all implemented in C++. To make sure we keep the application responsive despite continuous updates, an update request mechanism is used: whenever data is changed, a render request is submitted for all affected views, and rendering is performed when the application becomes idle or a timer elapses. To synchronize the views, for each time step update, rendering is blocked in all views, data is updated, and rendering is re-enabled.
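The block/update/re-enable pattern described above can be illustrated with a small pure-Python sketch. This is not Slicer's API: `SyncedViews`, `request_render()` and `paused()` are hypothetical names, and each render callback is assumed to be something like a render window's `Render` method. While rendering is paused, requests are only recorded; when the pause ends, all views are rendered once, together.

```python
from contextlib import contextmanager


class SyncedViews:
    """Hypothetical helper that renders a group of views together."""

    def __init__(self, render_callbacks):
        # Each callback is assumed to be e.g. renWin1.Render, renWin2.Render, ...
        self._callbacks = list(render_callbacks)
        self._blocked = False
        self._dirty = False

    def request_render(self):
        """Render now, or just remember the request while rendering is blocked."""
        if self._blocked:
            self._dirty = True
        else:
            self._render_all()

    def _render_all(self):
        for callback in self._callbacks:
            callback()
        self._dirty = False

    @contextmanager
    def paused(self):
        """Block rendering inside the `with` block, then render once if needed."""
        self._blocked = True
        try:
            yield
        finally:
            self._blocked = False
            if self._dirty:
                self._render_all()


# Demonstration with stand-in callbacks that log which view was rendered.
log = []
views = SyncedViews([lambda n=n: log.append(n) for n in range(4)])
with views.paused():
    for _ in range(4):
        # The data update for each view would go here.
        views.request_render()
    renders_while_paused = len(log)
print(renders_while_paused, log)
```

Inside the `with` block no view is rendered, so no view can show a new time point while the others still show the old one; each view is rendered exactly once when the block exits.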

I would not recommend trying to implement a multi-view 4D data visualization application in pure Python. Python’s global interpreter lock, limited multithreading, etc. would make this really hard. Instead, if you work with medical images, using 3D Slicer as the basis of your application would be the obvious choice. It takes care of all basic features (DICOM import/export, 4D image visualization, quantification, temporal analysis, etc.), and you can write your own Python scripts to customize the user interface for your workflow (hide all GUI elements that you don’t need, add widgets that you need) and implement your custom processing/analysis tools.
