You need to run a C++ profiler if you want to know where the performance bottleneck is. It is not enough to know that it is somewhere in render().
If you work with 2D+t images, you may need to build VTK with the TBB SMP backend or set the maximum number of threads globally to 1 for better performance. If you replay a sequence of large 3D volumes, then transferring the volumes from CPU to GPU most likely takes the most time. You can make replay an order of magnitude faster by creating actors for all time points up front and only showing/hiding them (show the actor of the current time point and hide all others).
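The show/hide idea can be sketched roughly as follows. This is only an illustration of the bookkeeping, not VTK code: `Prop` and `TimeSequence` are hypothetical stand-ins, and in a real application each `Prop` would be an actor or volume created once per time point, with visibility toggled through its `SetVisibility` method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a renderable prop. In VTK this role would be
// played by an actor or volume that was created once for a given time point.
struct Prop {
    bool visible = false;
    void SetVisibility(bool v) { visible = v; }
};

// Keeps one pre-built prop per time point and shows exactly one at a time,
// so that replay only toggles visibility instead of re-uploading image data.
class TimeSequence {
public:
    explicit TimeSequence(std::size_t numTimePoints) : props_(numTimePoints) {
        assert(numTimePoints > 0);
        props_[0].SetVisibility(true);  // start with the first time point shown
    }

    // Hide the currently shown prop and show the requested one.
    void ShowTimePoint(std::size_t index) {
        assert(index < props_.size());
        props_[current_].SetVisibility(false);
        props_[index].SetVisibility(true);
        current_ = index;
    }

    // Advance to the next time point, wrapping around for looped replay.
    void Step() { ShowTimePoint((current_ + 1) % props_.size()); }

    std::size_t Current() const { return current_; }

    // Number of props that are currently visible (should always be 1).
    std::size_t VisibleCount() const {
        std::size_t n = 0;
        for (const Prop& p : props_) n += p.visible ? 1 : 0;
        return n;
    }

private:
    std::vector<Prop> props_;
    std::size_t current_ = 0;
};
```

You would call `Step()` from your replay timer and then trigger a render. Note the trade-off: this assumes that all time points fit into GPU memory at once, which may not hold for long sequences of large volumes.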
You may try 4D dynamic image rendering in 3D Slicer. It supports not just images but recording and replay of time sequences of all data objects (meshes, transforms, point/line/curve/plane/etc. markups, segmentations, tables, …) in multiple slice and 3D views. For example, we use it for volume rendering of 4D cardiac sequences in virtual reality. On a laptop without a discrete GPU, we typically get 20-30 fps when displaying a high-resolution slice or a small volume in 4 views, with update times of 0.03-0.05 seconds (see example below). We don't see the huge variation of 0.001 to 0.05 s that you do, and I suspect that a 0.001 s rendering time may actually mean that the rendering was skipped for some reason. Basic profiling shows that most of the time is spent in OpenGL calls in the graphics driver, and we believe this is due to the image transfer from CPU to GPU. On a desktop with a discrete GPU, we can usually reach the requested display frame rate of 30 fps, even when displaying a moderate-size 4D sequence with volume rendering.
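To check whether the very short render times in your measurements are real, you could flag frames that are far faster than the typical frame. The sketch below is one possible way to do that, assuming you already record per-frame times in seconds. The function names and the 10% threshold are arbitrary choices for illustration, not something taken from VTK or Slicer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Convert a per-frame update time in seconds to frames per second.
double FramesPerSecond(double secondsPerFrame) {
    assert(secondsPerFrame > 0.0);
    return 1.0 / secondsPerFrame;
}

// Median of the recorded render times (works on a copy, input is untouched).
double Median(std::vector<double> times) {
    assert(!times.empty());
    std::sort(times.begin(), times.end());
    const std::size_t mid = times.size() / 2;
    return times.size() % 2 ? times[mid] : 0.5 * (times[mid - 1] + times[mid]);
}

// Return the indices of frames whose render time is below
// fraction * median. Such frames are candidates for "render was skipped".
// The default fraction of 0.1 is an assumption; tune it for your data.
std::vector<std::size_t> SuspiciouslyFastFrames(const std::vector<double>& times,
                                                double fraction = 0.1) {
    const double threshold = fraction * Median(times);
    std::vector<std::size_t> flagged;
    for (std::size_t i = 0; i < times.size(); ++i) {
        if (times[i] < threshold) {
            flagged.push_back(i);
        }
    }
    return flagged;
}
```

For instance, with recorded times of 0.04, 0.001, 0.05, 0.03 and 0.045 s, the median is 0.04 s, so only the 0.001 s frame falls below the 0.004 s threshold. If the flagged frames coincide with time points where the displayed image did not change, that would support the skipped-render explanation.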
What kind of dynamic sequences do you need to render? Cardiac CT, cine-MRI? What is the typical volume size and number of time points? What frame rate are you trying to achieve? Do you need to display tools, implants, vector fields, etc.? Do you need to use volume rendering or just slice display? Do you need synchronized replay in multiple views?
