Slow rendering of large 2D image on macOS, but not Linux or Windows

Hello,

We have an existing application that uses VTK 9.5.2 for our 3D visualization. We have noticed that when we load a large 2D image (23,500 x 23,500 pixels), the interactive rendering speed on macOS is about 3 to 4 frames per second. @sankhesh provided a simple pure-VTK example (included at the bottom) that either loads the image from a file or generates a large 2D image. If we compile that code on Linux (Ubuntu 22.04) or Windows 11 against VTK 9.5.2, the rendering speed is 1000 FPS. Here are the machine specs:

macOS: Sonoma 14.8.2, Xcode 16.2, 64GB RAM, 16-core M3 Max CPU

Windows: AMD 9950, AMD Radeon 9060XT, 128GB Ram

Linux: AMD 9950, NVidia 3060 (12GB), 128GB RAM

I thought it might just be the macOS machine, but then I opened the same data set in ParaView 6.0.0 and rendered it using the “Slice” view. It renders at what I would assume is 1000 FPS. So the problem isn’t the macOS machine itself but how we are initializing or setting up the VTK code.

Has anyone run into this before? Would anyone have any ideas? If there are any ParaView engineers lurking about, could you point us to how ParaView sets up the Slice view so we can try to duplicate that setup?

We compiled against VTK 9.5.0, 9.5.1, 9.5.2 and 9.6.0.rc3.

The data set is located here: Vkt_Demo_Data - Google Drive

Code Example

#include <vtkCallbackCommand.h>
#include <vtkImageActor.h>
#include <vtkImageData.h>
#include <vtkImageProperty.h>
#include <vtkImageSliceMapper.h>
#include <vtkInteractorStyleTrackballCamera.h>
#include <vtkLookupTable.h>
#include <vtkNew.h>
#include <vtkRenderWindow.h>
#include <vtkRenderWindowInteractor.h>
#include <vtkRenderer.h>
#include <vtkSmartPointer.h>
#include <vtkTIFFReader.h>

#include <cstdlib>
#include <iostream>
#include <string>

namespace {
void CallbackFunction(vtkObject* caller, long unsigned int eventId,
                      void* clientData, void* callData);
}

int main(int argc, char* argv[])
{
  vtkSmartPointer<vtkImageData> imageData;

  if (argc > 1)
  {
    vtkNew<vtkTIFFReader> reader;
    reader->SetFileName(argv[1]);
    reader->Update();
    imageData = reader->GetOutput();
  }
  else
  {
    const int width = 23500;
    const int height = 23500;

    imageData = vtkSmartPointer<vtkImageData>::New();
    imageData->SetDimensions(width, height, 1);
    imageData->AllocateScalars(VTK_UNSIGNED_CHAR, 1);

    for (int y = 0; y < height; ++y)
    {
      for (int x = 0; x < width; ++x)
      {
        unsigned char* pixel = static_cast<unsigned char*>(imageData->GetScalarPointer(x, y, 0));
        // Create a checkerboard pattern
        if (((x / 100) % 2) == ((y / 100) % 2))
        {
          pixel[0] = 255;
        }
        else
        {
          pixel[0] = 0;
        }
      }
    }
  }

  double scRange[2];
  imageData->GetScalarRange(scRange);

  // Create a lookup table to map scalar values to colors
  vtkNew<vtkLookupTable> lookupTable;
  lookupTable->SetNumberOfTableValues(256);
  lookupTable->SetRange(scRange);
  lookupTable->Build();
  // Map 0 to black, 255 to white, others to grayscale
  for (int i = 0; i < 256; ++i)
  {
    if (i == 0)
      lookupTable->SetTableValue(i, 0.0, 0.0, 0.0, 1.0); // black
    else if (i == 255)
      lookupTable->SetTableValue(i, 1.0, 1.0, 1.0, 1.0); // white
    else
      lookupTable->SetTableValue(i, i / 255.0, i / 255.0, i / 255.0, 1.0); // grayscale
  }

  vtkNew<vtkImageSliceMapper> imageMapper;
  imageMapper->SetInputData(imageData);

  vtkNew<vtkImageActor> actor;
  actor->GetProperty()->SetLookupTable(lookupTable);
  actor->SetMapper(imageMapper);

  vtkNew<vtkRenderer> renderer;
  renderer->AddActor(actor);

  vtkNew<vtkRenderWindow> renderWindow;
  renderWindow->AddRenderer(renderer);

  vtkNew<vtkRenderWindowInteractor> renderWindowInteractor;
  renderWindowInteractor->SetRenderWindow(renderWindow);

  vtkNew<vtkCallbackCommand> callback;
  callback->SetCallback(CallbackFunction);
  renderer->AddObserver(vtkCommand::EndEvent, callback);

  vtkNew<vtkInteractorStyleTrackballCamera> style;
  renderWindowInteractor->SetInteractorStyle(style);

  renderWindow->Render();
  renderWindowInteractor->Start();

  return EXIT_SUCCESS;
}

namespace {

void CallbackFunction(vtkObject* caller, long unsigned int vtkNotUsed(eventId),
                      void* vtkNotUsed(clientData), void* vtkNotUsed(callData))
{
  vtkRenderer* renderer = static_cast<vtkRenderer*>(caller);

  double timeInSeconds = renderer->GetLastRenderTimeInSeconds();
  double fps = 1.0 / timeInSeconds;
  std::cout << "FPS: " << fps << std::endl;

  //std::cout << "Callback" << std::endl;
}
} // namespace

Required Large Data Patch

You will also need to have this patch applied to your VTK sources in order to load an image this large. I think this is patched into VTK Master at this point.

diff --git a/Rendering/Core/vtkImageMapper3D.cxx b/Rendering/Core/vtkImageMapper3D.cxx
index 99471c7531..39d43364f9 100644
--- a/Rendering/Core/vtkImageMapper3D.cxx
+++ b/Rendering/Core/vtkImageMapper3D.cxx
@@ -836,7 +836,8 @@ unsigned char* vtkImageMapper3D::MakeTextureData(vtkImageProperty* property, vtk
   // could not directly use input data, so allocate a new array
   reuseData = false;

-  unsigned char* outPtr = new unsigned char[ysize * xsize * bytesPerPixel];
+  unsigned char* outPtr =
+    new unsigned char[static_cast<size_t>(ysize) * static_cast<size_t>(xsize) * bytesPerPixel];

   // output increments
   vtkIdType outIncY = bytesPerPixel * (xsize - imageSize[0]);

If you read this far, thank you. Any help, pointers or gentle nudges in a direction are appreciated.


Mike Jackson

I tested the sample code on my MacBook Air M4 and saw the same result: around 4 FPS. There is an easy way to bump it up to around 60 FPS: replace vtkImageSliceMapper with vtkImageResliceMapper (requires linking to the vtkRenderingImage module).

When I wrote vtkImageResliceMapper around 15 years ago, one of my goals was to be able to efficiently render huge images (in my case, microscopy) regardless of available graphics hardware. So this mapper does the heavy lifting with the CPU rather than the GPU. As a result, it can render gigapixel-size images even on a GPU with under 1GB. It’s not super-fast, but it’s far better than 4 FPS.

Thank you David. Tried locally here and it definitely fixed the issue. But now I have even more questions:

What does ParaView use? When I interact with our image, ParaView uses about 75% CPU (according to top), whereas the example code using vtkImageResliceMapper uses 350% CPU. This is on an M3 Max MacBook Pro. I am trying to use macOS Instruments to compare the code paths of the two examples.

Is this a known bug in vtkImageSliceMapper, or should I submit a bug report?

For now this gets us past the current hurdle.

Thanks again for the help

I did a brief look at the ParaView code vs. the VTK code over the weekend.

ParaView has its own image slice mapper (can’t remember the exact name of the class) that directly uses vtkOpenGLTexture. The behavior of vtkOpenGLTexture when the image is larger than the maximum size supported by the OpenGL drivers/hardware is to downsample the image so that it fits in a texture.

VTK’s vtkOpenGLImageSliceMapper is different, in that if the image is too large for a texture, the image is subdivided into quadrants (or further) and the texture is re-used multiple times to render each quadrant. As a result, the image is rendered without downsampling, but rendering is slow and seams might be visible.

So it seems that you could emulate ParaView by using vtkTexture instead of vtkImageSliceMapper, but this means loss of quality in the rendering (due to the downsampling) on hardware that can’t support textures as large as your image.

Note that it’s not macOS itself that’s the issue here. When I tried the example on a linux desktop with integrated graphics, I was only getting 1 FPS, i.e. significantly worse even than what you were seeing on the Mac.

There are ways that vtkOpenGLImageSliceMapper could be improved. For oversize images, instead of reusing the same vtkTexture for each quadrant, it could maintain one texture per quadrant and this would eliminate a lot of texture reloads on re-renders. There would still be seams visible in some renderings, however.

Edit re: the high CPU usage of vtkImageResliceMapper, this can be reduced somewhat via the option mapper->SeparateWindowLevelOperationOff() which speeds up zoom/pan interaction at the expense of slowing down window/level (i.e. colormapping) operations.

I would just note that images of this size are normally not sent to the renderer directly. Instead, you build a multi-resolution pyramid and use the pyramid level that is appropriate for the rendered pixel size. Large images are also often subdivided into tiles (not just for more efficient rendering but for storage/retrieve as well). You may be able to leverage the OME-NGFF file format and related tools for all these.

Are there any tools inside of VTK for image pyramids as of VTK 9.5 or upcoming in 9.6?

Off-list, Sankhesh stated that because vtkImageResliceMapper does not use the GPU, hardware picking will not work with it. Unfortunately this is a feature that our users make use of quite often. I will have to work out how to move forward. We already have a custom OpenGL subclass for rendering these large images but it probably needs some updates.

Downsampling is an option. I would like to have something in the UI that states when the user is viewing a downsampled image, but we can work on that ourselves.

Thank you for digging into this and letting us know the various options.