Threading: Parallel filters with OpenMP

rootofpi · August 21, 2021, 6:00pm

I have written a program with VTK that imports STL files and voxelises them. Sometimes a large number of files may have to be imported, voxelised and merged to obtain a single computational grid. In order to speed things up I thought about parallelising it with OpenMP. For this purpose I have written a function that merges two three-dimensional voxel domains

vtkSmartPointer<vtkImageData> Importer::mergeImageData(vtkSmartPointer<vtkImageData> const& image_data_1, vtkSmartPointer<vtkImageData> const& image_data_2) noexcept {
  vtkSmartPointer<vtkImageMathematics> image_mathematics {vtkSmartPointer<vtkImageMathematics>::New()};
    image_mathematics->SetOperationToMax();
    image_mathematics->SetInput1Data(image_data_1);
    image_mathematics->SetInput2Data(image_data_2);
    image_mathematics->Update();
    vtkSmartPointer<vtkImageData> merged_image_data {vtkSmartPointer<vtkImageData>::New()};
    merged_image_data->DeepCopy(image_mathematics->GetOutput());
    return merged_image_data;
  }

The different geometry files are imported, voxelised and merged then as follows:

#pragma omp declare reduction(MergeImageData: vtkSmartPointer<vtkImageData>: omp_out = Importer::mergeImageData(omp_out, omp_in))
#pragma omp parallel for default(none) firstprivate(filenames, resolution_x, reduction_rate, bounding_box) reduction(MergeImageData:merged_image_data)
for (auto const& f: filenames) {
  // Import geometry and voxelise it (runs some filters)
  vtkSmartPointer<vtkImageData> single_image_data = Importer::importFromFile(f, resolution_x, reduction_rate, bounding_box);

  // Merge the domain to the existing one
  merged_image_data = Importer::mergeImageData(merged_image_data, single_image_data);
}

To my surprise this did not work and I got several errors in the process such as

2021-08-21 19:21:47.644 (   2.277s) [        43DF9700]vtkDemandDrivenPipeline:666    ERR| vtkCompositeDataPipeline (0x7f6bec02eb50): Input port 0 of algorithm vtkImageMathematics(0x7f6bec02ec40) has 0 connections but is not optional.

or

2021-08-21 19:22:35.514 (  50.147s) [        4A53F680]       vtkImageData.cxx:1517   ERR| vtkImageData (0x7f6b8c027020): GetScalarPointer: Pixel (0, 0, 0) not in memory.
 Current extent= (0, -1, 0, -1, 0, -1)

I went then on to read about parallelisation and thread-safety with VTK to find out that VTK is not thread-safe by design and therefore my code runs into issues. Is it possible to get this to work without turning to multi-processing?

will.schroeder · August 21, 2021, 7:02pm

Take a look at vthSMPTools or vtk-m. vtkSMPTools for example supports an OpenMP backend.

Like many systems, there are large swaths of thread safe code in VTK and of course non-thread safe code. The trick is to know which is which. I wish there was an easy answer to this, but looking at threaded filters in VTK is a good start.

rootofpi · August 21, 2021, 7:41pm

Thanks a lot for your answer Mr. Schröder and thanks for your work for Kitware!

I just had a quick glance at vtk-m and it seems to me that it would require me to restructure my code significantly. I am not sure about vtkSMPTools. From what I understood what one would do is to compiler VTK with VTK_SMP_IMPLEMENTATION_TYPE set to OpenMP. Then one could have some speed-up with filters that derive from vtkThreadedImageAlgorithm but it should be quite modest and it will likely not be able to use the 40 threads of my machine.

What you said about thread-safe code in VTK could be more helpful. Are the image algorithms that inherit from vtkThreadedImageAlgorithm thread-safe? What about the different readers e.g. vtkSTLReader as well as the methods vtkCleanPolyData, vtkDecimatePro and vtkPolyDataToImageStencil? Do I have to look into their source code to find out or is there an easier way to find out?

rootofpi · August 21, 2021, 10:51pm

Just found the reason for why it was not working. It actually had nothing to do with threading in VTK directly it seems but instead with the reduction operation and the vtkSmartPointer (just like std::shared_ptr). Removing the reduction operation and marking the mergeImageData step as #pragma omp critical actually solved it!

#pragma omp parallel for default(none) shared(merged_image_data) firstprivate(filenames, resolution_x, reduction_rate, bounding_box)
for (auto const& f: filenames) {
  vtkSmartPointer<vtkImageData> single_image_data = Importer::importFromFile(f, resolution_x, reduction_rate, bounding_box);

  #pragma omp critical
  merged_image_data = Importer::mergeImageData(merged_image_data, single_image_data);
}

This way I was able to speed-up the corresponding code by a factor of roughly 5 on my system.