vtkCleanPolyData much slower with Doubles than Floats

mtzeth · July 26, 2024, 9:10am

I was stuck trying to figure out why my code ran 5-6 times slower in my C++ project than in my Python project, and it turns out that it was because I was creating my vtkPolyData with a double data array instead of a float data array. In the minimal example below using vtkCleanPolyData, I get 35 secs for double points and 7 secs for float points. Is that a reasonable difference? For context, with vtkStaticCleanPolyData the difference was 1.01 secs vs 0.90 secs. Also, is there a way to profile this in Visual Studio? When I run the performance profiler, it goes as far as the vtkStreamingDemandDrivenPipeline, where it counts time spent doing this->UpdateData(), but it doesn’t show the profile for the vtkCleanPolyData::RequestData function itself.

#include <vtkStaticCleanPolyData.h>
#include <vtkCleanPolyData.h>
#include <vtkSphereSource.h>
#include <vtkPolyData.h>
#include <vtkPoints.h>
#include <vtkSmartPointer.h>
#include <chrono>
#include <iostream>

int main() {
  auto c = vtkSmartPointer<vtkCleanPolyData>::New();
  auto sc = vtkSmartPointer<vtkStaticCleanPolyData>::New();
  auto sph = vtkSmartPointer<vtkSphereSource>::New();
  sph->SetThetaResolution(2000);
  sph->SetPhiResolution(2000);
  sph->Update();
  auto const sphere = sph->GetOutput();

  // Create a new vtkPoints with the desired precision (float in this case)
  vtkSmartPointer<vtkPoints> newPoints = vtkSmartPointer<vtkPoints>::New();
  newPoints->SetDataTypeToFloat();  // use SetDataTypeToDouble() for the double-precision run

  // Copy the old points to the new points
  for (vtkIdType i = 0; i < sphere->GetPoints()->GetNumberOfPoints(); ++i) {
    double p[3];
    sphere->GetPoints()->GetPoint(i, p);
    newPoints->InsertNextPoint(p);
  }

  // Set the new points to the vtkPolyData
  sphere->SetPoints(newPoints);

  c->SetInputData(sphere);
  sc->SetInputData(sphere);

  auto t0 = std::chrono::high_resolution_clock::now();
  c->Update();
  auto t1 = std::chrono::high_resolution_clock::now();
  sc->Update();
  auto t2 = std::chrono::high_resolution_clock::now();
  std::cout << "vtkCleanPolyData secs = " << std::chrono::duration<double>(t1 - t0).count()
            << std::endl;
  std::cout << "vtkStaticCleanPolyData secs = " << std::chrono::duration<double>(t2 - t1).count()
            << std::endl;
}

amaclean · July 27, 2024, 11:01pm

vtkStaticCleanPolyData is threaded, so that’s one reason why it is much faster.
Is your VTK build in Release or Debug mode? If it is in Debug mode then build VTK in Release mode and see if it is faster.

mtzeth · July 29, 2024, 7:27am

Cheers for the reply! I get that it’s threaded, I was just curious why the relative gap between floats and doubles is so much bigger in the non-threaded version than in the threaded one. The results were for VTK built in Release mode.

dgobbi · July 29, 2024, 10:02pm

VTK is an open-source project, so if you’re curious about why a class behaves in a certain way, you can just look at the code. See vtkMergePoints.cxx at line 104. There is an optimized code path for float points.

Edit: I’ve submitted a patch that should speed up vtkCleanPolyData for double precision.
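
For readers wondering what an "optimized code path for float points" can look like in general, here is a minimal sketch. It is not the code from vtkMergePoints.cxx; the class and function names (PointArray, TypedPointArray, findPoint) are hypothetical and only illustrate the pattern: when the storage is known to be float, compare against the raw buffer directly, otherwise go through a generic accessor that costs a virtual call and a conversion to double for every point.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type-erased point container (illustration only, not a VTK class).
class PointArray
{
public:
  virtual ~PointArray() = default;
  virtual std::size_t numberOfPoints() const = 0;
  // Generic access: one virtual call per point, values converted to double.
  virtual void getPoint(std::size_t i, double out[3]) const = 0;
  // Raw buffer when the storage is float, nullptr otherwise.
  virtual const float* floatData() const = 0;
};

template <typename T>
class TypedPointArray : public PointArray
{
public:
  void addPoint(T x, T y, T z)
  {
    data_.push_back(x);
    data_.push_back(y);
    data_.push_back(z);
  }
  std::size_t numberOfPoints() const override { return data_.size() / 3; }
  void getPoint(std::size_t i, double out[3]) const override
  {
    for (int k = 0; k < 3; ++k)
    {
      out[k] = static_cast<double>(data_[3 * i + k]);
    }
  }
  const float* floatData() const override { return rawFloat(data_); }

private:
  // The non-template overload is chosen only when T is float.
  static const float* rawFloat(const std::vector<float>& v) { return v.data(); }
  template <typename U>
  static const float* rawFloat(const std::vector<U>&) { return nullptr; }

  std::vector<T> data_;
};

// Returns the index of the first stored point equal to x, or -1 if none matches.
long findPoint(const PointArray& points, const double x[3])
{
  const std::size_t n = points.numberOfPoints();
  if (const float* raw = points.floatData())
  {
    // Fast path: convert the query once, then scan the raw float buffer.
    const float f[3] = { static_cast<float>(x[0]), static_cast<float>(x[1]),
                         static_cast<float>(x[2]) };
    for (std::size_t i = 0; i < n; ++i)
    {
      const float* p = raw + 3 * i;
      if (p[0] == f[0] && p[1] == f[1] && p[2] == f[2])
      {
        return static_cast<long>(i);
      }
    }
    return -1;
  }
  // Generic path: every point goes through the virtual accessor.
  double p[3];
  for (std::size_t i = 0; i < n; ++i)
  {
    points.getPoint(i, p);
    if (p[0] == x[0] && p[1] == x[1] && p[2] == x[2])
    {
      return static_cast<long>(i);
    }
  }
  return -1;
}
```

Both branches return the same answer for values that are exactly representable in float; the difference is only in how much work is done per candidate point, which is the kind of gap that adds up when a filter performs many such lookups.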

mtzeth · July 30, 2024, 7:26am

Thank you for your help David! That clears it all up, I should have just looked deeper into the source code! In this case it was just a matter of seeing that vtkMergePoints has separate implementations for doubles and floats, but in other cases, is there a preferred way to profile VTK code?

dgobbi · July 30, 2024, 12:16pm

The only profiler that I’m familiar with is Xcode Instruments on macOS. But any profiler should work fine with VTK.
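
When a profiler is not available or does not descend into the filter, coarse manual timing like in the original example can still narrow things down. Below is a minimal sketch of a reusable helper; timeSeconds is a hypothetical name, not part of VTK, and it only measures wall-clock time around whatever callable it is given.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of VTK): runs `work` once and returns the
// elapsed wall-clock time in seconds.
template <typename Callable>
double timeSeconds(Callable&& work)
{
  // steady_clock is monotonic, so the difference cannot go negative.
  auto const start = std::chrono::steady_clock::now();
  std::forward<Callable>(work)();
  auto const stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

In the example from the first post this could be used as `double secs = timeSeconds([&] { c->Update(); });`, and the same call can wrap smaller pieces of a pipeline to see where the time goes.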