vtkWindowedSincPolyDataFilter performance improvement

This topic is a follow-up of modernization/performance improvement of filters - focusing on vtkWindowedSincPolyDataFilter optimization.

Here are sample VTK polydata data sets that we typically work with. They are created from medical images by thresholding or more sophisticated segmentation methods.

Download link: https://1drv.ms/u/s!Arm_AFxB9yqHu5QrlI-CW9Wi0B1_Rw?e=oUrPxm

  • Large files: Original segmentations (Bones.vtk, Skin.vtk), and from smooth labelmap (BonesSmoothed.vtk, SkinSmoothed.vtk). Flying edges filter can generate these polydata from images in 1-2 seconds. It would be great if smoothing would not add too much time to this.
  • Brainatlas: Contains small meshes in a variety of sizes. May be used to test that there is no significant overhead when segmenting many small polydata.

Great data, thank you. I will slowly start working through this filter in the time I can muster. I will likely ask you to try out a branch periodically. My intuition suggests that one of the biggest issues is the construction of the connectivity which ~ consists of one vtkIdList per point. The time to allocate and deallocate is significant - this needs to be rewritten. Next there are lots of loops that can be threaded, and of course the data access via GetPoint() is slow. Finally, I noted that the output points type is always float, is there any reason not to support the standard VTK ability to support user specification of float or double output, or a default behavior that whatever point type comes in, the same type is created for output?

Andras please provide me with a simple script, or otherwise provide me with the values you typically use for NumberOfIterations, PassBand value, NormalizeCoordinates flag, FeatureEdgeSmoothing flag, FeatureAngle, EdgeAngle, and the other data members etc. I want to make sure to replicate your typical workflow.

For example on my 40-thread desktop with the python script below I am seeing ~4.5secs for the Bones.vtk data (which I agree seems slow to me). Are these values consistent with your workflow?

smooth = vtk.vtkWindowedSincPolyDataFilter()
smooth.SetInputConnection(reader.GetOutputPort())
smooth.SetNumberOfIterations(20)
smooth.BoundarySmoothingOn()
smooth.SetFeatureAngle(120)
smooth.SetEdgeAngle(90)
smooth.SetPassBand(0.1)

1 Like

Hi Will!

Great to have your digging into this :+1:

We use this filter to update the 3D view of Segmentations in Slicer. Users really benefit from real-time updates when they make edits, so speeding this up will have many practical benefits.

This code is probably the hotspot for us:

1 Like

Steve I am curious, I notice that you are using NonManifoldSmoothingOn() (while the boundary and feature edge smoothing is off). Looking into the code, I see that we can realize a significant performance gain if nonmanifold, boundary, and feature edge smoothing are all off. If the input to this filter is coming from flying edges / marching cubes (which use manifold-producing case tables), then there should be no non-manifold meshes output. I am going to add the optimization in and later we can decide if non-manifold smoothing is actually needed. Basically, it means that the BuildLinks() invocation can be skipped, which despite being threaded, still takes a good chunk of time.

BTW as initially written, the vtkWindowedSincPolyDataFilter is inherently serial in the way it modifies data in a co-dependent manner (e.g., when performing topological analysis and building the smoothing edges network). I’m having to completely rewrite these steps for the purposes of threading etc. There are also some suspect decisions made when smoothing polylines - this should be rethought (although since few people use the filter for line smoothing it may not matter).

Hi Will - very good point - in this use case we should not have non-manifold meshes. Not sure why the flag is on. We definitely won’t have polylines in this pipeline.

Just a quick status update. I got the full filter framework running: threading, templates, array dispatch, new data structure / algorithm design (to eliminate memory fragmentation and enable threading). Also optimization paths for different types of smoothing (simple, boundary, feature, nonmanifold).

(I’m probably going to be sorry that I am saying the following but here goes…) I am seeing ~20x speedup on my machine at this point (Bones.vtk, my desktop 40 thread machine, highest level of algorithm optimization - no boundary, feature, nonmanifold edges). I expect as we clean this up the speeds will likely drop, but I am now seeing from about ~4.5secs to 0.225secs. Of course, all it takes is one bug to change everything so don’t count on this (yet) but I am hopeful :slight_smile:

This is an interesting algorithm to work on because threading, algorithm/data structure redesign, templating / dispatch - each make a big difference. I wish I could say what’s most important but it’s hard to separate out the effects one from another. For example, while threading is important, because I had to use an array of std::atomic, it’s likely that thread-scalability suffers. And there was a tremendous use of GetPoint/SetPoint during iteration so dispatch really cut out a lot of data access overhead. The algorithm itself was decidedly serial and used excessive amounts of fragmented memory…

Anyway I need to come up for air for a little while as next week is very demanding. I’ll keep poking at the algorithm when I can and push a MR in a week or two for you to start testing. I also want to do a little benchmarking etc. moving forwards.

3 Likes

Before I leave this topic I hope it’s clear that there are so many performance opportunities in VTK - if you are into this stuff it’s like a kid in a candy store :-). When we originally wrote VTK in our spare time as part of a book it was really an exercise in prototyping ideas and code. Since that time there have been steady improvements in implementation, data model, algorithm design, performance, software process and so on. I encourage folks to learn the new stuff (like dispatch) and clean up some old code - it’s rewarding and fun!

2 Likes

This is huge progress, thank you very much.

Wow, this is amazing!

It would incredibly useful if we could make people do this. However, it is much harder to use these complex programming concepts than to do simple serial programming, so it would be important to have good training materials. Blogs, reference manuals, etc. are all somewhat useful but it would be important to add 1-2 new VTK textbook chapters that describe these.

Update: 3D Slicer has switched to VTK9 now and we had a chance to test the optimized filter extensively. The result is quite shocking: computation is indeed about 20-25 faster on various geometries, tested on different computers. Surface smoothing was the main reason for slow 3D display updates of segmentations, and now the computation time is negligible.

@will.schroeder If you think such kind of performance increase is achievable for other classes as well then more developers should be trained in the techniques you applied.

2 Likes

Thanks for the update, it’s good to hear that it is working well for Slicer.

It would be fun to write something like a “practical guide to threaded programming in VTK” etc. Especially if we do it with humor, and using a pragmatic take on various programming approaches.

1 Like

Dear @will.schroeder,

It was mentioned during the 35th NA-MIC Project Week that you are the creator of the “Grow From Seeds” functionality within 3D Slicer.

I just wanted to say thank you.

This is an amazing segmentation tool that I use almost every day now for the segmentation of lung cancer and COVID-19 datasets.

Great work and thanks again !

Kind regards

Prof. Rudolf Bumm, MD
Department of Surgery
Kantonsspital GraubĂĽnden
Chur
Switzerland

I’m glad it’s working out for you. Thanks for the feedback,

Just to give a bit of more context: vtkWindowedSincPolyDataFilter is used in region growing and other interactive segmentation tools in Slicer for generating smoothed surface meshes for 3D visualization. Quick updates make a huge difference here, because the user needs to wait for the update after each paint stroke or other adjustment.

1 Like

BTW if there are any filters (or other operations) that are too slow, it’d be good to hear about them. No promises, but we keep a running list and try to address the more impactful filters.

2 Likes