vtkStaticCellLinks, vtkStaticPointLocator, vtkStaticCellLocator - speed improvements

will.schroeder · July 15, 2019, 12:16pm

Background: Over the past few years there have been substantial speed improvements due to the addition of systems like vtk-m and vtkSMPTools. Additionally, C++11 constructs like std::atomic provide facilities for multithreading etc. In particular, some of the core VTK data structures like cell and point locators, and the process of building topological links (e.g., cell links) have been undergoing renovation, producing speedups of multiple factors approaching an order of magnitude. Several years ago the new classes vtkStaticCellLocator, vtkStaticPointLocator, and vtkStaticCellLinks were added but used only in specialized circumstances. Due to their significant speed improvements (and in many cases significant reductions in memory usage), and the inexorable increases in data size, it’s time to mainstream these classes. For example, using vtkSMPTools and std::atomic, vtkStaticCellLinks is about 10x faster (to build) than the vtkCellLinks equivalent, and also much faster to destroy (asingle delete[] versus many).

Over the past week we have begun switching out vtkCellLinks with vtkStaticCellLinks in vtkUnstructuredGrid, and vtkPointLocator with vtkStaticPointLocator (in vtkPointSet - superclass of vtkUnstructuredGrid). Once these efforts stabilize, this process will move into vtkPolyData. Of course feedback is welcome - generally there will be no noticeable changes except for indeterminate queries like “find the closest point” when more than one “closest points” are all exactly the same distance away from the query point.

Nasty Details: One of the challenges moving forward is dealing with aspects of the dataset API, in particular portions of the vtkUnstructuredGrid and vtkPolyData API enable dynamic “editing” of the dataset - for example, after the dataset is constructed and point locator and/or cell links are built, new points/cells can be dynamically added/deleted. This behavior is inconsistent with the “static” classes which are meant to be built once, at high speed. Currently the overwhelming use cases for VTK datasets are as static data, with a handful of filters (like Delaunay triangulation, etc) incrementally editing their output datasets. To deal with this, vtkUnstructuredGrid and vtkPolyData have a new data member “Editable” which by default is off, with static locators and cell links being used. However, if Editable is enabled, then the old-fashioned locators and cell links classes are used, and the iterative, editing API works as expected.

Apologetic Aside: Whenever you dig into the guts of a complex system like VTK with a more than 25 year history behind it, you are likely to find cobwebs and coding artifacts that no longer make sense. I noticed this when changing out cell links - in ancient times the links were represented as a list of cell ids using a given point, with a count of cell ids represented with an unsigned short. Along the way much changed including the introduction of vtkIdType and associated changes to the cell locators - however the usage of “unsigned short” remained for some strange reason (nobody wanted to mess with the API I suppose). While making the changes described here I did the “bull in the china shop” thing and changed the unsigned int to vtkIdType - I’m sure annoying several folks, but thankfully several VTK developers are helping clean up my mess - thank you!

Extra Credit Challenge: While I am very pleased with the speed increases, I am frustrated by one remaining method: vtkUnstructuredGrid::GetCellNeighbors(). It turns out this method is key to further improving the performance of topological filters like vtkGeometryFilter. My suspicion is that there is a clever way to speed this up but so far I’ve come up dry. Maybe someone out there can crack this nut

lassoan · July 15, 2019, 11:25pm

Most of what is described above sounds great and will help VTK moving forward. However, the “Editable” flag is pretty unclear.

Does Editable=false means that the polydata can never be changed - otherwise results may not be correct or the application may crash? Or “Editable” property is only a hint, so things would still work regardless of how the Editable flag is set, but not with optimal performance? If it is only a hint then maybe a property name such as UpdateHint would be more appropriate and making it an enum would make it more future-proof (maybe now we only have static and dynamic locator but we may want to add various incremental locators in the future).

To avoid frustration of developers, automatic mechanisms should be added to detect incorrect or potentially suboptimal API usage and trigger a warning log. For example, static locators could simply log a warning each time they had to rebuild themselves due to changed input. Developers could then choose to set a flag that suppresses the warning message or enable the “Editable” flag or UpdateHint.

will.schroeder · July 18, 2019, 10:01am

Editable means that portions of the API will work without crashing (as documented).

This approach is not ideal, but since changing backward compatibility is such a fraught topic we have to evolve slowly. If I had my druthers, I might redesign vtkPointSet and subclasses so none of them are editable, using either specialized subclasses for editing, or rewriting the few algorithms that need to in place edit these datasets. But then Bill would yell at me

I have had a couple casual conversations with a few developers about extending the use of locators, and even using different locator strategies for methods like FindCell(). This is an extremely complex topic and it’s not even clear at this point what the use cases are… we’ll probably have to walk into this slowly. To get a taste of the complexity of locator strategies, see vtkStreamTracer and its use of vtkAbstractInterpolatedVelocityField and vtkCellLocatorInterpolatedVelocityField. One of the major challenges is that locators are associated with a dataset and in some cases should flow through the pipeline with the dataset (to avoid duplication of what could be large data structures). This might require a object caching mechanism, and/or alterations to the pipeline update process.

lassoan · July 18, 2019, 2:36pm

I just see that this very important change has been already integrated into VTK two weeks ago, without asking for any comments from the community. I really appreciate the effort of improving VTK’s performance and understand that the problem is complex and not all inputs can be taken into consideration but not giving any opportunity to the VTK community to discuss this is very disappointing.

Do I interpret the documentation documentation correctly that now even geometry (point positions) of a vtkPointSet must not be ever modified? How can I create it then? How can I specify that I’ve initialized the object and will not change it anymore? Do I get at least a warning message if I modify a “non-editable” pointset? Why a Boolean flag is introduced (“editable” flag) to give hint for creating an optimal locator (why not at least an enum)? Is it expected that developers of VTK-based applications must review their entire code base if they want to upgrade to the next version of VTK to ensure there is no prohibited edits?

I’m not against of using VTK users as testers, but this change seems to go a bit too far. At least the new editable flag (or update hint enum) should be disabled by default (e.g., by introducing a VTK_DEFAULT_POINTSET_UPDATE_HINT CMake variable) until major concerns are addressed and all documentation and VTK examples are updated.

will.schroeder · July 18, 2019, 3:44pm

For all intents and purposes, nothing has changed - things will work like they did before except faster. The changes literally affected 3-4 filters out of the 100s of filters in VTK. The only time you need the editable flag set (it is off by default) is when you try and use the few methods that incrementally modify the cell links and/or point locator (which are documented). Because the change is relatively small IMO, I spoke with trusted developers and received a thumbs up. and was reviewed with a +1 - typical what we all do when making small changes The pace of development dictates that we can’t get feedback for every change and addition. If you have a concrete suggestion, I’d love to hear it.

lassoan · July 18, 2019, 3:57pm

Sorry, I still don’t understand how this is a small change. Could you please answer these questions so that I understand things better?

Do I interpret the documentation correctly that now even geometry (point positions) of a vtkPointSet must not be ever modified (since vtkPointSet editable attribute is set to false by default)?
How a vtkPointSet should be now created? How can I specify that I’ve initialized the object and will not change it anymore?
We often create and modify VTK data objects that are used in filter pipelines. Is this still allowed or we need to set “editable” flag for all such data objects?
Do I get at least a warning message if I modify a “non-editable” pointset?
Why just a Boolean flag is introduced (“editable” flag) to give hint for creating an optimal locator, why not at least an enum (static, static-topology, incremental, dynamic)?
Is it expected that developers of VTK-based applications must review their entire code base if they want to upgrade to the next version of VTK to ensure there are no prohibited edits?

will.schroeder · July 18, 2019, 4:16pm

Nothing changes from before - you are way overthinking this. Pretty much you don’t need to do anything. If you use one of the few methods that incrementally modifies a point locator or cell links, without setting Editable to true prior to building the locator, there will be an immediate crash (you’ll know that there is a problem).

Also please realize that the problems you are bringing up have existed for a long time. For example, modifying a point coordinate after the “old” point locator was built was a problem then and it shouldn’t be done now.

As discussed, the locator issue is complex. This is a major design to undertake. Right now we are trying to take baby steps, we’ll get to that at some point I hope.

lassoan · July 19, 2019, 1:36am

Could you please provide a concrete example of what was OK to do before that would crash or give incorrect results now?

For example, would this code still work correctly?

points = vtk.vtkPoints()
points.InsertNextPoint(10,10,10)
points.InsertNextPoint(20,20,20)
points.InsertNextPoint(30,30,30)

triangle = vtk.vtkTriangle()
triangle.GetPointIds().SetId(0, 0)
triangle.GetPointIds().SetId(1, 1)
triangle.GetPointIds().SetId(2, 2)

triangles = vtk.vtkCellArray()
triangles.InsertNextCell(triangle)

trianglePolyData = vtk.vtkPolyData()
trianglePolyData.SetPoints(points)
trianglePolyData.SetPolys(triangles)

print(trianglePolyData.FindPoint(50,50,50))  # result = 2

points.SetPoint(2,15,15,15)

print(trianglePolyData.FindPoint(50,50,50))  # result = 1

will.schroeder · July 23, 2019, 1:04pm

Sorry for the delay, quite busy.

I just ran the code it runs as expected. However there are issues with modified time (which is true whether you use the vtkPointLocator or vtkStaticPointLocator - it’s independent of the recent checkins).

The problem comes in when you invoke “SetPoint()”. This method, for performance reasons, does not invoke Modified() on its owning dataset. It is also designed this way so you can make many SetPoint() calls and then invoke one Modified() at the end.

What should happen is that: anytime you modify the points, and then do a FindPoint(), the locator needs to rebuild due to the vtkPoints being modified. You can do this by adding a points.Modified() after one or more SetPoint() invocations.

I’ll keep the code handy and we’ll use it as a test.

lassoan · July 23, 2019, 6:36pm

Thanks for checking this. You are right, I’ve forgot to call Modified() call after SetPoint (it’s just a coincidence that it worked without that). So, the correct code is this:

points = vtk.vtkPoints()
points.InsertNextPoint(10,10,10)
points.InsertNextPoint(20,20,20)
points.InsertNextPoint(30,30,30)

triangle = vtk.vtkTriangle()
triangle.GetPointIds().SetId(0, 0)
triangle.GetPointIds().SetId(1, 1)
triangle.GetPointIds().SetId(2, 2)

triangles = vtk.vtkCellArray()
triangles.InsertNextCell(triangle)

trianglePolyData = vtk.vtkPolyData()
trianglePolyData.SetPoints(points)
trianglePolyData.SetPolys(triangles)

print(trianglePolyData.FindPoint(50,50,50))  # result = 2

points.SetPoint(2,15,15,15)
points.Modified()

print(trianglePolyData.FindPoint(50,50,50))  # result = 1

Now I think the only problem with these changes in VTK is the extremely misleading property name. “Editable” would suggest that the data set is not editable, which is not true at all. This property is just a performance optimization hint and so it should have a name that reflects that, such as “UpdateHint” or “EditHint”.

It would be also more future-proof to make it an enum instead of a bool, as there are so many different kinds of editing than just random edits (e.g., incremental, delete-only, custom), and we already have different locators that are better suited for certain kind of editing operations. By restricting editing hint to be true/false, we cannot take advantage of already existing VTK performance optimization capabilities (especially because vtkPolyData, vtkPoints, and vtkUnstructuredGrid has no public API to set a custom locator, so we are really stuck with whatever is set internally).

will.schroeder · July 24, 2019, 3:00pm

These are good suggestions. I agree the name Editable is not necessarily the best, please suggest something. Although I would say these are more than hints, these are effectively configuration parameters which indicate how the expected dataset is to be used (“the data, once constructed, is static” or “The data once constructed may be futther modified.”

lassoan · July 24, 2019, 4:12pm

These property names would be more clear:

ModifyHint - since this property is strongly linked to changing the data object and calling Modified() the “modify” word seems fitting.
UpdateHint - since in VTK API making changes is very often referred to as “update”
EditHint - the current property is called “editable”, if that name is important for some reason then this name could be used (although “edit” word is barely used anywhere in VTK public API)

Configuration parameters for performance optimizers are quite often called “hints” - see for example OpenGL or SQL.

Or, are you considering changing the current behavior and actually prevent modification of the dataset if the flag is set to not-editable? It would just make all the occasional data set updates more complicated for users and would not simplify anything in VTK internally (you would still need to always check if the data was modified and rebuild locators anyway). For making performance hint violations more visible, a CMake option could be added that would report unexpected modifications as warnings.