Proposal: adding a `vtkPointCloud` data structure

Yohann_Bearzi · July 29, 2020, 7:26pm

As of today, VTK does not have a specialized class to store point clouds. One way to store a point cloud is to create a vtkPolyData using only vtkVertex cells. This works sometimes: some filters that take a vtkPolyData as input also work with point clouds. However, that is not always the case.

This issue needs to differentiate poly data from point clouds in order to do volume rendering. This could potentially be handled by scanning the whole input data set and checking whether any rank has anything other than vtkVertex cells: if so, regular rendering is performed; otherwise, point-cloud-specific volume rendering is performed. I think this would be very sloppy because, for instance, a user could still add a vtkTetra cell to the point cloud and totally change the display. The preliminary step of scanning the whole input to check whether the data set is a point cloud or something else is also tedious.

This is why I think we would greatly benefit from having a specialized point cloud class, which we could call vtkPointCloud, as a concrete subclass of vtkPointSet. Besides helping fix the issue I mentioned, it would be a good starting point for a clean point cloud algorithm library, as point clouds often need specific algorithms because of their lack of structure. Point clouds also tend to be a lot denser than meshes, so this would be a good occasion to think about bringing more efficient point locators into VTK.

Since points and cells are the same concept in point clouds, we can debate whether to decide that there is only point data and no cell data, or to find a way for cell data and point data to return the same pointer.

Please share your opinion if you have concerns, thoughts, or suggestions about this.
Thanks!

lassoan · July 29, 2020, 7:44pm

A single vtkPolyData class managing multiple very similar data structures (point cloud, triangle mesh, triangle strip mesh, quad mesh, polylines, and a mix of any of these) enormously simplifies the implementation of VTK filters and VTK-based applications. The inconveniences that you listed are valid, but they can be resolved much more easily than by introducing an entirely new data set type.

Yohann_Bearzi · July 29, 2020, 10:24pm

So how do we tell if a poly data is a point cloud? Should we add metadata to vtkPolyData that the appropriate readers / filters would set and that other filters could check if needed?

cory.quammen · July 29, 2020, 10:27pm

vtkPolyData has a nice API for this use case because it stores different cell types in different cell arrays. So a simple check for whether a vtkPolyData has only vertices is to verify that vtkPolyData::GetVerts() is neither null nor empty, while vtkPolyData::GetPolys(), vtkPolyData::GetStrips(), and so on are null or empty. That should do the trick.
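A minimal sketch of such a check in Python. Instead of inspecting the cell arrays themselves, it uses the per-type counters `GetNumberOfVerts()`, `GetNumberOfLines()`, `GetNumberOfPolys()` and `GetNumberOfStrips()` of vtkPolyData. The helper name `is_vertex_only` and the `FakePolyData` stand-in (there only so that the snippet runs without VTK installed) are made up for illustration.

```python
def is_vertex_only(poly_data):
    """Return True if poly_data holds vertex cells and no other cell type."""
    return (
        poly_data.GetNumberOfVerts() > 0
        and poly_data.GetNumberOfLines() == 0
        and poly_data.GetNumberOfPolys() == 0
        and poly_data.GetNumberOfStrips() == 0
    )


class FakePolyData:
    """Hypothetical stand-in exposing the same four counters as vtkPolyData."""

    def __init__(self, verts=0, lines=0, polys=0, strips=0):
        self._verts, self._lines = verts, lines
        self._polys, self._strips = polys, strips

    def GetNumberOfVerts(self):
        return self._verts

    def GetNumberOfLines(self):
        return self._lines

    def GetNumberOfPolys(self):
        return self._polys

    def GetNumberOfStrips(self):
        return self._strips


print(is_vertex_only(FakePolyData(verts=1000)))           # vertices only
print(is_vertex_only(FakePolyData(verts=1000, polys=1)))  # one stray polygon
```

Since the function is duck-typed, the same call should also accept a real vtkPolyData instance.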

However, I don’t think you should even need to concern yourself with cells that are present or not in a vtkPolyData. The volume mapper should just operate on the points in the dataset and ignore cells. You could even write it to accept a vtkPointSet, then users could have a number of choices of data sets to represent a point cloud.

cory.quammen · July 29, 2020, 10:29pm

I think that should be left up to the application to track, much like we leave units for image spacing up to the application to define.

utkarshayachit · July 29, 2020, 10:44pm

I suppose the key point to consider here is whether a point cloud is a distinct type or simply a polydata with vertices. I’d argue it’s a separate data type of its own.

When representing a point cloud as polydata, one still has to create a cell array for marking the verts. This can be quite expensive for large point clouds. Currently, a large slew of polydata filters and mappers simply do nothing (and rightly so) when no cell array is specified.
Simply treating all points in a polydata as a point cloud is incorrect as well, since, by definition, the polydata’s vtkPoints could contain points that form verts as well as points that are part of other topological elements like triangles. We have no clear definition of which points are part of the point cloud: only the points from the verts, all points in the vtkPoints, or all points from all topological elements. Note that each of these will yield different results.
There is a whole class of filters and algorithms that are possible for point clouds. An entire library exists just for that! So it’s not too outlandish to consider whether it should be a first-class data type in VTK.
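To make the ambiguity concrete, here is a toy illustration in plain Python (no VTK involved). The data layout and the three function names are hypothetical; they only mimic a poly data with a point list, one vertex cell and one triangle, plus a point that no cell references.

```python
# Toy poly data: 5 points, one vertex cell and one triangle.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (5, 5, 5)]
verts = [[0]]        # vertex cell referencing point 0
polys = [[1, 2, 3]]  # triangle referencing points 1, 2 and 3
# Point 4 is not referenced by any cell.


def cloud_from_verts(verts):
    """Interpretation 1: only points referenced by vertex cells."""
    return sorted({pid for cell in verts for pid in cell})


def cloud_from_all_points(points):
    """Interpretation 2: every point stored in the point list."""
    return list(range(len(points)))


def cloud_from_all_cells(*cell_arrays):
    """Interpretation 3: points referenced by any cell of any type."""
    return sorted({pid for cells in cell_arrays for cell in cells for pid in cell})


print(cloud_from_verts(verts))             # [0]
print(cloud_from_all_points(points))       # [0, 1, 2, 3, 4]
print(cloud_from_all_cells(verts, polys))  # [0, 1, 2, 3]
```

The three interpretations select three different point sets from the same input, which is why a converter would need an explicit option.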

Note, the proposal is to make it a vtkPointSet, so all dataset- or pointset-based algorithms will still work seamlessly. Also, writing a converter from any dataset, including polydata and unstructured grid, to a point cloud is trivial. Of course, the filter can have controls to let the user choose exactly how to pick the points that are part of the cloud, to overcome the ambiguities in interpreting polydata (or unstructured grid) as a point cloud that were mentioned earlier.

lassoan · July 29, 2020, 11:27pm

It seems that you are asking for removing point cloud support from polydata. This would break a lot of things in VTK and in applications (vtkPolyData is used in 1900 files in VTK), for very little gain. If you need a way to give hints to filters so that they interpret polydata as a point cloud, then there are much simpler options:

Make “vtkPolyData without any cells” mean “point cloud”. Filters that currently ignore points if they are not assigned to vertices could be updated accordingly.
Add a flag to filters that can operate on both point clouds and polygons, to give a hint about how to process point-cloud-like polydata.
Add a new vtkDataObject information key that would give a hint (to the very few filters that are interested in it) that the polydata is a point cloud. Information keys are propagated through the pipeline, so you could set them in special readers and filters and use them in all other filters and writers.

utkarshayachit · July 30, 2020, 10:42am

That is not correct. This proposal does not change vtkPolyData at all; it just introduces a new dataset type for a specialized case, quite similar to how vtkImageData is a specialization of vtkRectilinearGrid, which in turn is a specialization of vtkStructuredGrid.

olesenm · July 30, 2020, 11:49am

When representing a point cloud as polydata, one still has to create a cell array for marking the verts. This can be quite expensive for large point clouds. Currently, a large slew of polydata filters and mappers simply do nothing (and rightly so) when no cell array is specified.

This would indeed be convenient, or at least less clunky, for handling Lagrangian data. Nice one.

Michael · July 30, 2020, 12:56pm

To some extent, having a dedicated data structure for each simplex set is useful. Point clouds (0D), of course, but lines-only (1D), triangles-only (2D) and tetrahedra-only (3D) data structures can be very useful as well.

lassoan · July 30, 2020, 2:02pm

If vtkPointCloud is not a subclass of vtkPolyData then we cannot use vtkPolyData to properly represent point clouds anymore. This is a major issue, because we still want to support point clouds in our applications and we would need to consider using the vtkPointCloud class in addition to (or instead of) vtkPolyData everywhere we currently simply use vtkPolyData. Such a move would also hurt point cloud processing people, because suddenly none of the VTK filters would work on their data sets (they would need to temporarily convert to polydata and convert back, or they would need to update all filters to support both vtkPolyData and vtkPointCloud). The cost would be huge for very little benefit.

The only reason I can think of why a vtkPointCloud class could make sense is for marketing purposes: to show lidar processing people that VTK supports point clouds (I can see that some people feel uneasy about using a class that has “poly” in its name for storing just points). For this, you could derive vtkPointCloud from vtkPolyData. This would not immediately break anything. However, there is a high risk that, with a separate child class, incompatibilities would start to creep in. For example, some new filters originally developed for point clouds might only accept the vtkPointCloud class and not vtkPolyData, not because they would not work with polydata, but simply because developers did not think or care about vtkPolyData compatibility when implementing them.

Yohann_Bearzi · July 30, 2020, 3:39pm

I don’t understand how it is desirable that triangular meshes and point clouds be represented in the same data structure. There is a whole literature for point cloud specific algorithms that are not doable with a mesh, and vice versa. To start a tiny list of point cloud specific algorithms: RANSAC, RIMLS, APSS, Manifold Harmonics (exists for both meshes and point clouds, but the computation is totally different).

If a poly data can be a point cloud as well as a triangular mesh, how do we grey out inapplicable algorithms? Do we just allow them and produce nothing when there are no vtkVertex cells?

lassoan · July 30, 2020, 4:13pm

It is an application implementation detail. There are certainly many options that are as good as checking the class of an object.

The obvious choice would be to use all points of the vtkPolyData if there are no vtkVertex cells.

Algorithm developers tend to prefer a fine-grained class hierarchy (data structures and filters with narrow scope), while it makes applications simpler if there are fewer types (and they have broader scope). VTK so far hit a good balance and it would be nice to preserve this.

Michael · July 31, 2020, 9:34am

Sure, I understand, but specific simplex classes have very useful advantages. For instance:

Point clouds do not need explicit cells
A triangle-only mesh maps directly to OpenGL without cell preprocessing (more computationally efficient, and a 25% smaller memory footprint for the cell array)

We already have the PolyData which is a specialization of the Unstructured Grid, and it is widely used.
Having a new specialization for specific usage makes sense to me.

utkarshayachit · July 31, 2020, 11:04am

Two very good points:

“VTK so far hit a good balance and it would be nice to preserve this.” - @lassoan
“Having a new specialization for specific usage makes sense to me.” - @Michael

From the discussion, we’re agreed that point clouds are different. Let’s say we go with ‘make “vtkPolyData without any cells” mean “point cloud”’. Even then, most (if not all) filters would have to do “something else” if they determine that the input is a point cloud, e.g. iterate over points instead of cells. Several other filters are not applicable at all, e.g. contour, resample with a point cloud as the data array input, etc. If most polydata algorithms end up having to have a special case when dealing with point clouds, I’d argue that’s a clear sign that it’s a specialization worth having.

A polydata can easily be represented as an unstructured grid; similarly, a point cloud can easily be represented as polydata or unstructured grid. And conversion back and forth will be quite efficient (except for the cost of creating and storing the topology array).
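A rough back-of-the-envelope estimate of that topology cost, as a Python sketch. It assumes one vertex cell per point, a cell array stored as an offsets array plus a connectivity array, 64-bit ids and single-precision coordinates; these are assumptions for the sake of the arithmetic, not measurements, and the function names are hypothetical.

```python
def vertex_topology_bytes(n_points, id_bytes=8):
    """Bytes for a cell array holding one vertex cell per point.

    Assumed layout: an offsets array with n_points + 1 ids and a
    connectivity array with n_points ids.
    """
    offsets = (n_points + 1) * id_bytes
    connectivity = n_points * id_bytes
    return offsets + connectivity


def coordinates_bytes(n_points, scalar_bytes=4):
    """Bytes for the x, y, z coordinates of n_points points."""
    return 3 * n_points * scalar_bytes


n = 100_000_000  # a hypothetical 100-million-point cloud
print(f"topology:    {vertex_topology_bytes(n) / 1e9:.2f} GB")
print(f"coordinates: {coordinates_bytes(n) / 1e9:.2f} GB")
```

Under these assumptions the vertex topology alone would take more memory than the coordinates it indexes; with 32-bit ids or double-precision coordinates the ratio changes accordingly.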

will.schroeder · July 31, 2020, 1:41pm

A couple of comments:

I would encourage folks to look at the ~40 classes in Filters/Points that are point cloud filters (and there are more point cloud filters in other subdirectories too). They work just fine with any vtkPointSet as input. Generally the assumption in these is to ignore any topological information (verts, or cells of any type). However, you could easily use topological information (like verts, lines, etc.) to carry additional information about the point cloud (which points are on/off, local neighborhood information, topological features like edges, etc.). This could be very advantageous for hybrid point cloud algorithms.

Note that as new dataset types are added, there is a lot of work writing conversion filters to go between types. Also, there tends to be a need to add complexity to some filters to recognize new types. IMO having many types is vaguely reminiscent of having lots of template parameters: there is a lot of code spent dispatching and switching between types, and coding can feel like being in a straitjacket.

Finally, a side note about point locators. The classes in Filters/Points use the vtkStaticPointLocator which, based on benchmarks I did a while back, is much faster than anything else in VTK; nothing comes close. However, the weakness of this locator (which I rarely see) shows up when there is huge spatial variation in the spacing between points (e.g., a transition to a boundary layer). Because the performance of vtkStaticPointLocator is weakly connected to the binning resolution, generally the response is to crank up the resolution to very high values, which works up to a point. If somebody wants to improve the point locator, I would focus on creating a hierarchical, static point locator. It would be cool to have, but would take some work.
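To illustrate the binning idea and its weakness, here is a toy uniform-bin point locator in plain Python. It is not how vtkStaticPointLocator is implemented, and the class and method names are hypothetical; it only shows points being hashed into a regular grid, a radius query visiting the overlapping bins, and how a tight cluster ends up in a single bin.

```python
from collections import defaultdict


class UniformBinLocator:
    """Toy locator that hashes 3D points into a regular grid of bins."""

    def __init__(self, points, bounds_min, bounds_max, divisions):
        self.points = points
        self.bounds_min = bounds_min
        self.divisions = divisions
        # Bin width along each axis; fall back to 1.0 for a flat axis.
        self.width = [
            (hi - lo) / divisions if hi > lo else 1.0
            for lo, hi in zip(bounds_min, bounds_max)
        ]
        self.bins = defaultdict(list)
        for point_id, p in enumerate(points):
            self.bins[self._bin_of(p)].append(point_id)

    def _bin_of(self, p):
        """Return the (i, j, k) bin index of p, clamped to the grid."""
        return tuple(
            min(self.divisions - 1, max(0, int((c - lo) / w)))
            for c, lo, w in zip(p, self.bounds_min, self.width)
        )

    def find_points_within_radius(self, center, radius):
        """Return sorted ids of the points within radius of center."""
        lo = self._bin_of([c - radius for c in center])
        hi = self._bin_of([c + radius for c in center])
        r2 = radius * radius
        found = []
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    for point_id in self.bins.get((i, j, k), ()):
                        p = self.points[point_id]
                        if sum((a - b) ** 2 for a, b in zip(p, center)) <= r2:
                            found.append(point_id)
        return sorted(found)

    def largest_bin_size(self):
        """Number of points in the most crowded bin."""
        return max(len(ids) for ids in self.bins.values())


pts = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.12, 0.1, 0.1), (0.5, 0.5, 0.5)]
locator = UniformBinLocator(pts, (0, 0, 0), (1, 1, 1), divisions=4)
print(locator.find_points_within_radius((0.1, 0.1, 0.1), 0.05))
print(locator.largest_bin_size())
```

When most points cluster in a small region, the crowded bins dominate query time unless the number of divisions grows, which is the situation a hierarchical locator would address.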

Yohann_Bearzi · July 31, 2020, 4:08pm

We just had a small debate about this issue with other Kitware folks, and a potential solution that could satisfy everyone came up. What if vtkPointSet was not an abstract class anymore? It would leave the class hierarchy almost untouched. You could then instantiate point sets without having to make them an unstructured grid or polydata, and only have access to the wanted elements: points and their data. We would have to explicitly implement the pure virtual methods related to cells, making them return nullptr or zero when appropriate.
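The idea can be sketched with a toy class hierarchy in Python. The `DataSet` and `PointSet` classes and their method names below are hypothetical stand-ins, not the VTK API; the point is only that the cell-related abstract methods get trivial implementations, so the point set becomes instantiable.

```python
from abc import ABC, abstractmethod


class DataSet(ABC):
    """Hypothetical abstract data set with point and cell accessors."""

    @abstractmethod
    def get_number_of_points(self):
        ...

    @abstractmethod
    def get_number_of_cells(self):
        ...

    @abstractmethod
    def get_cell(self, cell_id):
        ...


class PointSet(DataSet):
    """Concrete point set: points and point data, no cells."""

    def __init__(self, points):
        self.points = list(points)
        self.point_data = {}

    def get_number_of_points(self):
        return len(self.points)

    def get_number_of_cells(self):
        return 0  # a bare point set has no topology

    def get_cell(self, cell_id):
        return None  # plays the role of nullptr in the C++ version


cloud = PointSet([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)])
cloud.point_data["intensity"] = [0.3, 0.8]
print(cloud.get_number_of_points(), cloud.get_number_of_cells())
```

Any algorithm written against the base-class interface would still accept such an object and simply see zero cells.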

What do you guys think about that?

lassoan · July 31, 2020, 6:02pm

This could work. Does it mean that you would not add a new vtkPointCloud class, but just use vtkPointSet objects for storing point clouds?

Yohann_Bearzi · July 31, 2020, 6:07pm

This could work. Does it mean that you would not add a new vtkPointCloud class, but just use vtkPointSet objects for storing point clouds?

Yes, vtkPointSet would be the point cloud data structure.

lassoan · July 31, 2020, 6:08pm

OK, this should be good then!