New feature: vtkDataSet::GetPointsDataArray ?

It already has occurred to me a few times that, while developing a filter for VTK, I would need to iterate on points in a generic vtkDataSet. This can currently be done by calling void vtkDataSet::GetPoint(vtkIdType, double[3]), which copies into the passed buffer the point coordinates. If the data structure is a vtkPointSet or inherits from it, those point coordinates are taken from an explicit list of points. If it is something else (vtkImageData for instance), points being likely implicit, the point associated to the point id is computed on the fly.

Since a few months, AcceleratorsVTKm, which was a heavy module to compile, is split into 3 modules: Core, DataModel, and Filters. Core holds vtkmDataArray, DataModel holds vtkmDataSet, and Filters holds filters (which is the part that is expensive to compile).

What I’m interested in, in this module, is vtkmDataArray, which is a data structure inheriting from vtkDataArray that allows to define implicit arrays. Data set types such as vtkImageData or vtkRectilinearGrid could benefit from that and store their implicit point arrays through a vtkmDataArray, at near zero cost memory-wise.

This would allow us to expose a read-only data array storing points in vtkDataSet, which could have the following signature:

vtkDataArray* GetPointsDataArray() const

The nice thing about having access to such a buffer (explicit or implicit), is that, from that point, algorithms needing to iterate over points could use the nice range interface and alleviate a lot of tiny 3-vector copies that would happen within the vtkDataSet::GetPoint method.

There would be a need to overload this method in vtkImageData, vtkRectilinearGrid, and vtkPointSet.

Please feel free to give an opinion on that proposal.

2 Likes

VTKm requires C++14, right? VTK does not yet require C++14 yet, so that’s a consideration.

Assuming that this change were made, what other kinds of benefits might we get by depending on VTKm’s DataModel? The one you mentioned is fine, but from a cost/benefit point of view, it doesn’t excite me that much.

VTKm requires C++14, right?

AcceleratorsVTKm is a VTK module, living inside the VTK repo. So I don’t know if we need to shift towards C++14 for that. But I don’t know enough about that to know if it is a problem.

Assuming that this change were made, what other kinds of benefits might we get by depending on VTKm’s DataModel?

With this change, Common/DataModel would depend on AcceleratorsVTKmCore, not AcceleratorsVTKmDataModel.

This Core module should be at some point a dependance of Common/DataModel because at some point vtkHyperTreeGrid is going to use it to index its nodes.

This is the legacy way of accessing data. You should now use vtkSMPTools::For and other modern constructs. See for example how @ken-martin has revamped vtkWindowedSincPolyDataFilter and achieved 20-25x speedup.

My understanding is that vtkm is intended for massively parallel systems, such as GPUs, supercomputers - not for the 4-8 cores that you have in an average CPU.

@Yohann_Bearzi you keep bringing up very interesting problems and may feel that they are not always greeted with enthusiasm. It is not because these would not be excellent ideas, just that there are many other things to improve in VTK which has a much better cost/benefit ratio and makes much bigger impact.

For example, what do you think about focusing your efforts on bringing in more essential features into VTK, such as:

  • make available this awesome geodesic path search method available as a VTK remote module
  • make available vtkbool Boolean mesh operations as a VTK remote module
  • make it easy to distribute VTK remote modules on PyPI
  • make more VTK classes compatible with arbitrarily oriented images
  • look for problems and enhancement ideas on the VTK issue tracker and work on those that seem to be pressing issues (it could be also useful to discuss the idea here before investing time into implementing it)

This is the legacy way of accessing data. You should now use vtkSMPTools::For and other modern constructs.

In the instance where I would have needed to use GetPoint, it was inside a vtkSMPTools::For instance. The reason I need to use GetPoint is because the algorithm can be applied on any vtkDataSet, and vtkDataSet only has this interface to access points.

I could write 2 different functors: one for vtkPointSet, and one for other data sets (because there will be a copy anyway with the implicit arrays, since the explicit buffer doesn’t exist). But implicit arrays through vtkmDataArray could alleviate having to write 2 versions of an algorithm when you need to iterate overs points on any vtkDataSet.

My understanding is that vtkm is intended for massively parallel systems

This AcceleratorsVTKmCore module doesn’t have, up to my knowledge, any need to be run on a massively parallel system. An implicit vtkmDataArray is just a vtkDataArray that you can template on a functor, which is used to compute the output of the array at a given index on the fly. It is AcceleratorsVTKmFilters which has some heavily parallel filters implemented into it.

This is an interesting idea with one concern: we’ve done a lot of work over the past several years to reduce dependencies across modules etc. to make it easier to embed parts of VTK into applications. I really don’t like the idea of adding this sort of dependency into the mix. I agree having such a method vtkDataArray* GetPointsDataArray() const might be useful, but adding such a dependency is not worth it IMO. I’m wondering if this could be done without a vtkm dependency.

All excellent suggestions Andras. To accelerate these efforts, I recommend that you 1) recruit some VTK community members to help, 2) help us hire more people, and/or 3) help find funding to pay for the work :slight_smile: Right now we are (fortunately) slammed - there’s nothing like a paying customer to help set priorities :slight_smile:

1 Like

I agree having such a method vtkDataArray* GetPointsDataArray() const might be useful, but adding such a dependency is not worth it IMO. I’m wondering if this could be done without a vtkm dependency.

This could be done if we reimplemented vtkm implicit arrays inside of VTK, but this would be time consuming… I proposed to use vtkm’s implementation to avoid to reinvent the wheel, since it is already there.

The reason AcceleratorsVTKm was split into three modules was to make it less involving to integrate those arrays inside Common/DataModel.

Would that be that much of a problem having VTK depend on a small subset of vtkm? I feel like reimplementing it inside VTK wouldn’t reduce any complexity, since it would likely be a very similar implementation as the one in vtkm. It would just be a duplicated code living in 2 different repos, which doesn’t seem optimal IMO.

We had a similar debate a few months ago about making Eigen a dependency for vtkCell (in Common/DataModel). For some context, I needed to multiply rectangular matrices to some vectors. It didn’t happen (no Eigen dependency was introduced), but making a vanilla VTK handle this kind of matrix operation was involving (I implemented matrix multiplication methods that are templated on the dimensions and the layout (transposed, diagonal) of the input matrices in vtkMath).

The question is, when some feature makes programming easier and / or enhances some VTK structures, do we refuse to import the relevant parts from third parties, that we know work well, or do we either not implement the so said feature, or either do a home brewed implementation, taking into account the extra cost of doing so?

When I presented the matrix thing I wrote, the first question I received was, “Why didn’t you use Eigen”? - “Because this thing was in Common/DataModel”. And I was kind of told that in the future, Eigen would be the way to go for this instance, which I agree with.

I have reservations about this approach:

  1. Are we at a point where vtk-m can become a required dependency of VTK? I think people from vtk-m team should really answer that question since they can illuminate how mature is it.
  2. If it’s only a matter of providing a fast-iterable thread-safe API for points in a vtkDataSet, we can provide that independently. As @will.schroeder said, adding a VTK-m dependency for simply that reason sounds like an overkill. Note, I am not taking of creating a new implicit/explicit storage stuff of data array. Simply a fast point-iterator that gets specialized based on which type of vtkDataSet subclass you’re dealing with may be fine.

Preface: Note that this is with my “software process” hat on and should be seen as a goal; I expect realities to soften some of these things.

I’ve been working to get VTK to follow some stability guidelines. Is VTK-m at a point where it can commit to VTK’s required compatibility guarantees? Things I’d like to see from VTK-m first:

  • a deprecation policy for APIs and that newly-deprecated APIs be kept for a least a release, they continue being tested, etc.
  • lower build machine requirements; requiring 32GB of RAM to comfortably compile is…not great. I did a 4G per-TU ceiling run a while ago, but I think we’d probably need a 1G ceiling run (for non-CUDA builds)
  • release cycles
  • maintained release branches

Things VTK would need to do:

  • support an external VTK-m (distros, HomeBrew, Spack, vcpkg, etc. are really going to want this if we do it since pinning VTK-m to a hash for all consumers too[1])
  • VTK uses only what a given VTK-m release guarantees (so a VTK release (and branch) can only use what is available in a given VTK-m release “group” (read: what distros will generally do patch updates for))
  • Finer-grained VTK::vtkvtkm targets. Right now, using it means vtkm_cont and vtkm_filter; we’d probably want that to be a bit finer-grained to avoid over-linking

[1]vtkm is unmangled symbol-wise, so we can’t mix-and-match VTK-m-as-shipped-by-VTK and other VTK-m usages.

I know of users that are very happy to be able to disable VTK-m in their paraview build.
I’d say no.

1 Like

I don’t think that mixing VTK and VTK-m’s data models in this way is the best approach. For one, VTK-m arrays can be on the GPU side and using them in VTK may cause them to be copied back to CPU memory. Also, VTK algorithms are not designed to use the VTK-m array API and exposing a different array interface would cause a rift between various algorithm implementations. If we want this (and it is a good idea), we should modify VTK array API to handle implicit arrays. There is already the vtkGenericDataArray infrastructure that would support this.

In the future, the use of VTK-m inside VTK algorithms is likely to grow - mainly because we will want GPU support. When that reaches a critical point, many algorithms can switch wholesale to using the VTK-m API for datasets and arrays.