Pythonic vtkDataArray and subclasses design

Hi all,

I have been working on making various VTK classes more Pythonic lately. One of the biggies is the vtkDataArray family. There have been several iterations in this direction in the past. Initially, it started with adding a set of VTK <=> numpy conversion methods (thanks Prabhu!). Then came VTKArray, which mixes the VTK array API with numpy arrays. I have never been satisfied with either. VTKArray is a numpy array but is not a vtkDataArray, so we need special pointdata and celldata classes that return and take VTKArrays. Anywhere else that a vtkDataArray is returned or, vice versa, passed in, one has to convert between the two. Furthermore, there are now newer VTK arrays that are used more prominently in VTK: SOA arrays, vtkAffineArray, vtkConstantArray, vtkStructuredPointArray, etc. Using these with numpy ufuncs and binary operations requires materializing them as numpy arrays, which defeats their purpose of conserving memory. So I have a new design. See here:

https://gitlab.kitware.com/vtk/vtk/-/merge_requests/12971

This design leverages the .override() mechanism that was introduced in VTK 9.4 and replaces the targeted vtkDataArray classes with their Python subclasses. These subclasses provide the numpy interface but are not numpy arrays themselves. They leverage numpy arrays when possible and roll out their own implementation when it is not. I worked at avoiding the materialization of numpy arrays as much as possible. The AOS array class is certainly longer than VTKArray, but it feels much more natural to use. The SOA array, which we previously could not support without converting it to AOS, works very naturally. I have implementations for some of the others and will introduce MRs for them. My aim is to remove the point/cell data overrides that return and take VTKArrays and have everything go through the new arrays. This should be mostly backwards compatible but may affect some user code that assumes these are numpy arrays. Such code will need to be changed to call to_numpy() on the vtkAOSArrays. Any code that uses WrapDataObject() (like ParaView) will not be affected.
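To make "provides the numpy interface but is not a numpy array" concrete, here is a minimal sketch built on numpy's `__array_ufunc__` protocol. The `ConstantArray` class is hypothetical and written only for illustration; it is not the class from the MR and not VTK's vtkConstantArray. The `to_numpy()` name is borrowed from the description above. The sketch handles only plain ufunc calls on 1-D data.

```python
import numpy as np


class ConstantArray:
    """Hypothetical implicit array: `size` copies of one value, stored as a scalar."""

    def __init__(self, value, size):
        self.value = value
        self.size = size

    def to_numpy(self):
        # Explicit materialization: the only place all `size` values are allocated.
        return np.full(self.size, self.value)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Handle only plain calls such as np.add(a, b); no out=, where=, reduce, etc.
        if method != "__call__" or kwargs:
            return NotImplemented
        # Reject ConstantArray or ndarray operands whose length differs from ours.
        for x in inputs:
            if isinstance(x, ConstantArray) and x.size != self.size:
                return NotImplemented
            if isinstance(x, np.ndarray) and x.shape != (self.size,):
                return NotImplemented
        # Replace every constant operand by its scalar value.
        args = [x.value if isinstance(x, ConstantArray) else x for x in inputs]
        if all(np.isscalar(a) for a in args):
            # Only constants and scalars: compute once and stay implicit.
            return ConstantArray(ufunc(*args), self.size)
        # Mixed with a real ndarray: broadcasting the scalar gives a dense result.
        return ufunc(*args)

    def __add__(self, other):
        return np.add(self, other)

    def __mul__(self, other):
        return np.multiply(self, other)


c = ConstantArray(3.0, 1_000_000)
d = np.sqrt(c + 1.0)          # still a ConstantArray, no million-element buffer
dense = d.to_numpy()          # materialize only when a real ndarray is needed
```

Operations between constants and scalars stay O(1) in memory, and a dense array appears only when an operand is already dense or when `to_numpy()` is called explicitly.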

I would love to hear your thoughts.

Converting vtkAOSArrays using to_numpy() feels more natural, and I like it. It is similar to the PyTorch torch.Tensor.numpy() method.

I have one question regarding the bracket notation for templated array classes. Can it take numpy dtype objects? If not, can we expose the basic types to Python and use those, e.g. vtk.float64?

vtkAOSDataArrayTemplate[np.float64]() vs vtkAOSDataArrayTemplate['float64']()

Yes, bracket notation supports dtypes.
https://docs.vtk.org/en/latest/advanced/PythonWrappers.html#template-keys
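As a rough illustration of why both spellings can select the same class, the sketch below mimics bracket notation with a hypothetical `ArrayTemplate` stand-in that normalizes its key through `np.dtype`. This is an assumption made for illustration only; it is not how the VTK wrappers are implemented, and the linked page remains the reference for the keys VTK actually accepts.

```python
import numpy as np


class ArrayTemplate:
    """Hypothetical stand-in for a templated array class (not a VTK class)."""

    _instantiations = {}

    def __class_getitem__(cls, key):
        # np.dtype accepts dtype objects, scalar types and strings alike,
        # so np.float64 and "float64" both normalize to the name "float64".
        name = np.dtype(key).name
        if name not in cls._instantiations:
            cls._instantiations[name] = type(
                f"Array_{name}", (), {"dtype": np.dtype(name)}
            )
        return cls._instantiations[name]


a = ArrayTemplate[np.float64]()
b = ArrayTemplate["float64"]()
print(type(a) is type(b))  # True: both keys resolve to the same class
```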


that is cool