set/get FieldData with numpy_interface.dataset_adapter

Hello, I am experimenting in Python with setting and getting FieldData on e.g. a PolyData, UnstructuredGrid, etc., using numpy_interface.dataset_adapter. I am particularly interested in string FieldData, used to store various attributes of a VTK object, but also in numeric arrays.

However, if I try to use the same syntax as for Points, I always get errors.

import vtk
from vtk.numpy_interface import dataset_adapter as dsa
import numpy as np

my_polydata = vtk.vtkPolyData()
# add some points and cells to my_polydata here..

my_polydata_dsa = dsa.WrapDataObject(my_polydata)

# set string example
my_polydata_dsa.FieldData["key_1"] = "string_1"

>>> TypeError: 'DataSetAttributes' object does not support item assignment

# numeric example
my_polydata_dsa.FieldData["key_2"] = [1.0, 2.0, 3.0]

>>> TypeError: 'DataSetAttributes' object does not support item assignment

# get examples (do not work due to previous errors)
print(my_polydata_dsa.FieldData["key_1"])
print(my_polydata_dsa.FieldData["key_2"])

I also tried .FieldData.keys(), .FieldData.values(), and .FieldData.append(), but I cannot work out the right syntax.

Thanks very much!

Update: GetFieldData() works with numerical arrays.

my_polydata_dsa.GetFieldData().append(np.asarray([1.0, 2.0, 3.0]), "array_1")
my_polydata_dsa.GetFieldData().GetArray(0)
>>> VTKArray([1., 2., 3.])

However this does not work for strings.

my_polydata_dsa.GetFieldData().append(np.array(["my string"], dtype="object"), "string_1")
>>> TypeError: Could not find a suitable VTK type for object

VTK’s dataset_adapter was developed 5 years ago and has not been touched since then. At that time there was no commonly accepted way of storing labeled multi-dimensional arrays with arbitrary metadata.

We could now have a proper adapter that provides an interface between VTK data objects and xarray and stores much more metadata (including arbitrary text fields).

ITK has greatly improved their xarray support recently. For example, we now have a standard way of storing essential metadata, such as image origin, spacing, axis direction for an image in a “numpy” array.

It would be great if VTK’s Python wrapper could be improved to offer an interface similar to ITK’s.

@bistek Please have a look at the current VTK numpy array converter and try to create a new version of it that creates an xarray instead of a simple numpy array, similarly to how it is done in ITK. It is pure Python code and all complicated parts are already done, so the task is mostly about adding some more data types (such as string arrays) and following xarray conventions.

@dgobbi Do you agree that this could be a good approach for making VTK data objects more conveniently available in Python?

@lassoan, thanks for the prompt answer!

First, I don’t know whether my limited Python skills would lead to a result on the xarray integration in a reasonable time. If others would like to contribute, we can think about it!

Then, I will try to explain my goals…

I’m developing a Python 3D geological modelling application, heavily based on VTK, that allows visualising, editing, digitising, interpolating, etc. different types of objects: mainly polyline, point-cloud, and triangulated-surface polydata, but also unstructured and regular grids (e.g. for digital elevation models), images (geophysical data), tetrahedral meshes (for output and as a support for interpolation), etc.

All these objects are collected in a vtkCollection. This, as far as I understand, leads to better performance than storing them in a simple Python list. BTW, is this correct?

All the objects in the collection should have metadata, such as a unique_id (used to trace the object even if the collection is changed), name, type, and other properties related to the geological and/or topological meaning of each object. For each kind of object I use a class derived from the relevant VTK class (e.g. PolyData for triangulated surfaces).

To maximise re-using VTK functions, all the objects in the collection are saved to and read from disk as standard VTK files. This is one reason to store the metadata as FieldData, so that they will be read/saved automatically.

Then, some method to search the collection for particular metadata, select objects according to metadata values, etc., would be very useful. At the moment I am doing this with Pandas dataframes where I store all metadata and a string pointing to each vtk object, so I see that using xarray might be logical.

Quite obviously, being able to access the underlying VTK arrays as NumPy arrays is fundamental in my application, e.g. to facilitate using interpolation or geostatistics libraries etc.

So… do you still think that xarray would be a reasonable answer?

The other option would be to simply use VTK FieldData to store the metadata. This is their original goal, if I understand what you write here. However, the documentation on FieldData is very limited and I do not know if I will be able to re-implement the basic methods in my classes (e.g. add a field, remove a field, set and get values in the fields, either strings or numerical, etc.). Any help or Python example on this would be very useful!

Thanks!

I just got an answer on GitLab (thanks!):

https://gitlab.kitware.com/vtk/vtk/-/issues/18042

I copy it here:

The string array (vtkStringArray) in VTK is not memory compatible with numpy string arrays so the numpy interface does not support it. You have to use vtkStringArray directly to add string arrays:

array = vtk.vtkStringArray()
array.SetName("foo")
array.SetNumberOfTuples(2)
array.SetValue(0, "test")
array.SetValue(1, "test2")
my_polydata_dsa.GetFieldData().AddArray(array)

After this, it is possible to retrieve the array with:

my_polydata_dsa.GetFieldData().keys()
>>> ['foo']

my_polydata_dsa.GetFieldData().values()
>>> [(vtkmodules.vtkCommonCore.vtkStringArray)00000257193E1048]

my_polydata_dsa.GetFieldData().values()[0].GetValue(1)
>>> 'test2'

Using the same memory layout is only needed if you want to map the array to numpy objects without copying data. This is a really nice feature but for most cases it is fine to make a copy of your data set and conversion is trivial if you are allowed to make a copy.

If you want to avoid copying then it is a harder problem and would require assistance from someone who knows the details of how VTK’s Python wrapping works. You can store strings in numpy arrays as individual objects, which I believe should be compatible with the memory layout of vtkStringArray:

country = np.array(['USA', 'Japan', 'UK', '', 'India', 'China'], dtype = 'object')

xarray has many more options for assigning metadata to an array.

@dgobbi What do you think? How hard would it be to wrap vtkStringArray as numpy array or xarray? Have you considered adding xarray interface, so that for example oriented vtkImageData could be easily exchanged between ITK and VTK in Python?

@lassoan the memory layout of a vtkStringArray is an array of std::string, so it’s not memory-compatible with numpy.

I’m so far behind on my main VTK tasks (bringing vtk-dicom into master and finishing up vtk-python autocompletion) that I’m not even going to consider any other suggestions for quite a while, unless it means mentoring others who are willing to do the work.


Hello and thanks for considering this issue! Regarding the possibility of developing an xarray-style interface myself, I am afraid it is a bit too much, both in terms of my coding skills and my time availability. It might be that some of my PhD students will have time in the future; in that case we will come back.

Thanks very much!