More Pythonic VTK wrapping

berk.geveci · January 17, 2024, 10:23pm

Hi everyone.

I wanted to keep you apprised of new work we are doing to improve the VTK Python interface. The part I will discuss here is at the wrapper level and applies to all of VTK.

PS: The VTK wrapper is the piece of code that parses the VTK C++ code and generates C code that is compiled into Python modules to interface between Python and the C++ library. There is no hand-written Python code involved.

The first thing that we are doing is to add support for Python properties. This is based on work done by @dgobbi (big round of applause) and is being modified by @jaswantp (another big round of applause) to be included in the current wrappers. This is a work in progress that can be found here: https://gitlab.kitware.com/vtk/vtk/-/merge_requests/10820

This code identifies appropriate GetXXX() and SetXXX() methods and generates Python properties. For example, it will generate a property called point_data from the method GetPointData. The GetPointData method will also be generated as before.
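As a rough illustration of the naming rule described above, here is a minimal pure-Python sketch. It is not the wrapper implementation (which generates C code); `property_name_from_getter`, `add_getter_properties`, and `FakeDataSet` are hypothetical names made up for this example, and the exact rules the wrappers use to pick eligible methods may differ.

```python
import re


def property_name_from_getter(method_name: str) -> str:
    """Map a getter such as 'GetPointData' to a snake_case name ('point_data')."""
    if not method_name.startswith("Get"):
        raise ValueError(f"not a getter: {method_name}")
    stem = method_name[3:]
    # Insert an underscore before each capital letter that starts a new word.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", stem).lower()


class FakeDataSet:
    """Stand-in for a wrapped VTK class; this is not the real vtkDataSet."""

    def __init__(self):
        self._point_data = {}

    def GetPointData(self):
        return self._point_data


def add_getter_properties(cls):
    """Attach a read-only property for every GetXXX method, keeping the method."""
    for name in [n for n in vars(cls) if n.startswith("Get")]:
        setattr(cls, property_name_from_getter(name), property(getattr(cls, name)))
    return cls


add_getter_properties(FakeDataSet)
ds = FakeDataSet()
# Both spellings reach the same object, as the post describes.
same_object = ds.point_data is ds.GetPointData()
```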

Jaswant is also working on the ability to initialize properties in the constructor. Something like:

c = vtkContourFilter(contours=[10, 20, 30], generate_normals=True)

Going forward, we will also have to make changes to the public API of VTK classes to make them more property-friendly. For example, the example above may require adding a signature like this to vtkContourFilter:

const std::vector<double> GetContours();
void SetContours(const std::vector<double>& ctrs);

Hopefully, this is all straightforward and will get support from the community.

Now, some potentially controversial ideas.

I would like to overload the Python rshift (>>) operator to enable making pipeline connections:

image = vtkImageData()
# fill image
pipeline = image >> vtkContourFilter(contours=[10,20]) >> vtkShrinkFilter()
pipeline.Update()
# ...

The operator will return the last filter in the pipeline (the shrink filter in this case).
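To make the "returns the last filter" behavior concrete, here is a small sketch using a stand-in class rather than real VTK objects. `FakeAlgorithm` and its `upstream` attribute are assumptions for illustration only. The sketch starts from a source algorithm instead of a data object to stay short.

```python
class FakeAlgorithm:
    """Stand-in for vtkAlgorithm; it only records its upstream connection."""

    def __init__(self, name):
        self.name = name
        self.upstream = None

    def __rshift__(self, downstream):
        # self >> downstream: connect the two, then hand back the downstream
        # filter so the next >> in the chain attaches to it.
        downstream.upstream = self
        return downstream


source = FakeAlgorithm("source")
contour = FakeAlgorithm("contour")
shrink = FakeAlgorithm("shrink")

# >> is left-associative, so this is (source >> contour) >> shrink.
pipeline = source >> contour >> shrink
```

Because each `>>` returns its right-hand operand, `pipeline` here is the `shrink` object itself, not a separate pipeline type.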

I would also like to add an Execute method (execute in Python) that lets us do this:

image = vtkImageData()
# fill image
output_data = (image >> vtkContourFilter(contours=[10,20]) >> vtkShrinkFilter()).execute()
# ...

and this

image = vtkImageData()
# fill image
output_data = (vtkContourFilter(contours=[10,20]) >> vtkShrinkFilter()).execute(image)
# or
contour = vtkContourFilter(contours=[10,20]).execute(image)
shrunk_contour = vtkShrinkFilter().execute(contour)

We are exploring and making up as we go so feedback would be very much appreciated.

I am working on improving the dataset_adapter module (which I will likely rename) to make it more pythonic and more tightly integrated with the mainstream Python interface. More on this later.

Sebastien_Jourdain · January 18, 2024, 1:16am

Is the pipeline object a new type? Meaning could we do things like that with it?

pipeline = image >> vtkContourFilter(name="contour", ...) >> vtkClipFilter(name="clip", ...)

actor = vtkActor()
mapper = vtkDataSetMapper(input_port=pipeline.output_port(0), actor=actor)

renderer.add_actor(actor)
render_window.render() 

pipeline.contour.contours = [10, 20]
pipeline.clip.clip_type.origin = [1, 2, 3]

render_window.render() # execute new pipeline with new contours/origin

jaswantp · January 18, 2024, 1:40am

Is the pipeline object a new type? Meaning could we do things like that with it?

Nope. At least not in the current design. If I’m not wrong, pipeline would be the last filter in the chain. A few more things about your code.

vtkContourFilter(name="contour", ...) >> vtkClipFilter(name="clip", ...)

The kwargs in the constructor must be the properties of a python wrapped class. They have to be accessible in the C++ side through a SetValue(...) method. So passing name like you did will raise a TypeError.

This is not to say that it is impossible to define new properties from the constructor. Right now, the constructor wrapping rejects keyword args that are not properties. This logic lets VTK catch typos and not accidentally create unintended properties, for example:

vtkSphereSource(centree=(1, 0, 0))

mapper = vtkDataSetMapper(input_port=pipeline.output_port(0), actor=actor)

output_port is a Python property, so it cannot be invoked like this right now. You can write this instead, omitting the (0):
```
mapper = vtkDataSetMapper(input_port=pipeline.output_port, actor=actor)
```
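The keyword-argument check described earlier in this post can be sketched in plain Python as follows. This is only an illustration of the behavior (accept known properties, raise TypeError on anything else); the real check lives in the generated wrapper code, and `FakeSphereSource` is a hypothetical stand-in, not vtkSphereSource.

```python
class FakeSphereSource:
    """Stand-in for a wrapped class; this is not the real vtkSphereSource."""

    def __init__(self, **kwargs):
        self._center = (0.0, 0.0, 0.0)
        for key, value in kwargs.items():
            # Accept only names that are properties defined on the class.
            if not isinstance(getattr(type(self), key, None), property):
                raise TypeError(f"{type(self).__name__} has no property '{key}'")
            setattr(self, key, value)

    @property
    def center(self):
        return self._center

    @center.setter
    def center(self, value):
        self._center = tuple(value)


sphere = FakeSphereSource(center=(1, 0, 0))

# The misspelled keyword is rejected instead of silently creating an attribute.
try:
    FakeSphereSource(centree=(1, 0, 0))
    message = ""
except TypeError as err:
    message = str(err)
```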

ben.boeckel · January 18, 2024, 6:13am

Can a named method be added too?

Other than that, I feel like | is the more “standard” operator for this, but maybe that’s just a C++ perspective (cf. ranges pipelines). I don’t see a meaningful precedence difference unless & is intended to be used for multi-port connections (and would only change where the parentheses need to go when combining them).

Christos_Tsolakis · January 18, 2024, 10:57am

This is great! It will reduce the size of scripts significantly. Also, we could take advantage of this in ParaView/Catalyst in order to produce neater scripts.

We had to do some of this for Async to simplify the wrapping code. I think it is a very good step towards simplifying the API and making it more modern.

I agree with Ben about using |. Unix pipes use the same symbol, so let’s be consistent with them.

Overall I see a great improvement here!

One more idea. I believe it would be nice to have the ability to “store” and “re-use” a pipeline. Something like:

image1 = vtkImageData()
image2 = ... 

pipeline = vtkContourFilter(contours=[10,20]) >> vtkShrinkFilter()

out1 = (image1 >> pipeline).execute()
out2 = (image2 >> pipeline).execute()

will.schroeder · January 18, 2024, 11:24am

What happens with filters with multiple inputs and/or outputs? How does that look?

will.schroeder · January 18, 2024, 12:28pm

And I agree with Ben about using pipe | symbol… does it make sense to consider a “tee” for branching pipelines?

As for named pipelines: would it be possible to expose just a few key data members in the pipeline (a subset of all the attributes available from the filters composing the pipeline)? That way, “superobjects” / pipelines could be created and manipulated.

berk.geveci · January 18, 2024, 12:54pm

Is the pipeline object a new type? Meaning could we do things like that with it?

This is a very nice idea but unfortunately, it is not feasible with our current limitations: The implementation has to be done with the wrapper code (not in Python) because it is not possible to inject Python specific functionality into a class (vtkAlgorithm) any other way. So this implementation would require adding a new type into the wrappers, written in C, to be returned from the operator.

berk.geveci · January 18, 2024, 12:59pm

Other than that, I feel like | is the more “standard” operator for this, but maybe that’s just a C++ perspective (cf. ranges pipelines). I don’t see a meaningful precedence difference unless & is intended to be used for multi-port connections (and would only change where the parentheses need to go when combining them).

I love |. Many of us use it day to day, much more often than the >> (stream) operator in C++.

Can a named method be added too?

Yes. We can do that easily in Python. Something like:

connect(source, filter)

We already have this:

filter.SetInputConnection(source)

which we can change to return the source for chaining.
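A named helper along these lines could be written today on top of the existing API. The sketch below assumes objects that expose `GetOutputPort(index)` and `SetInputConnection(port, connection)`, as vtkAlgorithm does; the `connect` signature and the choice to return the downstream filter (so calls can be nested) are assumptions for illustration, and `FakeAlgorithm` is a stand-in used only to exercise the helper without VTK installed.

```python
def connect(source, downstream, output_port=0, input_port=0):
    """Connect source to downstream and return downstream so calls can nest."""
    downstream.SetInputConnection(input_port, source.GetOutputPort(output_port))
    return downstream


class FakeAlgorithm:
    """Stand-in exposing the two methods the helper relies on."""

    def __init__(self, name):
        self.name = name
        self.connections = {}  # input port -> (upstream name, output port)

    def GetOutputPort(self, index=0):
        return (self.name, index)

    def SetInputConnection(self, port, connection):
        self.connections[port] = connection


reader = FakeAlgorithm("reader")
contour = FakeAlgorithm("contour")
shrink = FakeAlgorithm("shrink")

# Nested calls read inside-out: reader -> contour -> shrink.
last = connect(connect(reader, contour), shrink)
```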

nicolas.vuaille · January 18, 2024, 1:05pm

The properties part looks really nice!

While it may be nice to easily define pipelines, I wonder whether >> or | are really Pythonic in some way, or whether this will still be “custom VTK stuff”. But I’m not a day-to-day Python user, so I defer to them.

Also, I’m not sure how execute() is different from the existing Update()?

berk.geveci · January 18, 2024, 1:08pm

What happens with filters with multiple inputs and/or outputs? How does that look?

For multiple outputs and inputs, I am thinking of this:

Output(aSource, 1) | aFilter
aSource | Input(aFilter, 1)

Pipeline splits are a bit tricky. We can use something like “tee” but it would be a bit awkward visually:

aSource | tee(filter1a | filter1b, filter2a | filter2b) # what does this return?

Similar with merges. Without any of these, it would look like this:

# Split
aSource | filter1a | filter1b
aSource | filter2a | filter2b

# Merge
aSource | Input(filter, 0)
bSource | Input(filter, 1)

I didn’t think of append until now. Maybe something like this?

aSource | AddInput(append)
bSource | AddInput(append)
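One way the Input()/Output() idea could behave is sketched below with stand-in classes. Everything here is hypothetical: `Port`, `Input`, `Output`, and `FakeAlgorithm` are illustration-only names, and the real design may look quite different. AddInput() and tee() are left out.

```python
class Port:
    """An (algorithm, port index) pair."""

    def __init__(self, algorithm, index):
        self.algorithm = algorithm
        self.index = index


class Input(Port):
    """Marks a specific input port of a downstream filter."""


def _connect(output, downstream):
    # A bare filter on the right-hand side means "its input port 0".
    target = downstream if isinstance(downstream, Input) else Input(downstream, 0)
    target.algorithm.inputs[target.index] = (output.algorithm, output.index)
    return target.algorithm


class Output(Port):
    """Marks a specific output port of an upstream algorithm."""

    def __or__(self, downstream):
        return _connect(self, downstream)


class FakeAlgorithm:
    """Stand-in for vtkAlgorithm that only records its input connections."""

    def __init__(self, name):
        self.name = name
        self.inputs = {}  # input port -> (upstream algorithm, output port)

    def __or__(self, downstream):
        # A bare algorithm on the left-hand side means "its output port 0".
        return _connect(Output(self, 0), downstream)


a_source = FakeAlgorithm("a")
b_source = FakeAlgorithm("b")
merge = FakeAlgorithm("merge")

a_source | Input(merge, 0)
Output(b_source, 1) | Input(merge, 1)
```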

berk.geveci · January 18, 2024, 1:16pm

While it may be nice to easily define pipelines, I wonder whether >> or | are really Pythonic in some way, or whether this will still be “custom VTK stuff”. But I’m not a day-to-day Python user, so I defer to them.

I have seen custom operators used elsewhere. For example, PyTorch and NumPy use the @ operator for matrix multiplication (which took me a while to figure out!), and FEniCS uses the << operator to output visualization data. I believe what is Pythonic about it is using a shorthand to perform an operation. A common way of doing this is chaining operations like

source.filtera().filterb()

which is also not original to Python and is not feasible in our situation (one would have to add > 1000 methods for each filter).

Also, I’m not sure how execute() is different from the existing Update()?

We can certainly extend Update() for this. The difference is that Update() does not return the output nor does it take any input. In the execute() method, I make a shallow copy of the output and return a smart pointer to it (to handle memory management).

output_data = aReader.execute()
# vs
aReader.Update()
output_data = aReader.GetOutput()

output_data = aFilter.execute(input_data)
# vs
aFilter.SetInputData(input_data)
aFilter.Update()
output_data = aFilter.GetOutput()
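For comparison, the described behavior (optionally set an input, update, return a shallow copy of the output) can be approximated with a free function on top of existing methods. This is a sketch, not the proposed implementation: it assumes the object provides `SetInputDataObject`, `Update`, and `GetOutputDataObject` as vtkAlgorithm does, and that the output provides `NewInstance` and `ShallowCopy`. `FakeData` and `FakeDoubler` are stand-ins so the example runs without VTK.

```python
def execute(algorithm, input_data=None):
    """Run the algorithm and return a shallow copy of its first output."""
    if input_data is not None:
        algorithm.SetInputDataObject(0, input_data)
    algorithm.Update()
    output = algorithm.GetOutputDataObject(0)
    # Copy so the caller's result is decoupled from the filter's own output.
    result = output.NewInstance()
    result.ShallowCopy(output)
    return result


class FakeData:
    """Stand-in for a data object."""

    def __init__(self, values=()):
        self.values = list(values)

    def NewInstance(self):
        return FakeData()

    def ShallowCopy(self, other):
        self.values = other.values


class FakeDoubler:
    """Stand-in filter that doubles every input value."""

    def __init__(self):
        self._input = None
        self._output = FakeData()

    def SetInputDataObject(self, port, data):
        self._input = data

    def Update(self):
        self._output.values = [2 * v for v in self._input.values]

    def GetOutputDataObject(self, port):
        return self._output


doubler = FakeDoubler()
result = execute(doubler, FakeData([1, 2, 3]))
```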

dwg · January 18, 2024, 2:53pm

I love this. Two thoughts:

For the CamelCase/snake_case property renaming, I think e.g. vtkConnectivityFilter.n_connected_regions is slightly more pythonic than .number_of_connected_regions, although keeping the parallel to C++ might outweigh that
For enum valued properties like vtkConnectivityFilter.extraction_mode, it’d be nice to be able to use an actual python enum rather than a bare int, especially if the enum were available without an additional import. If the python properties were implemented as a descriptor, it should be possible to do something like

connectivity = vtkConnectivityFilter(
    extraction_mode=vtkConnectivityFilter.extraction_mode.POINT_SEEDED_REGIONS
)
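A descriptor of the kind suggested above might look like the following sketch. It is plain Python with stand-in names (`ExtractionMode`, `EnumProperty`, `FakeConnectivityFilter` are hypothetical), and the enum values are illustrative rather than taken from the VTK headers.

```python
import enum


class ExtractionMode(enum.IntEnum):
    # Values are illustrative, not taken from the VTK headers.
    POINT_SEEDED_REGIONS = 1
    CELL_SEEDED_REGIONS = 2


class EnumProperty:
    """Descriptor: class access yields the enum type, instance access the value."""

    def __init__(self, enum_type, default):
        self.enum_type = enum_type
        self.default = default

    def __set_name__(self, owner, name):
        self.storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: expose the enum members without an import.
            return self.enum_type
        return getattr(instance, self.storage, self.default)

    def __set__(self, instance, value):
        # Accept bare ints too, converting them to enum members.
        setattr(instance, self.storage, self.enum_type(value))


class FakeConnectivityFilter:
    """Stand-in for a wrapped filter; not the real vtkConnectivityFilter."""

    extraction_mode = EnumProperty(ExtractionMode, ExtractionMode.POINT_SEEDED_REGIONS)

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


connectivity = FakeConnectivityFilter(
    extraction_mode=FakeConnectivityFilter.extraction_mode.CELL_SEEDED_REGIONS
)
```

Because `IntEnum` members compare equal to plain integers, code that still passes or checks bare ints would keep working in this sketch.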

berk.geveci · January 18, 2024, 4:07pm

For the CamelCase/snake_case property renaming, I think e.g. vtkConnectivityFilter.n_connected_regions is slightly more pythonic than .number_of_connected_regions, although keeping the parallel to C++ might outweigh that

We can probably make a limited number of such changes. If the community can make suggestions, we can look into it. Also, if these changes are only a few and on “leaf node” classes, it is possible to use the @override mechanism to add a replacement for the class. I’ll describe this in the future.

For enum valued properties like vtkConnectivityFilter.extraction_mode , it’d be nice to be able to use an actual python enum rather than a bare int, especially if the enum were available without an additional import.

This is something that should be done at the C++ level and automatically handled by the wrappers. If those constants were done as enums inside the class, the wrappers would handle adding them to the class. In terms of returning them from a function, we need to have the right signature and not return ints. Otherwise, there is no way to automatically find it.

Sebastien_Jourdain · January 18, 2024, 7:57pm

Good point but it would be nice if we could have a user_data space for tagging things and maybe down the road create a pipeline object on the C++ side which could leverage some of those annotations.

amaclean · January 23, 2024, 3:57am

With the advent of the new improved VTK Python interface we will need examples.

These will fall into two groups:

New Examples.
Rewriting existing examples.

These examples should be separate from the existing Python examples in the VTK Examples. In the case of rewritten examples this will let the user easily toggle between the updated and original example.

So I have created a branch which does this and hopefully makes it simple for users to upgrade existing examples.

To do this, I had to:

Create a new folder for the examples: src/Python1
Create a new folder: src/Testing/Baseline/Python1
Add src/Python1.md to the top level
Edit src/Admin/ScrapeRepo.py to add in the new Python folder.
Edit src/Admin/VTKClassesUsedInExamples.py to add in the new Python folder.
Edit src/SyncSiteWithRepo.sh to include src/Testing/Baseline/Python1.

Please have a look at web-test/site/Python1

In this page there are instructions on how to upgrade existing examples.

If you click on the Hello World example you will see, in the Other Languages section, a link to Python. Clicking on it will take you to the original example. Clicking on the Python1 link there will take you back to the re-worked example.

If you don’t like Python1 as the folder name, please select a better name, but keep it short.

If people think this will be a good approach, I’ll do an MR. I know that the work here is not finished but by having a place to put the examples, people may like to rework existing examples so we can see what the code looks like!

ben.boeckel · January 23, 2024, 11:44am

Python1 feels confusing given Python2 and Python3. How about PyCxxAPI because it is basically the C++ API in the Python language?

berk.geveci · January 23, 2024, 1:49pm

Andrew, you are the best! How about we call it PythonicAPI? By the way, there is still more work before we should actually start creating examples. Especially on the data API side. That will be done more manually rather than by changing the APIs but should make things even nicer.

ben.boeckel · January 23, 2024, 4:45pm

For directories for the new API, that sounds better (it seemed to me like the directory was for old-style examples).

berk.geveci · January 23, 2024, 4:55pm

For directories for the new API, that sounds better (it seemed to me like the directory was for old-style examples).

Ahh I missed that.