Why is vtkmodules.util.data_model overriding by default?

banesullivan · November 27, 2024, 6:14am

In VTK 9.4, it looks like the new Python wrappings originally proposed in More pythonic interface / closer to numpy are now the default imposed on all Python VTK users:

>>> from vtkmodules.vtkCommonDataModel import vtkPolyData
>>> vtkPolyData()
<PolyData(0x609f2ae8b6b0) at 0x7c60007dbd00>
>>> p = vtkPolyData()
>>> p.__module__
'vtkmodules.util.data_model'

I’m curious if this decision was intentional and if someone (cc @berk.geveci) could speak to the benefits of this new API and the reasoning behind opting all VTK Python users into this new API. It seems to have been done intentionally here I believe.

This decision is causing quite a few headaches for downstream projects like PyVista, as it is a breaking change. To mitigate this, we’re just going to have to forcibly override with PyVista classes, but I’d really like to prevent a scenario where this new module in VTK and PyVista are competing to override the user’s environment.

Christos_Tsolakis · November 27, 2024, 1:02pm

cc: @jaswantp

dgobbi · November 27, 2024, 2:12pm

The behavior is optional. So even though it is the default, it can be turned off (though the mechanism to do so is undocumented and not very clean):

>>> import vtkmodules
>>> vtkmodules.MODULE_MAPPER
{'vtkCommonDataModel': ['vtkmodules.util.data_model'], 'vtkCommonExecutionModel': ['vtkmodules.util.execution_model']}
>>> vtkmodules.MODULE_MAPPER = {}
>>> from vtkmodules.vtkCommonDataModel import vtkPolyData
>>> vtkPolyData.__module__
'vtkmodules.vtkCommonDataModel'

There is a risk that as time goes on, more and more of the VTK support modules will rely on these default overrides, essentially making them non-optional.
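For anyone who wants to script the reset shown above, here is a minimal sketch. The helper name disable_default_overrides is made up for illustration; the only VTK-specific piece it touches is the vtkmodules.MODULE_MAPPER attribute from the session above, and since that mechanism is undocumented, it may change between releases. As in the session above, it would need to run before importing vtkmodules.vtkCommonDataModel.

```python
import importlib


def disable_default_overrides():
    """Clear vtkmodules.MODULE_MAPPER, mirroring the interactive session above.

    Hypothetical helper, not part of VTK. Returns a copy of the previous
    mapping, or None when vtkmodules cannot be imported at all.
    """
    try:
        vtkmodules = importlib.import_module("vtkmodules")
    except ImportError:
        # VTK is not installed, so there is nothing to disable.
        return None
    # Older releases may not define MODULE_MAPPER, hence the getattr default.
    previous = dict(getattr(vtkmodules, "MODULE_MAPPER", {}))
    vtkmodules.MODULE_MAPPER = {}
    return previous


previous_mapping = disable_default_overrides()
```

Keeping the previous mapping around makes it possible to put it back later if needed.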

banesullivan · November 27, 2024, 4:11pm

It would have been a wonderful success story of open source if VTK embraced community efforts like PyVista’s years of development on this exact thing instead of starting something from scratch and breaking said community efforts. It makes me sad (and a bit frustrated), but I guess I get it… PyVista is “PyVista” not “VTK”…

Sebastien_Jourdain · November 27, 2024, 10:50pm

@banesullivan if you recall we tried to extract the data model from PyVista and make it into VTK core. But that didn’t work out as PyVista was doing more than just dealing with Numpy. The original idea was to enable PyVista to focus on its plotting API and port back the data model to the VTK side.

With the updated wrapper, the only thing that is missing is the numpy handling (but only when numpy is available). If those are incomplete or missing something, people can still contribute to them. But those Python add-ons should remain tiny, as they should just focus on the numpy/array mapping on the VTK core side.

The same goes for the undocumented register_vtk_module_dependencies, to which we could add a counterpart such as unregister_vtk_module_dependencies or clear_vtkmodule_dependencies. But for now the idea was to allow the community to add their own overrides if they wanted, rather than removing the “VTK” ones.

Anyway, I’m expecting some transition happening with 9.4 but ideally the migration toward a single data model between PyVista and VTK could truly happen for 9.5?

akaszynski · December 2, 2024, 5:02pm

Anyway, I’m expecting some transition happening with 9.4 but ideally the migration toward a single data model between PyVista and VTK could truly happen for 9.5?

I’d love to see that!

banesullivan · December 2, 2024, 7:16pm

This concerns me and makes me wary of this new override mechanism, though I think there is a path here where we can cross-pollinate and make these support modules robust and flexible so that community projects like PyVista and direct efforts from VTK can thrive together.

Right, PyVista’s core API is handling quite a lot of things in addition to just wrapping the data model: supporting FileIO, type conversions, filters, etc. – all things that others may want to do differently, such as with a full-on VTK pipeline (PyVista’s core API only chains operations together as of today). PyVista’s data model is tightly coupled to those additional features, making PyVista incredibly flexible and powerful yet making that migration back to VTK tough and probably not worth our time. So yes, I understand why this new data model in VTK is being pushed, but my wishful, starry-eyed part still hopes for a more direct embrace of PyVista, as it does indeed solve all of the original goals.

I’d love to see us converge and for much of PyVista’s core API “ported back to VTK” too, but I’m skeptical that upstream in VTK is the best place for this. I want to advocate that VTK consider hosting the Python wrappings in a standalone repository and as a separately installable package. The barrier to contributing to VTK’s repository is high and the release cycles are not flexible enough for what is required to iterate on this new data model to get it to a point where it is stable enough to encourage wide adoption. And to paraphrase someone from the PyVista community recently, we’re happy to contribute to this new Python API and volunteer our efforts (free labour) but we need some give and take here and to reduce the barriers to contributions.

The reality is that this new data model API, while very promising and deserves a chance to prove itself, is very new, unproven, and showing shortcomings that are affecting downstream project(s). For these reasons, I’d like to strongly advocate that VTK change these overrides to be opt-in for now. Then if a downstream Python package wants to opt in, it could either use a simple import to trigger this, or we could add a mechanism directly in VTK so that downstream packages could register an entrypoint for these overrides.
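To make the entrypoint idea a bit more concrete, here is a rough sketch of what an opt-in loader could look like using the standard library's importlib.metadata. Nothing like this exists in VTK today as far as I know: the group name "vtk.overrides" and the function load_opt_in_overrides are assumptions invented for this illustration.

```python
from importlib.metadata import entry_points

# Hypothetical entry point group; VTK does not define this today.
OVERRIDE_GROUP = "vtk.overrides"


def load_opt_in_overrides(group=OVERRIDE_GROUP):
    """Import every override module that an installed package registered.

    Returns the names of the entry points that were loaded. With no
    registrations, nothing is imported and the list is empty.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])

    loaded = []
    for ep in selected:
        ep.load()  # importing the target is what would apply the override
        loaded.append(ep.name)
    return loaded
```

A downstream package would then opt in by declaring an entry point in that group in its own packaging metadata, and users who install nothing extra would keep the plain wrapped classes.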

If the compromise is to use unregister_vtk_module_dependencies in PyVista, then I think that is fine, but I’d like to see assurances (in the form of tests in VTK’s code) that this new data model isn’t required to use VTK in Python until it reaches a level of stability we are all comfortable with (perhaps these tests exist?).

I too am excited for this prospect and I truly believe this is in reach, though I think it will require a lot of coordination and a champion to lead this, however, I’m afraid there may not be anyone from PyVista with the capacity to take this on at this time.

Sebastien_Jourdain · December 2, 2024, 11:08pm

I’d love to see us converge and for much of PyVista’s core API “ported back to VTK” too, but I’m skeptical that upstream in VTK is the best place for this.

Well, that is where we are not necessarily seeing the same level of features in that core layer. Like you mentioned before, PyVista is doing way more (io, convert, …). To me, what VTK should be able to provide by default is just numpy/eq, as this should be small and robust enough to be used across projects. The code written should also remain simple and tiny, with the goal of having the wrapper manage most of it.

Since the override is flexible enough, we could create a simple Python package that could reflect what will be pushed into the next release of VTK but allow folks to update it at their own pace between releases. But ideally, this should stabilize pretty fast as the core API hasn’t changed in decades.

I would like to hear what @berk.geveci thinks about opt-in vs opt-out along with that “vtk_future” python package.

berk.geveci · December 3, 2024, 12:24am

First, I apologize for being late to this thread. I was away for the last 3 weeks for personal reasons. There are a lot of issues to address here so I may miss some in my first response.

I’d love to see us converge and for much of PyVista’s core API “ported back to VTK” too, but I’m skeptical that upstream in VTK is the best place for this. I want to advocate that VTK consider hosting the Python wrappings in a standalone repository and as a separately installable package.

This is a fair ask. I will consult with some folks on how this can be achieved. We have a third party import mechanism with which an external package can be brought into VTK as read only. We may be able to use that to move the development of the Python only part of the interface to its own repo. However, before we do this, I would like to have an idea of how much new contribution it would facilitate. How many PyVista developers regularly work on the data model and are willing to contribute to such a module?

The reality is that this new data model API, while very promising and deserves a chance to prove itself, is very new, unproven, and showing shortcomings that are affecting downstream project(s). For these reasons, I’d like to strongly advocate that VTK change these overrides to be opt-in for now.

Hmmm. I would be OK with making it opt-in in the current release but is it feasible to do after we already made a release? Maybe pretend that it was unintentional (it was not) and change it in the next patch release. However, I am not OK with making it opt-in longer than that. This code is based on the numpy_interface module that predates both PyVista and Vedo so it is not new. The fact that numpy_interface has not gained much popularity and external data APIs such as the ones in PyVista and Vedo were developed indicates that opt-in APIs are rarely adopted.

dgobbi · December 3, 2024, 5:13am

I’d just like to add my two cents as a long-time member of the VTK community.

I’m a big fan of optional things. I prefer to take a salad-bar approach to project development, look for pieces here and there that fit the requirements and get them to work together. I’ve rarely built projects solely from VTK classes, and often replace huge parts of VTK functionality with my own code.

The parts of VTK that I love best are the ones that allow customization and flexibility. Like the Python buffer interface for VTK arrays. The vtkGenericOpenGLRenderWindow. The vtkImageImport class. And the Python class override mechanism, too. These are things that make it possible to build upon VTK and to tie VTK to non-VTK code.

Needless to say, I don’t like the idea of forcing VTK users to use a specific API to interface with numpy or anything else. To me, it’s equivalent to being told that I have to use VTK’s built-in DICOM reader instead of writing my own, and for the other developers to tell me “Oh, don’t worry, we’ll make VTK’s DICOM reader better, I’m sure you’ll grow to like it!” Ugh. With most parts of VTK, like the VTK-Qt interface, the readers, the filters, even the mappers, I’m free to disregard them if they don’t suit my tastes or requirements, because I can always roll my own replacements.

It’s also worth repeating that pure Python-based projects (vedo, PyVista, etc.) generally have an advantage of rapid development and relatively low barrier to entry. VTK has a high barrier to entry and slow iterations (for those who aren’t using source builds from the master branch). This, I feel, is why VTK’s numpy interface has had only intermittent development and low adoption.

berk.geveci · December 3, 2024, 3:29pm

Thanks, David. Very insightful.

Needless to say, I don’t like the idea of forcing VTK users to use a specific API to interface with numpy or anything else.

To dig a bit into the specific details: PyVista broke because its subclassing is designed such that the attributes it defines do not override the superclass’ attributes. This is due to the use of multiple inheritance and the order of inheritance. The mixins it uses are inherited after the VTK classes and hence do not override attributes of the same name. I would argue that this is something that should be fixed irrespective of the data model changes. VTK should be allowed to define a property or a method without worrying about conflicts with subclasses. On the other hand, Vedo does not have this problem even though it also overrides methods. So I don’t believe that VTK is forcing any API on anyone.
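As a rough illustration of the inheritance-order point, here is a small pure-Python example. The classes are stand-ins invented for this sketch, not the real VTK or PyVista types; they only show how Python's method resolution order decides which same-named attribute wins.

```python
class VTKBase:
    """Stand-in for a wrapped VTK class that newly defines `points`."""

    @property
    def points(self):
        return "VTKBase.points"


class PointsMixin:
    """Stand-in for a downstream mixin that also defines `points`."""

    @property
    def points(self):
        return "PointsMixin.points"


class MixinLast(VTKBase, PointsMixin):
    """Mixin listed after the base: the base's attribute is found first."""


class MixinFirst(PointsMixin, VTKBase):
    """Mixin listed before the base: the mixin's attribute is found first."""


print(MixinLast().points)   # VTKBase.points
print(MixinFirst().points)  # PointsMixin.points
```

With the mixin listed last, a property added to the base class silently shadows the mixin's property of the same name; listing the mixin first (or defining the attribute on the subclass itself) keeps the downstream definition in control.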

In terms of the use of this API spreading, I expect that it will. There is an algorithms module that depends on this API heavily, for example. That’s because it uses numpy and mpi4py to do a bunch of things. I don’t think that it is fair to say that VTK cannot use numpy in its implementation.

It’s also worth repeating that pure Python-based projects (vedo, PyVista, etc.) generally have an advantage of rapid development and relatively low barrier to entry. VTK has a high barrier to entry and slow iterations (for those who aren’t using source builds from the master branch). This, I feel, is why VTK’s numpy interface has had only intermittent development and low adoption.

+1