RFC: Toward supporting distribution of VTK based modules on PyPI

In a nutshell, we would like to officially support the ability to package and distribute any VTK-based module on PyPI (Python Package Index).

This means we would maintain a template to allow anyone in the community to do so.

There are, of course, many technical details to consider, but before putting together a technical proposal, we would like to gather input from the community to make sure we consider all aspects.

In recent years, Python has become the most widely used programming language, particularly in important areas such as machine learning [1]. Since numerous Python packages are available on PyPI for all imaginable computational tasks, libraries that are not available on PyPI simply get ignored. Algorithm developers who were happy for many years with just making C++ source code available now need a way to distribute their work on PyPI. The ability to easily build VTK modules against VTK itself and allow anyone in the community to distribute VTK-based modules will be key. We think it is currently an unmet need.

Thanks for your input,

J-Christophe, Will, Utkarsh, and Stephen

[1] https://medium.com/analytics-vidhya/machine-learning-technology-trends-2020-2d0212813c05

Generally speaking, I am in support of this initiative. I’d revise the requirement, however, as “ability to build VTK modules against VTK itself and allow anyone in the community to distribute VTK-based modules”.

When you say a VTK module, to me it sounds like it’ll be in VTK, so I should be able to do from vtkmodules import MyNewModule. I am not sure that needs to be a requirement. Maybe making it easy to support a plain import MyNewModule is all that’s needed.
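
To make the distinction concrete, here is a minimal sketch of what either convention would look like from the user's side. The name MyNewModule comes from the post and is hypothetical; the helper import_first_available is an illustration only and is not part of VTK.

```python
import importlib


def import_first_available(candidates):
    """Return the first module from `candidates` (dotted names) that imports."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# Usage (MyNewModule is hypothetical): prefer the vtkmodules namespace,
# fall back to a standalone top-level package.
# module = import_first_available(["vtkmodules.MyNewModule", "MyNewModule"])
```

If only the plain top-level import needs to work, an external package never has to install files into the vtkmodules package owned by the main VTK distribution.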

This would be incredibly valuable and would open up use of VTK to a much broader community.

This would be awesome. It would allow us to make available lots of VTK-based tools and algorithms that we developed over the years to developers/researchers who do most of their work in Python, which describes most of the new graduates in our institution (and probably at many other places, too).

I agree that there is a difference between VTK core modules (stored in VTK repository, can only be built as part of VTK) and VTK remote modules (stored in external repositories and can be built against a VTK build tree and maybe even install tree). So, we could say that there are two requirements:

  1. Ability to easily build VTK remote modules against VTK and allow anyone in the community to distribute VTK-based remote modules.
  2. Ability to easily build VTK core modules against VTK and allow anyone in the community to distribute VTK-based modules.

We all already agree on the necessity of requirement 1.

Requirement 2 is needed because there are several non-essential modules in the VTK repository that would not be practical to include in the main VTK Python package on PyPI, as they would bloat the VTK package with features that very few people need and/or would require special hardware or software configurations to be usable. For example, all the domain-specific modules (Chemistry, ChemistryOpenGL2, Microscopy, ParallelChemistry, GeoVis, …), some of the parallel modules, exotic IO modules, OpenVR, and all the GUI support (Qt, MFC, Tk, …) modules should probably be distributed as separate packages on PyPI (e.g., vtk-chemistry, vtk-geovis, vtk-openvr, …).
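
As a rough illustration of such a split, the sketch below maps optional package names to the modules they might carry. The package names are the examples given above; the grouping of modules into packages and both helper functions are assumptions made for illustration, not an agreed layout.

```python
# Hypothetical grouping of optional VTK modules into separate PyPI packages.
# Package names follow the examples in the post; the grouping is an assumption.
OPTIONAL_PACKAGES = {
    "vtk-chemistry": ["Chemistry", "ChemistryOpenGL2", "ParallelChemistry"],
    "vtk-geovis": ["GeoVis"],
    "vtk-openvr": ["OpenVR"],
}


def package_for_module(module_name):
    """Return the optional package assumed to ship `module_name`, or None."""
    for package, modules in OPTIONAL_PACKAGES.items():
        if module_name in modules:
            return package
    return None


def install_hint(module_name):
    """Return a short hint telling the user where `module_name` would come from."""
    package = package_for_module(module_name)
    if package is None:
        return f"{module_name} is assumed to ship with the main vtk package"
    return f"pip install {package}"
```

A table like this could drive a helpful error message when a user tries to use a module that was split out of the main package.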

An alternative solution would be to move all non-essential VTK modules out into remote modules. This would also be useful because it would decrease the size, complexity, and maintenance workload of the main VTK repository. Important VTK modules could still be hosted on Kitware’s GitLab and VTK developers could have direct write access to them.

We use VTK in Jupyter Notebooks within an Anaconda environment on Debian. We will only use VTK modules that are installed via Anaconda’s Conda package manager, and thus are contained within the environment.

This helps us maintain package compatibility within the Anaconda environment, and helps us keep the system-wide Debian clean and minimal.

Thank you,

Ken

Is your point that not just PyPI but also the conda package manager should be considered? Do you impose any limitations on using conda-forge recipes, building packages created from recipes obtained from PyPI, or installing packages using pip?

The Conda package manager has a default channel, and that is what we use in order to ensure package compatibility, so that is what we would prefer you set as your target. One big downside of this from your perspective is that you have to stay in line with them as to what versions of dependent packages you use.

I understand that conda-forge and other channels are available, but using these could change (update or downgrade) some existing packages in such a way as to break package compatibility within the environment… Also, we maintain security by periodically updating our Anaconda environment using Conda’s default channel, so we would run the risk of breaking any packages installed from elsewhere (depending on their dependent package versions). So, packages from conda-forge and other channels are only evaluated under duress, and very seldom get deployed.

We do not use pip due to package compatibility concerns.

Aptitude is what we use for maintaining the system-wide Debian (and, again, we restrict ourselves to the Debian-provided repositories). Also, we keep our system-wide Debian installation as small as possible by restricting our application-related software to its environment. This allows us to maintain security by periodically updating our Debian without breaking package-compatibility.

Obviously, we are intent on maintaining system stability; and are willing to sacrifice functionality to do so.

Nobody likes a system that used to work.

Thanks,

Ken

Hi @jcfr, sounds great. Could you explain the technicalities a bit?

Do you mean that any python app using VTK with that template would link dynamically to the main VTK cpython library, and would be able to operate with objects from proper VTK python wrappings (import vtk)?

In my recent experience, I have used VTK/ITK internally, and exposed my own wrappings of a few VTK objects when needed. The problem of course is that my_wrapped_vtk_image is not binary compatible with the same VTK object from import vtk. So I have used bridge functions in pure python (usually via numpy) to convert between these.

Just trying to understand better, I like the idea of course.

Any module which requires a build tree and doesn’t work with an install tree is either:

  • using internal APIs (bad module, don’t do that); or
  • hitting a bug in VTK’s install tree

Please file bugs when this happens.

This sounds like a very good line to use for moving some code out of VTK and into its own repository (or multiple!). They’d also serve as an example of “see, here’s how to make a wheel of a VTK-using project”. Actually doing the split is likely to be a huge undertaking (and ParaView will need to support an external VTK first in any case for this to become a practical reality).

(replying to the overall effort here):

I do have issues with wheels in particular because shipping C++ APIs with wheels is a largely unsolved problem that I haven’t seen anyone working on. Conda does support non-Python code and is likely more suitable for a proper solution for this (though if wheels do end up having ways of shipping non-Python APIs reliably, we can look at that again). Things to consider:

  • where do headers live within a wheel?
  • where should the vtk-config.cmake and associated files go?
  • VTK hierarchy files also need to go into the package

Now the new VTK build system should be able to deal with any such layout, but making it Just Work is up to conventions in the wheel-using ecosystem.
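
To see how a given wheel answers these three questions, one can simply list its contents, since a wheel is a zip archive. The sketch below is only an illustration: the file suffixes used for classification (including the -hierarchy.txt naming) and the example layout built in build_example_wheel are assumptions, not an established convention.

```python
import io
import zipfile


def classify_wheel_entries(wheel_file):
    """Group the entries of a wheel (a zip archive) by kind of C++ development file."""
    found = {"headers": [], "cmake": [], "hierarchy": []}
    with zipfile.ZipFile(wheel_file) as archive:
        for name in archive.namelist():
            if name.endswith((".h", ".hxx", ".txx")):
                found["headers"].append(name)
            elif name.endswith(".cmake"):
                found["cmake"].append(name)
            elif name.endswith("-hierarchy.txt"):
                # Assumed naming for VTK hierarchy files.
                found["hierarchy"].append(name)
    return found


def build_example_wheel():
    """Create an in-memory archive with a made-up layout, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("vtkmodules/__init__.py", "")
        archive.writestr("vtk-9.0.data/headers/vtkObject.h", "")
        archive.writestr("vtk-9.0.data/data/lib/cmake/vtk/vtk-config.cmake", "")
        archive.writestr(
            "vtk-9.0.data/data/lib/vtk/hierarchy/vtkCommonCore-hierarchy.txt", ""
        )
    buffer.seek(0)
    return buffer


found = classify_wheel_entries(build_example_wheel())
```

Pointing classify_wheel_entries at a real wheel file path would show whether headers, CMake config files, and hierarchy files are shipped at all, and where they end up.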
