VTK review process documentation

ben.boeckel · September 14, 2021, 1:20pm

Hi all,

I’ve pushed an MR which covers some of the things which should be considered for VTK contributions.

I know that Python guidelines are missing, but I don’t do much VTK Python development, so suggestions for what to add beyond “use PEP8” would be appreciated.

Does everything else make sense? Anything missing?

Once there’s been some discussion about the backing document, I’d like to create an MR template that contains a checklist with links to the various sections for developers and reviewers alike.

Thanks,

–Ben

dgobbi · September 14, 2021, 2:16pm

Regarding Python, I don’t think there’s a consensus yet on the best style for importing modules. Or at least, there’s a lot of variability in how ‘vtkmodules’ is used within the VTK code base.

I like doing the imports like this, which is reminiscent of C++ includes and also very similar to the way a lot of Python-Qt code is written:

from vtkmodules.vtkCommonCore import vtkCommand
from vtkmodules.vtkFiltersSources import vtkConeSource
from vtkmodules.vtkRenderingCore import (
  vtkActor
  vtkDataSetMapper
  vtkRenderer
  vtkRenderWindow
  vtkRenderWindowInteractor
)

It’s tedious to do by hand, though, and I’m reluctant to enforce it until we have a script to automate it so that it can be checked by gitlab and applied as part of “Do: reformat”.

ben.boeckel · September 14, 2021, 2:25pm

Are you talking about the style of the imports or automatically fixing from vtk import vtkClass to from vtkmodules.vtkFooMod import vtkClass?

Either way, removing import vtk from tests is this issue.

dgobbi · September 14, 2021, 2:44pm

Both. We need a consensus on what style to use when importing from vtkmodules, and I think that will be easy to get. And then we (maybe me?) can create a script to automatically replace “import vtk” (and anything else) with whatever we have agreed upon.

Currently VTK has around 800 Python tests, and only 12 of them use vtkmodules.

Edit: I should add, really there isn’t much variation in the styles used for vtkmodules. It really fits into three categories:

""" Style 1: classical style """
import vtkmodules

""" Style 2: numpy style """
import vtkmodules.vtkCommonDataModel as dm

""" Style 3: PyQT style """
from vtkmodules.vtkCommonDataModel import vtkImageData

ben.boeckel · September 14, 2021, 3:58pm

I don’t think that Style 1 works since submodules must be explicitly imported. Maybe there’s lazy importing occurring though?

Personally, I think Style 3 is better because agreeing on shortnames for Style 2 is another bikeshed.

And that’s why an issue was opened rather than it “just being fixed”. Note that doing this will help with moving more TEST_DEPENDS to TEST_OPTIONAL_DEPENDS. It’s kind of silly that if Python is disabled, a module only required by Python tests ends up disabling C++ tests for the module as well.

amaclean · September 15, 2021, 8:17am

I have a very old MR https://gitlab.kitware.com/vtk/vtk/-/merge_requests/7232 that looks through C++ code finding the needed modules for C++. I think this could be easily modified for Python code to use style 3.You could do initial development using import vtk and then run the code through this to convert it to sytle 3. Just a thought.

amaclean · September 16, 2021, 6:59am

Ok - I just wrote this, it uses modules.json and tested it on a fairly complex example.

The only issue I found was VTK_VERSION_NUMBER which is available when import vtk is used is not present at all. So it is wrapped but where is it?

For VTK_DOUBLE, I had to import vtk.util.vtkConstants as vtk_const.
So this script produces style 3 referred to above.

Note, I wrote this in Windows so the line endings are CRLF.
FindNeededPythonModules.zip (1.6 KB)

If it is what you were thinking of, I’ll do a MR tomorrow. If so, tell me where you would like it and I’ll also do a few minor changes. Maybe a name change also to say “FindPythonModules.py”? Suggestions welcome.

dgobbi · September 16, 2021, 1:28pm

VTK_VERSION_NUMBER is defined in vtkVersionMacros.h. All the constants are automatically extracted from the VTK header files by the wrappers, and placed directly into the module namespace, e.g.

vtkmodules.vtkCommonCore.VTK_DOUBLE

Hence vtkConstants.py is really just a legacy file, it has only been kept so as to avoid breaking old code that uses it. Perhaps we should officially deprecate it and add a warning before the next release.

The situation with these constants is tricky, there’s nothing equivalent to modules.json to tell us what module they are defined in. We might have to generate a new file to keep track of them. But that might only be possible at build time (since it involves parsing the headers), not at configuration time.

There is a similar difficulty with e.g. vtkmodules.vtkCommonDataModel.vtkVector3f. This class appears in vtkCommonDataModel-hierarchy.txt but is absent from modules.json.

The difficulty is that modules.json only keeps track of what headers are in what modules, it doesn’t tell us what classes and what constants are defined in the headers. The “one class per header” rule is often broken in VTK.

Edit:

It might be best to use the ground truth info stored within the wrappers themselves,

import importlib
import vtkmodules

modules = [ 'vtkCommonCore', 'vtkCommonDataModel', ...]
for module in modules:
    module_dict = importlib.import_module('vtkmodules.' + module).__dict__
    for name in module_dict:
         name_to_module[name] = module

Importing the wrappers isn’t the best solution for gitlab/kwrobot checks, but it should work fine for converting the tests and examples.

amaclean · September 17, 2021, 7:17am

@dgobbi I have reworked it thanks to your really helpful comments. I now use your suggestion above and just use the JSON file to get the VTK modules to use. Your suggestion has a nice side effect in that the VTK constants are listed in the module so I also picked them up when scanning the Python code. I finally know where these constants hide!
e.g.

from vtkmodules.vtkCommonCore import (
    VTK_UNSIGNED_CHAR,
    vtkDoubleArray
    )

The program can also scan whole folders.

It is now in the VTK Examples, please see: VTKImportsForPython.

amaclean · September 17, 2021, 9:13am

I just had a thought, I’m away from my computer and possibly out of my depth! But… if I:

import vtk

Is it possible to get a list of all the imported modules e.g vtkCommonCore etc? This would mean I don’t have to use modules.json.

ben.boeckel · September 17, 2021, 12:37pm

It doesn’t look easy. You might be able to read the source of vtkmodules.all from its __path__ once you import it though?

dgobbi · September 17, 2021, 12:44pm

@amaclean I have an MR (!8419) to add a vtkmodules.__all__ attribute that lists the modules. So after it’s merged it will be possible to get the list from vtkmodules.

It’s possible to get the list from either the ‘vtk’ module or the 'vtkmodules.all’ module by brute force by iterating through all the VTK classes, since each class has a ‘__module__’ attribute. The results can be put in a Python set for uniqueness.

@ben.boeckel is right that parsing vtkmodules/all.py to get the list would be more efficient.

I’ve tried your VTKImportsForPython.py utility on a few files and it works well. If I manage to break it I’ll let you know.

Edit: In fact, maybe a list of modules and a name_to_module dict are unnecessary, since the following method can be used to map a classname to a module:

>>> import vtk
>>> vtk.__dict__['vtkObject'].__module__
'vtkmodules.vtkCommonCore'

Edit 2: Oops, that doesn’t work for constants (e.g. VTK_FLOAT) since they lack a __module__ attribute. So a full list of modules is, indeed, necessary. Sorry for the noise.

amaclean · September 17, 2021, 11:07pm

@dgobbi I’m glad it works for you. I tested it out on the vtk-exampes.
It has now been updated to use the changes in your merged MR: 8419 . I have kept the option to use modules.json for older versions of VTK. Thanks for your help and comments.

I’ll be using it to update the python examples in VTK Examples.

estan · September 18, 2021, 8:39am

Great initiative @ben.boeckel.

On this topic, there’s one thing I think could be clarified further in the process documentation (e.g. in develop.md). Currently it says at https://github.com/Kitware/VTK/blob/master/Documentation/dev/git/develop.md#testing :

[…] Merge request authors should visit their merge request’s pipeline and click the “Play” button on one or more jobs manually. […]

But it does not say whether a fully green pipeline is a prerequisite for merge. And the way the above is phrased (“should”), without giving any details on which jobs should be triggered, it’s not easy for a newcomer to know what they must do.

Now, it’s easy enough to visit https://gitlab.kitware.com/vtk/vtk/-/merge_requests?scope=all&state=merged and have a looked at merged MRs to see that a fully green pipeline is not a prerequisite for a merge.

Recently, I had a colleague making his first MR against VTK and he asked me about this, and I couldn’t really answer. I went ahead and actually made his pipeline fully green, but it was a game of whack-a-mole taking a whole day to get it there. Most of the failures seemed to be about tests timing out. At one point it was close, but then so much time had elapsed that build artifacts from the build phase which were a prerequisite for the timing out test job had been removed, so had to go back and re-trigger the build. Finally we got there though, but this is not an ideal process for people who (naturally) think that the pipeline must be green for their MR to be merged.

Should some of the timeout limits be increased perhaps?

Best and easiest of course would be if a fully green MR is a prerequisite for merge, like in most other projects.

estan · September 18, 2021, 8:48am

For reference, this was the MR: https://gitlab.kitware.com/vtk/vtk/-/merge_requests/8251

As you can see, we got the pipeline green prior to merge. But the tests of course failed again when the merge commit was tested, probably due to timeouts.

Looking back at some other MRs, I’m actually not sure what’s going on normally. Take for example https://gitlab.kitware.com/vtk/vtk/-/merge_requests/8420. In that MR, I can’t see any Do: test commands, nor is the full GitLab pipeline green, but despite this @dgobbi said “Dashboards green” and went ahead with the merge. Is there some unwritten rule here?

This is what it normally looks like in the list of merged MRs:

I.e. most MRs either have a failing pipeline, or the pipeline is waiting for manual action. Very rarely do you see a green checkmark there.

ben.boeckel · September 18, 2021, 12:34pm

I agree, however, with the flakiness of some platforms (e.g., Windows file removal issues), the time some jobs take (clang-tidy), and lack of sufficient resources to work through the workload (macOS), requiring fully green pipelines will significantly decrease throughput since it takes about 90 minutes if no other work is scheduled for a pipeline to even run completely.

As for the current pipeline state, I recently went through and updated the test exclusion lists, but that could certainly be wrong with transient failures (such as timeouts or machine issues). Ideally, we’d get tests to be stable on their own, but that requires some development time beyond “ignore the test completely in CI”.

About as far as we’ve considered is requiring a green build, but this is also problematic because of the time lag it would require on things (basically, even quick fixes would take “all day” to end up with green builds across the board).

will.schroeder · September 18, 2021, 12:47pm

Ben +1 thanks for your pragmatic take on this. As you say, dealing with the flakiness in testing can do major damage to getting code out the door in a timely manner. Of course, too much pragmatism transform into ignoring the dashboard, but I know we are better than that

dgobbi · September 18, 2021, 1:02pm

One key takeaway I get from @estan’s comments are that the problems with CI and lack of clarity in the VTK process are barriers to community participation. Of course the aim of this thread is to clarify the process, so so the comments are all good, and so is the work on improving the documentation.

In my own defence, the dashboard results I called “all green” were as green as is possible with the current CI (though, of course, I only ran a couple jobs).

will.schroeder · September 18, 2021, 1:23pm

problems with CI and lack of clarity in the VTK process are barriers to community participation

I’m glad you mentioned this David, building out the community even more is something that will be good for all of us.

estan · September 18, 2021, 1:44pm

Thanks guys, I understand it’s a question of resources. Right now there’s a smorgasbord of jobs to choose from when choosing what to trigger for your own MR. Just some pointers in the docs on what is the minimum/typical ones that should be triggered/green would go a long way for newcomers I think.