vtkPythonAlgorithm: long-running tasks and multiple computers?

Nicholas_Yue · April 18, 2025, 5:29pm

I am revisiting the following vtkPythonAlgorithm article from an HPC/cluster/multiple-computer perspective.

I am unsure of the design scope of vtkAlgorithm, hence the following questions.

I have long-running or compute-intensive workloads.

I have the following use cases.

Note: When I say node, I mean a vtkAlgorithm subclass (C++) or a vtkPythonAlgorithm with a Python script as its workload.

Case 1:

NodeA generates output that is to be consumed by connected child nodes Node[0…10]. Node[0…10] are independent nodes, each doing something different to the data supplied by NodeA.

Is it within the design scope of vtkAlgorithm that one might be able to execute the workloads of Node[0…10] on different computers?

Case 2:

NodeB is given a collection of ordered inputs, e.g. a collection of files with a known numbering convention. Is there a way for NodeB to run its processing code on each of the files on different computers?

Thank you in advance.

ben.boeckel · June 13, 2025, 10:35am

I think the support would need to be mediated by MPI and node allocations today. @sankhesh or @berk.geveci?

sankhesh · June 14, 2025, 7:00pm

Hi @Nicholas_Yue, these are interesting use-cases.

Case 1:

VTK supports task parallelism through MPI; the processes can be on the same machine or on different machines. See https://gitlab.kitware.com/vtk/vtk/-/blob/master/Examples/ParallelProcessing/Generic/Cxx/ParallelIso.cxx. This is a C++ example, but it should work fine using VTK’s Python bindings. In that example, the MyMain function is restricted to a single process. Ideally, NodeA would be running in that single process in your use case. You can then use CreateSubController or just SetMultipleMethod from vtkMPIController to farm out the successive tasks to all the processes (on the different machines).
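To make the farming-out step concrete, here is a minimal sketch of how the independent children could be assigned to processes. It is plain Python, not VTK API: `tasks_for_rank`, `make_child`, `CHILD_NODES`, and `run_children` are hypothetical names, and the child workloads are placeholders. In a real MPI run, `rank` and `num_procs` would presumably come from the controller's `GetLocalProcessId()` and `GetNumberOfProcesses()`.

```python
from typing import Callable, Dict, List

def tasks_for_rank(num_tasks: int, rank: int, num_procs: int) -> List[int]:
    """Round-robin assignment: task i is run by process i % num_procs."""
    return list(range(rank, num_tasks, num_procs))

def make_child(i: int) -> Callable[[List[float]], List[float]]:
    """Build a placeholder workload for child node i (hypothetical)."""
    def child(data: List[float]) -> List[float]:
        # Each child does something different to the same input data.
        return [x * (i + 1) for x in data]
    return child

# Stand-ins for the 11 independent children Node[0] to Node[10].
CHILD_NODES = [make_child(i) for i in range(11)]

def run_children(data: List[float], rank: int, num_procs: int) -> Dict[int, List[float]]:
    """Run only the children assigned to this process on NodeA's output."""
    mine = tasks_for_rank(len(CHILD_NODES), rank, num_procs)
    return {i: CHILD_NODES[i](data) for i in mine}
```

With 4 processes, process 0 would run children 0, 4, and 8, while process 3 would run children 3 and 7. How NodeA's output reaches the other processes (for example, via the controller's send/broadcast facilities) is left out of this sketch.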

Case 2:

Yes, again vtkMPIController provides this logic. As shown in the example above, you write your code once and then launch the executable with MPI so that it runs on all the processes. The ordering logic should be managed in your code, according to which process you’d like to handle which file.
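One possible way to manage that ordering logic is to give each process a contiguous block of the ordered file list. The sketch below is plain Python under stated assumptions: `files_for_rank` is a hypothetical helper, the file names are made up, and the numbering is assumed to be zero-padded so that a lexicographic sort matches the numeric order. The `rank` and `num_procs` values would presumably come from the controller's `GetLocalProcessId()` and `GetNumberOfProcesses()`.

```python
from typing import List, Sequence

def files_for_rank(files: Sequence[str], rank: int, num_procs: int) -> List[str]:
    """Return the contiguous, ordered slice of files this process should handle.

    Files are split as evenly as possible; the first (len % num_procs)
    processes each receive one extra file.
    """
    ordered = sorted(files)  # assumes zero-padded numbering in the names
    base, extra = divmod(len(ordered), num_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return ordered[start:stop]

if __name__ == "__main__":
    # Hypothetical input: ten numbered files shared by three processes.
    all_files = [f"frame_{i:04d}.vtk" for i in range(10)]
    for r in range(3):
        print(r, files_for_rank(all_files, r, 3))
```

Every process runs the same script and simply calls the helper with its own rank, so no file is processed twice and none is skipped. A round-robin split (`ordered[rank::num_procs]`) would work equally well if contiguous blocks are not needed.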