Do you intend to make vtkGhostCellsGenerator use this array, if present, to move cell and point data around? Or do you intend to use it within a few filters that might require synchronization, and just write a utility that synchronizes ghosts? (That utility should possibly also update the point positions if an upstream filter moves points, assuming the topology hasn't changed.)
The idea is definitely to have this array generated when generating ghosts, so as to make data synchronization across processes more efficient when needed. So while it is perhaps not a direct input to vtkGhostCellsGenerator (if we don't have ghosts we don't need this array, and if we have ghosts we don't need to generate them), it will definitely be useful for filters that need synchronization down the line. As for the exact software architecture of the synchronization routine, we haven't started designing it yet, but any suggestions are welcome!
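To make the idea concrete, here is a minimal sketch (plain Python, not the VTK API; all names are hypothetical) of how an origin array attached to ghost cells could let a synchronization utility pull fresh cell data from the owning partition after a filter has run, without regenerating the ghosts themselves:

```python
# Hypothetical sketch: ghost cells carry a record of their origin
# (owner partition id, owner-local cell index). A sync pass copies the
# up-to-date value from the owner into each ghost slot.

def synchronize_ghosts(partitions, ghost_origin):
    """partitions: {pid: {"cell_data": [...], "n_owned": int}}
    ghost_origin: {(pid, ghost_index): (owner_pid, owner_index)}
    Ghost cells are stored after the owned cells of each partition."""
    for (pid, ghost_idx), (owner_pid, owner_idx) in ghost_origin.items():
        owner = partitions[owner_pid]
        target = partitions[pid]
        # Copy the owner's current value into the ghost slot.
        target["cell_data"][target["n_owned"] + ghost_idx] = \
            owner["cell_data"][owner_idx]

# Two partitions, each with one ghost mirroring the neighbor's boundary cell.
parts = {
    0: {"cell_data": [1.0, 2.0, None], "n_owned": 2},  # ghost at index 2
    1: {"cell_data": [3.0, 4.0, None], "n_owned": 2},
}
origin = {(0, 0): (1, 0), (1, 0): (0, 1)}
synchronize_ghosts(parts, origin)
# parts[0]["cell_data"] -> [1.0, 2.0, 3.0]; parts[1] -> [3.0, 4.0, 2.0]
```

In a distributed setting the copy would become an MPI exchange keyed by the same origin information, which is exactly what the generated array would make cheap to set up.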
Ghosts can be computed within one process if it has a vtkPartitionedDataSet: ghosts will be generated across partitions. Do you think it is not worth holding this kind of information because we are within the same rank and won't need MPI to synchronize, or would knowing the partition id (which could be global) help speed up the kind of filters you had in mind?
This is a great remark. I think we should generally treat on-rank and off-rank partitions the same way, to avoid breaking the vtkPartitionedDataSet concept. To that end, I would advocate generating ghost information even between partitions on the same rank, even if this might cost some performance: it keeps the distributed algorithms simple at first, and if we identify bottlenecks we can take this information into account later. The workaround of using an append filter on each rank (as is done in vtkRedistributeDataSetFilter) is also good practice for this problem, as long as the on-rank partitioning doesn't carry any semantic meaning.
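The "treat on-rank and off-rank the same" idea can be sketched as a single exchange interface with a fast path for same-rank neighbors. This is a hypothetical mock in plain Python, not VTK code; the class and method names are illustrative:

```python
# Hypothetical sketch: one uniform interface for fetching ghost values.
# Same-rank neighbors are resolved by a direct read; off-rank neighbors
# would go through MPI (mocked here as a queued request).

class GhostExchange:
    def __init__(self, my_rank, local_partitions):
        self.my_rank = my_rank
        self.local = local_partitions  # {pid: list of cell values}
        self.outbox = []               # stands in for pending MPI requests

    def fetch(self, owner_rank, owner_pid, index):
        if owner_rank == self.my_rank:
            # On-rank neighbor: direct memory read, no communication.
            return self.local[owner_pid][index]
        # Off-rank neighbor: queue a request (MPI in the real design).
        self.outbox.append((owner_rank, owner_pid, index))
        return None

ex = GhostExchange(my_rank=0, local_partitions={0: [1.0, 2.0], 1: [3.0]})
assert ex.fetch(0, 1, 0) == 3.0    # same rank: resolved immediately
assert ex.fetch(1, 5, 2) is None   # remote: deferred to communication
assert ex.outbox == [(1, 5, 2)]
```

The appeal of this design is that filters never need to know where a neighbor partition lives; the on-rank fast path is an optimization hidden behind the same interface, which matches the suggestion to only specialize once a bottleneck is actually measured.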