Ghost points and ExtractSurface

cchan · August 4, 2020, 12:18am

Hello!

I’m writing a C++ VTK filter, and I’m trying to understand the idea of ghost points and how multiblock topology works.

My understanding is that in MPI mode, ExtractSurface should be ignoring boundaries between MPI ranks; i.e. it’s not supposed to identify ghost points as surface points, as suggested by the first paragraph of https://blog.kitware.com/ghost-and-blanking-visibility-changes/. I’m currently feeding data into the MPI pipeline split up into XDMF grids through vtkXdmf3Reader (which evenly distributes grids among ranks), and there’s a short data exchange step to establish the ghost points so that we have the missing points for each cell at the boundaries between domains, but ExtractSurface appears to be ignoring the point ghost array and setting a boundary between domains anyway**. Does ExtractSurface support this MPI-agnostic mode of operation? The goal is to check whether the points on the boundaries between ranks are duplicated and flagged with vtkGhostPoints in a way that’s usable by all ParaView filters.

I also have a few followup questions about ghost point semantics:

Is each rank expected to have a copy of all the PointData associated with each of its ghost points?
How many layers of points should be duplicated (I’ve been assuming just 1 layer is enough?), and should I be working with ghost cells at all if I only have PointData?
Should a duplicated point be marked as a ghost point on both (or all) ranks it’s duplicated across, or should I mark it as a “real point” on exactly one rank?
Are ghost points intended to be a communication-free way of sharing data (no data sharing after the initial partitioning)? Or do ParaView filters handle such communication internally?

Thanks!

** Specifically, each rank owns a rectangular-prism-shaped block of points that shares faces with the blocks owned by some other ranks. The points on these faces are copied to adjacent blocks and marked as ghost points on those adjacent blocks.
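
The footnote's layout can be sketched in plain C++ without linking VTK. This is only an illustration under stated assumptions: `makePointGhosts` and its `ghostFace` parameter are hypothetical names, the block dimensions are assumed to already include the received layer of copied points, and `DUPLICATE_POINT` mirrors the bit value of `vtkDataSetAttributes::DUPLICATEPOINT`. To my knowledge the array VTK looks for is named `vtkGhostType`, so the result would be stored in the point data under that name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the bit value of vtkDataSetAttributes::DUPLICATEPOINT.
constexpr unsigned char DUPLICATE_POINT = 1;

// Hypothetical helper: build a per-point ghost array for one structured block.
// dims        number of points along x, y, z (including received copies)
// ghostFace   ghostFace[axis][0] / [axis][1] is true when the low / high face
//             along that axis holds points copied from a neighboring rank
std::vector<unsigned char> makePointGhosts(
  const std::array<int, 3>& dims,
  const std::array<std::array<bool, 2>, 3>& ghostFace)
{
  assert(dims[0] > 0 && dims[1] > 0 && dims[2] > 0);
  const std::size_t nx = static_cast<std::size_t>(dims[0]);
  const std::size_t ny = static_cast<std::size_t>(dims[1]);
  const std::size_t nz = static_cast<std::size_t>(dims[2]);
  std::vector<unsigned char> ghost(nx * ny * nz, 0);

  for (int k = 0; k < dims[2]; ++k)
    for (int j = 0; j < dims[1]; ++j)
      for (int i = 0; i < dims[0]; ++i)
      {
        const std::array<int, 3> ijk{ i, j, k };
        bool isGhost = false;
        for (int a = 0; a < 3; ++a)
        {
          if (ghostFace[a][0] && ijk[a] == 0) isGhost = true;
          if (ghostFace[a][1] && ijk[a] == dims[a] - 1) isGhost = true;
        }
        if (isGhost)
        {
          // Structured point ordering with x varying fastest.
          const std::size_t id = static_cast<std::size_t>(i) +
            nx * (static_cast<std::size_t>(j) + ny * static_cast<std::size_t>(k));
          ghost[id] = DUPLICATE_POINT;
        }
      }
  return ghost;
}
```

For a 3x2x2 block whose high-x face was received from a neighbor, the four points with `i == 2` come back flagged and the other eight stay zero.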

danlipsa · August 4, 2020, 2:57pm

My understanding is that in MPI mode, ExtractSurface should be ignoring boundaries between MPI ranks; i.e. it’s not supposed to identify ghost points as surface points, as suggested by the first paragraph of https://blog.kitware.com/ghost-and-blanking-visibility-changes/. I’m currently feeding data into the MPI pipeline split up into XDMF grids through vtkXdmf3Reader (which evenly distributes grids among ranks), and there’s a short data exchange step to establish the ghost points so that we have the missing points for each cell at the boundaries between domains, but ExtractSurface appears to be ignoring the point ghost array and setting a boundary between domains anyway**.

Can you try with ghost cells instead? I believe ghost cells are more widely used because of difficulties with ghost points that you allude to later on in your email. You can also look in the filter to see what kind of ghosts it uses.

Does ExtractSurface support this MPI-agnostic mode of operation? The goal is to check whether the points on the boundaries between ranks are duplicated and flagged with vtkGhostPoints in a way that’s usable by all ParaView filters.

I also have a few followup questions about ghost point semantics:

Is each rank expected to have a copy of all the PointData associated with each of its ghost points?

This depends on the filter. If the filter needs to do computation you would need the point data needed for the computation. For ExtractSurface you would probably not need any data.

How many layers of points should be duplicated (I’ve been assuming just 1 layer is enough?), and should I be working with ghost cells at all if I only have PointData?

For ExtractSurface one layer should be enough. Yes, I think you should work with ghost cells.
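
A 1D analogue may help show why a layer of ghost cells lets a surface filter drop the inter-rank boundary. It is a sketch, not the actual ExtractSurface implementation: `boundaryPoints` is a hypothetical function, cells are line segments whose "faces" are their end points, and `DUPLICATE_CELL` mirrors the bit value of `vtkDataSetAttributes::DUPLICATECELL`. A face counts as boundary when exactly one cell uses it, and it is kept only if that cell is not a ghost.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the bit value of vtkDataSetAttributes::DUPLICATECELL.
constexpr unsigned char DUPLICATE_CELL = 1;

// Cell c spans points c and c+1, so n cells have n+1 points.
// Returns the local ids of points that lie on the true domain boundary.
std::vector<std::size_t> boundaryPoints(const std::vector<unsigned char>& cellGhost)
{
  const std::size_t numCells = cellGhost.size();
  std::vector<int> useCount(numCells + 1, 0);
  std::vector<std::size_t> lastUser(numCells + 1, 0);

  // Count how many cells touch each point and remember one of them.
  for (std::size_t c = 0; c < numCells; ++c)
  {
    ++useCount[c];
    lastUser[c] = c;
    ++useCount[c + 1];
    lastUser[c + 1] = c;
  }

  std::vector<std::size_t> result;
  for (std::size_t p = 0; p <= numCells; ++p)
  {
    if (useCount[p] != 1)
    {
      continue; // interior point (or unused when there are no cells)
    }
    const bool ownerIsGhost = (cellGhost[lastUser[p]] & DUPLICATE_CELL) != 0;
    if (!ownerIsGhost)
    {
      result.push_back(p); // face of an owned cell with no neighbor
    }
  }
  return result;
}
```

With four owned cells and no ghosts, both end points (0 and 4) are reported, so the cut between ranks shows up as a surface. Appending one ghost cell makes point 4 shared by two cells and point 5 belong only to a ghost, so only point 0 remains.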

Should a duplicated point be marked as a ghost point on both (or all) ranks it’s duplicated across, or should I mark it as a “real point” on exactly one rank?

Ideally something like this should be done, but I don’t think it is - because it is difficult to do and costly. Working with ghost cells does not result in this kind of complication. Because of this issue, I believe statistics involving points for code run in parallel are incorrect.

Are ghost points intended to be a communication-free way of sharing data (no data sharing after the initial partitioning)? Or do ParaView filters handle such communication internally?

The first statement is true. It should be communication free after the initial partitioning.

cchan · August 11, 2020, 6:37pm

Hey Dan!

Thanks for the response! I was certain I had posted a reply but I checked back just now and there was oddly nothing here - so apologies and here’s a second try!

I’ll definitely try ghost cells when I have a chance! It’ll take a good bit more communication / memory but I can see how ExtractSurface might need that information to determine boundaries.
If we stack filters needing ghost points, does the number of layers of ghost points required add up? (i.e. stacking a derivative filter on a derivative filter should probably result in 2 layers of ghost points/cells, right?) How does this information propagate up the pipeline, if at all?
Regarding “difficult to do and costly” - my data is essentially already formatted this way (exactly one “real” copy of a point across all ranks, any duplicates marked as ghost points) because my custom VTK filter is doing the MPI point exchange. Will statistics work correctly if I follow this assumption?
Should I be marking GlobalIDs? Do any filters use them?

Thanks,
Clive

danlipsa · August 11, 2020, 8:44pm

If we stack filters needing ghost points, does the number of layers of ghost points required add up? (i.e. stacking a derivative filter on a derivative filter should probably result in 2 layers of ghost points/cells, right?) How does this information propagate up the pipeline, if at all?

Yes, that is correct. Knowing the processing you need will allow you to generate the number of ghost cell layers required for your processing.
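
The way layers add up can be checked with a small plain C++ sketch. It assumes each filter is a central difference with a stencil radius of one point; `centralDifference` and `requiredGhostLayers` are hypothetical names for illustration and are not VTK calls. As far as I know, VTK filters pass such a request upstream during `RequestUpdateExtent` through the `vtkStreamingDemandDrivenPipeline::UPDATE_NUMBER_OF_GHOST_LEVELS()` key, which the sketch does not use.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Total ghost layers needed by a chain of filters, given the stencil radius
// (in layers) that each filter consumes.
int requiredGhostLayers(const std::vector<int>& stencilRadii)
{
  return std::accumulate(stencilRadii.begin(), stencilRadii.end(), 0);
}

// Central difference with unit spacing. The first and last input samples have
// no neighbor on one side, so the output is two samples shorter than the input.
std::vector<double> centralDifference(const std::vector<double>& f)
{
  std::vector<double> d;
  for (std::size_t i = 1; i + 1 < f.size(); ++i)
  {
    d.push_back((f[i + 1] - f[i - 1]) / 2.0);
  }
  return d;
}
```

If a rank owns four points and holds two ghost layers on each side (eight local samples), applying `centralDifference` twice leaves exactly four valid values, one per owned point. With a single layer only two of the four owned points would get a second derivative.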

Regarding “difficult to do and costly” - my data is essentially already formatted this way (exactly one “real” copy of a point across all ranks, any duplicates marked as ghost points) because my custom VTK filter is doing the MPI point exchange. Will statistics work correctly if I follow this assumption?

Yes, I think it should.
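
A minimal sketch of why the "exactly one real copy" convention keeps point statistics consistent: each rank skips points flagged as duplicates, so every point contributes once. The names `RankPoints`, `Stats` and `accumulateOwned` are hypothetical, the ranks are simulated as a plain vector, and `DUPLICATE_POINT` mirrors the bit value of `vtkDataSetAttributes::DUPLICATEPOINT`. In a real MPI run the per-rank partial results would be combined with a reduction such as `MPI_Allreduce` instead of the outer loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the bit value of vtkDataSetAttributes::DUPLICATEPOINT.
constexpr unsigned char DUPLICATE_POINT = 1;

// Point values held by one rank, with a parallel ghost flag per point.
struct RankPoints
{
  std::vector<double> values;
  std::vector<unsigned char> ghost;
};

struct Stats
{
  std::size_t count;
  double sum;
};

// Accumulate count and sum over all ranks, ignoring duplicated points.
Stats accumulateOwned(const std::vector<RankPoints>& ranks)
{
  Stats s{ 0, 0.0 };
  for (const RankPoints& r : ranks)
  {
    assert(r.values.size() == r.ghost.size());
    for (std::size_t i = 0; i < r.values.size(); ++i)
    {
      if (r.ghost[i] & DUPLICATE_POINT)
      {
        continue; // another rank owns the real copy of this point
      }
      ++s.count;
      s.sum += r.values[i];
    }
  }
  return s;
}
```

For five global points with values 10 to 50, where the middle point is owned by rank 0 and duplicated on rank 1, the result is a count of 5 and a sum of 150. Without the ghost flag the shared point would be counted twice.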

Should I be marking GlobalIDs? Do any filters use them?

Here is a good description of Pedigree IDs and Global IDs:

https://vtk.org/pipermail/vtkusers/2009-June/052567.html

An example of usage is when you select cells on an extracted surface of a volume (with the hardware selector), but you want to remove the cells from the volume. In this case you would keep track of the original cells using pedigree ids (or global ids). A grep would show you what filters use them. I would not worry about this unless you know you need it.