Currently, there is no way to skip values inside a vtkDataArray when fetching its range using vtkDataArray::GetRange. A reason someone would want to skip values is when the data array is tied to some data set that holds ghost cells or ghost points. Ghost cells are typically not displayed and it makes sense to want to get the range of visible fields rather than what is actually stored in memory.
I am currently enhancing vtkDataArray so when it computes the range of the array, it can be provided with a ghost array and a set of values to skip. In order to do that, I’m adding a vtkUnsignedCharArray* instance inside vtkDataArray, called GhostArray, which the user can provide. Another new API called GhostTypesToSkip is also added. Any values in GhostArray that intersect GhostTypesToSkip in binary (i.e. a binary AND operation between the 2 values doesn’t return 0) are skipped when computing the range.
In order to make the behavior kind of automatic, vtkFieldData will automatically assign a ghost array when added, and the relevant GhostTypesToSkip. For vtkCellData, ghost types of skip are set to vtkDataSetAttributes::DUPLICATECELL, and for vtkPointData, vtkDataSetAttributes::DUPLICATEPOINT.
I’m foreseeing an issue for ghost points. Points that are interfacing another partition can be marked ghosts in one partition, but not in the other. I feel like such points should be counted for the range. Points that are ghost and not interfacing another partition should definitively be skipped though. So maybe a solution would be to introduce a new point ghost type, such as vtkDataSetAttributes::INTERFACINGPOINT? Ghost points interfacing another partition would be marked both DUPLICATEPOINT and INTERFACINGPOINT. This would mean that we would need to add a GhostTypesToKeep in vtkDataArray so interfacing points, which are also duplicate, don’t get skipped.
Any red flags? Opinions? Is is sensitive to add a new ghost point type?
I don’t really follow. So I’ll write what I think If two or more datasets are sharing a point, that point should be marked DUPLICATEPOINT for all but one of those datasets. That way, it will get counted only once in statistics etc. I can’t see how a point can be a DUPLICATEPOINT beyond being shared this way. Any other type of ghost would fall into another category such as HIDDENPOINT. What am I missing?
I believe the current bit patterns were chosen to be compatible to Visit: https://www.kitware.com/ghost-and-blanking-visibility-changes/ - maybe this is not very critical to maintain today, but I believe such compatibility is nice in the long run. Good reasons should thus exist before inventing something new…
What about HIDDENCELL (and HIDDENPOINT)? Any reason not to exclude these also by default? These are very useful in structured grids. The last one used by VTK is REFIENDCELL, any reason not to exclude such by default?
There’s something uneasy about adding a vtkUnsignedCharArray* for GhostCells to vtkDataArray itself. To me, the concept of ghost or not really comes into play once a mesh has been defined and not before then. Also, if adding an array to a vtkFieldData causes this internal ivar to change, we have to be very careful with methods like vtkFieldData::PassData which directly pass input array to the output. Assuming that ghost cells array will be passed too is not reasonable. So you’ll need to create a new instance of vtkFieldData. Also what a filter changes the ghost array, you may end up accidentally changing the ghost array for the input. So adding a ghost array to vtkDataArray is fraught with complications.
Slightly tweaking your design, how about this:
Add GetRange APIs to vtkDataArray that take an optional vtkUnsignedCharArray together with a flag which is used to determine which flags to skip.
Add vtkFieldData::GetRange(const std::string& arrayName). Now, vtkFieldData has access to ghost array (cell or point) when present and it can pass it along with the appropriate flags to ...Array::GetRange. FieldData can also cache the computed ranges (as vtkDataArray does) except this time together with the details about the ghost arrays.
I believe the challenge comes when ghost cells are present too. Let’s take an example: simple case, Wavelet split in to ranks. In this case, if one calls GetRange on each rank for RTData, we expect RTData to include the points marked DUPLICATEPOINT too. Unlike DUPLICATECELL, a duplicate-point, in this case, is indeed rendered on that rank. Now, let’s say we add a layer of ghost cells to each rank. Now we have an issue. Here, on any rank, the RTData range is expected to removed all ghost cells and then compute then range. DUPLICATEPOINT is simply not sufficient. Hence, the suggestion to add INTERFACINGPOINT.
Also, if adding an array to a vtkFieldData causes this internal ivar to change, we have to be very careful with methods like vtkFieldData::PassData which directly pass input array to the output
I didn’t see that one. In order for it to work, the arrays should be shallow-copied in this method. I wonder if not all added arrays to a field data would need a shallow-copy so the input arrays are not edited.
The reason I wanted to integrate this API inside vtkDataArray was to make the change seamless for the user. There would be nothing to do to actually get the ghost aware range. People couldn’t forget to pass a ghost array, or need to use a new vtkFieldData::GetRange in order to have the “correct” range. I believe an automatic shallow-copy when adding an array should make everythinig work without loopholes.
I can’t see how a point can be a DUPLICATEPOINT beyond being shared this way.
Utkarsh summed it up pretty well. You can have duplicate points that are visible (at the interface), and others that are not (the ones generated by the need of new ghost cells). A user would assume to get the visible range, I believe, hence interfacing points could (should?) be DUPLICATEPOINT | INTERFACINGPOINT.
I am not sure I agree. We have explicit API for GetFiniteRange, for example. The default is simply to get the range of the array. Once you start adding qualifiers, such as skip NaN or skip ghosts, I think pushing the burden to the user to make the right call is better. It’s explicit and then the developer can make the choice based on task at hand.
But then it’s also harder for us to not forget an instance where the ghosts are not taken into account. I just did a git grep GetRange\( | grep -v Test an there are 268 lines where GetRange pops up in VTK, 110 in ParaView, and who knows how many in other projects (vtk.js for instance). Maybe I’m wrong to be concerned by that, but it seems like a potential problem in the long run to me.
To me it makes sense to see what’s the intent of the code in each case and then fix it appropriately. If an algorithm was calling GetRange on an array to determine what size of internal data-structure to allocate for its computation, then in such a case, if the range skipped ghost cells, that’d be bad. The whole point of ghost cells is that most algorithms operate on them as if they are just normal cells. We only remove them at the final rendering stage. So yes, when deciding LUT range, we should consider ghost cells…but when deciding what size buffer to allocate, perhaps not.
I believe that we are thinking of this differently. If you have two partitions with a common set of points (no ghost cells), one of these partitions should mark the shared points as duplicate and the other one should not. If you think about it, this is exactly what we do with ghost cells. Then one of these meshes would include these points for range calculation, the other would not. This extends naturally if you add ghost cells. You remove ghost cells. You still have the shared points on the interface. Only one of the partitions counts it for range.
What about hidden points / cells? They are pretty much uninitialized values and can hold any value. Shouldn’t they be skipped by default when getting the range of an array, regardless of what the developer wants to do?