Proposal: implementing fast data set topology statistics

VTK has an abstract method vtkIdType vtkDataObject::GetNumberOfElements(int type) to get the number of points, cells, etc. It has a very fast implementation for any particular data object (vtkPolyData, vtkRectilinearGrid, vtkGraph, etc.). It would be great to have a basic abstract method vtkIdType vtkDataSet::GetNumberOfCells(int cellType) to retrieve the number of cells with certain cell type (same integer as for vtkCell).
Then one could write conditions like

if(pDataSet->GetNumberOfCells(VTK_VERTEX)==pDataSet->GetNumberOfCells()) { /* point cloud */ }
else if(pDataSet->GetNumberOfCells(VTK_TRIANGLE)==pDataSet->GetNumberOfCells()) { /* 2D triangle-only mesh */ }
else if(pDataSet->GetNumberOfCells(VTK_QUAD)==pDataSet->GetNumberOfCells()) { /* 2D quad-only mesh */ }
else if(pDataSet->GetNumberOfCells(VTK_TETRA)==pDataSet->GetNumberOfCells()) { /* 3D tetrahedra-only mesh */ }
else if(pDataSet->GetNumberOfCells(VTK_HEXAHEDRON)==pDataSet->GetNumberOfCells()) { /* 3D hexahedron-only mesh */ }

These conditions are implicitly required for many VTK filters (some of them are for triangle mehses only).
This new “GetNumberOfCells(int)” method is only worth adding if it will have a non-iterative implementation in derived classes. It is a non-trivial problem only for vtkUnstructuredGrid and vtkPolyData classes. For structured data sets, the implementation will simply perform some global dimension checks (for vtkRectilinearGrid, vtkImageData and so on) and a GetNumberOfCells() call.
For vtkUnstructuredGrid and vtkPolyData classes I suggest the following:

  1. store cell vertex counts vtkIdType CellCount[VTK_NUMBER_OF_CELL_TYPES] for certain cell type;
  2. increment and decrement CellCount[] in the implementation of data objects’ cells construction and addition;
  3. implement GetNumberOfCells(int cellType) as { return CellCount[cellType]; }.

This implementation will slightly decrease performance of vtkUnstructuredGrid and vtkPolyData in data operations but the benefits would be great. Then we could distinguish pure and mixed meshes with vertices, polyvertices, lines, polylines, triangles, quads, wedges, hexahedrons, etc. and add simple and fast checks in many VTK filters to increase stability.

I don’t really see how this could be helpful for filters to know in constant time how many cells of which types there are. You’d still need to iterate over every cells and check which type each cell is, which is equivalent complexity-wise to recomputing how many cells there are of each type.

If you want to not run a pipeline if some cell is absent, vtkDataSet has GetCellTypes API up to 9.1, which is replaced by GetDistinctCellTypesArray in later versions (I have an merge request finishing the transition), which lists all cells that are present in the data set, but doesn’t tell how many.

Yes, I know about GetCellTypes and GetDistinctCellTypesArray, but both functions use iteration through all cells.
Some VTK filters (e.g. vtkDataSetSurfaceFilter and vtkSubdivideTetra) use int vtkUnstructuredGridBase::IsHomogeneous() method to determine if cells are all of the same type. The function IsHomogeneous itself also uses iteration each time the filter is executed.
The idea is to track cell types with their counts during grid construction and modification, providing constant time for execution of functions like IsHomogeneous. The minimal primitive and undesired implementation of this functionality is remembering the return value of IsHomogeneous function.
The accurate and deep implementation will need to modify InsertNextCell, ReplaceCell functions.

Yes, I know about GetCellTypes and GetDistinctCellTypesArray, but both functions use iteration through all cells.

So this could be fixed by explicitly writing a GetDistincCellTypesArray in each subclass of vtkDataSet? And IsHomogeneous could use the distinct cell array if present, and iterate if not?

Updating this cell count at each cell insertion or when setting a vtkCellArray is the same complexity as to do it once when all cells are set: it is O(n), where n is the number of cells. It’s not constant time when you do it at insertion, because you’re doing that for each cell. It’s just better to care about homogeneity when needed, and not to compute it all the time.

No, I didn’t mean transferring functionality from vtkUnstructuredGrid to all of the vtkDataSet classes. Current implementation is enough. I just wanted to speed up filters and prevent them from execution when mesh is inappropriate.
I had an assumption that in many cases the data changes rarely, and the filters working with them are executed frequently. Maybe manual caching of some other integral cell information will be useful for unstructured grids.

vtkDataSet has a GetCellTypes API which will soon be turned into a GetDistinctCellsTypesArray. Information about cells is cached using this API in vtkUnstructuredGrid. When you query this array, it gets generated and saved, and next time you need it, it is there and is not recomputed. IsHomogeneous could be sped up by relying on this computed array. The only difference with what you were talking about is that it doesn’t store the number of cells, but whether their presence in the data set.

Ok, I see that GetDistinctCellsTypesArray really does caching, but IsHomogeneous currently uses old cell iteration. I will close the proposal and will wait for the new API changes.