vtkCellArray containing pointers to other vtkCellArrays

CsatiZoltan · November 15, 2022, 1:56pm

The mesh data structure at hand comes from a simulation framework and is large. To avoid creating a copy, I

1. populate a vtkCellArray object using the SetData method, and
2. pass this object to a vtkUnstructuredGrid object using its SetCells method.

This works when the source mesh consists of only one cell type (e.g. tetrahedra). For hybrid meshes, the simulation data structure stores the different cell types in separate, non-contiguous parts of memory.
What I want is to construct a vtkUnstructuredGrid from cells that reside in several separate contiguous memory blocks. Currently, I

1. create one vtkCellArray object for each cell type (no copy involved) and put them into a new vtkCellArray object by calling the Append method, and
2. add this global vtkCellArray to the grid with an overloaded version of the SetCells method.

However, appending copies the data of the individual vtkCellArrays into the global vtkCellArray. Is there a way to tell the global vtkCellArray to just point to the other vtkCellArray objects, without copying their contents?
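Why Append has to copy can be sketched without VTK. In VTK 9, a vtkCellArray holds exactly one offsets array and one connectivity array, each contiguous, so combining separately allocated blocks into one cell array means writing every connectivity entry into new storage and shifting each block's offsets. `CellBlock`, `MergedCells` and `MergeBlocks` are hypothetical illustrations of that layout, not VTK API; the type codes 10 and 12 are the values of VTK_TETRA and VTK_HEXAHEDRON.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One block of same-type cells living in simulation-owned memory (hypothetical).
struct CellBlock {
  std::uint8_t cellType;            // e.g. 10 = VTK_TETRA, 12 = VTK_HEXAHEDRON
  const std::int64_t* offsets;      // nCells + 1 entries, offsets[0] == 0
  std::size_t nCells;
  const std::int64_t* connectivity; // offsets[nCells] entries
};

// The single contiguous arrays that one cell array (plus a types array) needs.
struct MergedCells {
  std::vector<std::int64_t> offsets;
  std::vector<std::int64_t> connectivity;
  std::vector<std::uint8_t> types;
};

// Concatenates the blocks; every connectivity entry is copied into new storage.
MergedCells MergeBlocks(const std::vector<CellBlock>& blocks)
{
  MergedCells out;
  out.offsets.push_back(0);
  for (const CellBlock& b : blocks) {
    assert(b.offsets[0] == 0);
    const std::int64_t base = out.offsets.back(); // where this block starts
    for (std::size_t c = 1; c <= b.nCells; ++c) {
      out.offsets.push_back(base + b.offsets[c]);
      out.types.push_back(b.cellType);
    }
    out.connectivity.insert(out.connectivity.end(), b.connectivity,
                            b.connectivity + b.offsets[b.nCells]);
  }
  return out;
}
```

The copy is inherent to the single-array layout, not to Append itself.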

I added the “python” tag because I work in Python, but a C++ solution is equally good.

Paulo_Carvalho · November 26, 2022, 10:06pm

Hello,

Is all that object copying causing problems? If it's taking up too much memory, maybe you can drop the original objects?

take care,

Paulo

CsatiZoltan · November 27, 2022, 6:14pm

Hi,

Yes, I want to avoid copying because duplicating the simulation data structure (mesh plus the solution defined on it) is a big no for large-scale simulations.
Since I do co-processing, I cannot discard the original objects—the solver needs the mesh for the next time step.

Paulo_Carvalho · November 29, 2022, 6:25pm

Hello,

I don’t use vtkCellArrays and Append() to build the geometry of my vtkUnstructuredGrids. Maybe you’re better off changing the way you build the visualization grid.

Here’s how I do it:

```cpp
#include <vtkHexahedron.h>
#include <vtkPoints.h>
#include <vtkSmartPointer.h>
#include <vtkUnstructuredGrid.h>

// geoGrid is the application's own grid object (not a VTK class).

// Create a VTK container with the points (mesh vertexes)
vtkSmartPointer< vtkPoints > hexaPoints = vtkSmartPointer< vtkPoints >::New();
hexaPoints->SetNumberOfPoints( geoGrid->getMeshNumberOfVertexes() );
for( int i = 0; i < hexaPoints->GetNumberOfPoints(); ++i ){
    double x, y, z;
    geoGrid->getMeshVertexLocation( i, x, y, z );
    hexaPoints->SetPoint( i, x, y, z ); // storage is already sized, so set rather than insert
}

// Create a VTK unstructured grid object (allows faults, erosions, and other geologic discordances)
vtkSmartPointer< vtkUnstructuredGrid > unstructuredGrid = vtkSmartPointer< vtkUnstructuredGrid >::New();
unsigned int nCells = geoGrid->getMeshNumberOfCells();
unstructuredGrid->Allocate( nCells );
vtkSmartPointer< vtkHexahedron > hexa = vtkSmartPointer< vtkHexahedron >::New();
for( unsigned int i = 0; i < nCells; ++i ) {
    unsigned int vIds[8]; // global IDs of the cell's 8 corner vertexes
    geoGrid->getMeshCellDefinition( i, vIds );
    hexa->GetPointIds()->SetId(0, vIds[0]);
    hexa->GetPointIds()->SetId(1, vIds[1]);
    hexa->GetPointIds()->SetId(2, vIds[2]);
    hexa->GetPointIds()->SetId(3, vIds[3]);
    hexa->GetPointIds()->SetId(4, vIds[4]);
    hexa->GetPointIds()->SetId(5, vIds[5]);
    hexa->GetPointIds()->SetId(6, vIds[6]);
    hexa->GetPointIds()->SetId(7, vIds[7]);
    // InsertNextCell copies the IDs into the grid's own cell storage
    unstructuredGrid->InsertNextCell( hexa->GetCellType(), hexa->GetPointIds() );
}
unstructuredGrid->SetPoints( hexaPoints );
```

My grid cells are all hexahedra, but you could vary the cell type inside the second loop.

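The supported cell types are:

[Figure: VTK cell types, with the local vertex numbering of each cell]

In the figure above, the small numbers at the cell vertexes are the local vertex indices, i.e. the hardcoded constants 0 to 7 passed to SetId() in the second loop. The vIds[] array contains the global vertex IDs, i.e. the sequential indexes of the XYZ vertexes loaded in the first loop.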

I hope this helps,

Paulo

CsatiZoltan · November 30, 2022, 6:55pm

Yes, I considered this demo example before. However, the methods you use allocate new memory for the points and cells (pre-allocating makes the insertions faster, but the data are still copied into newly allocated memory). I want to use the existing memory of the simulation mesh instead. As far as I know, the SetData method is the way to instruct VTK to use an already existing memory block.

Paulo_Carvalho · December 1, 2022, 5:36pm

Is it possible to render the visualization only after the simulation, when you can discard the simulation mesh?

CsatiZoltan · December 1, 2022, 7:08pm

No, because our goal is in-situ visualization, due to the huge number of degrees of freedom. The mesh and the solution come from a large-scale simulation on an HPC cluster, and our aim is to visualize the solution (and some derived quantities) without writing the results to files.

Paulo_Carvalho · December 2, 2022, 2:04pm

Can you, please, elaborate more on that? I still don’t see the need for visualization while the simulation is running. I work in the petroleum industry and we also have large scale simulations like that (e.g. multi-phase flow simulation in porous media). The simulation runs for days and we only need to visualize the results after it completes.

CsatiZoltan · December 2, 2022, 3:39pm

Can you, please, elaborate more on that?

I do not know about the petroleum industry, but in aeronautics, sometimes simulations running on (hundreds of) thousands of CPUs generate gigantic datasets.
For very large meshes, writing the solution and the mesh to files, and later loading them for the purpose of visualization is not only very slow (I/O dominates the whole process), but is also wasteful. The idea is to connect the simulation with a co-processing framework, like ParaView Catalyst, to display the results and to obtain derived quantities (e.g. slices) on the fly.

Paulo_Carvalho · December 3, 2022, 1:55pm

Recall that I never mentioned saving files to disk. Let's get back to the problem as posed: reducing the memory footprint when feeding mesh and simulation data to the VTK API for visualization. In this regard, I'd first recommend porting your visualization front end to C++, to eliminate one software layer (the Python-VTK binding) and avoid the large overhead introduced by Python. Unless you can implement it with NumPy arrays and Matlab-like vectorized operations, Python tends to be suboptimal in both memory and CPU usage (features like dynamic typing and garbage collection come at a cost).

CsatiZoltan · December 5, 2022, 7:13pm

Moving to C++ is an option I considered. However, I currently do not need explicit loops, because the NumPy data is passed directly to VTK (both for the creation of vtkPoints and for the vtkCellArray). Since I do not implement custom algorithms where the Python layer would be slow, I see no need to move to C++. What I cannot find out is whether VTK can construct a vtkCellArray from non-contiguous memory blocks at all. If this functionality exists but is not exposed to Python, I am ready to turn to C++. However, if the feature does not exist in C++ either, there is nothing to gain from porting my code from Python to C++.

cory.quammen · December 6, 2022, 12:11am

Unfortunately, I am not aware of any feature for creating a vtkCellArray from discontiguous memory segments to achieve a zero-copy definition of the cell array.

Paulo_Carvalho · December 6, 2022, 12:39pm

Python objects are also bloated; it's not only a matter of avoiding loops or other so-called non-Pythonic code practices. Anyway, you can stick to Python for your visualization front end, but your code needs to optimize the visualization itself. For instance, instead of requesting all the mesh data, you may be better off requesting cross-sections. You could also use a coarse proxy mesh to facilitate user navigation and request detailed data on user demand. In the petroleum industry, we need to visualize terabyte-sized seismic surveys. Of course it is not possible to render them as a whole, so we need to be smart. There are established techniques for large data visualization and large data management (LDM).