VTKHDF point ownership

The VTKHDF documentation (VTK File Formats - VTK documentation) states:

We describe the split into partitions using HDF5 datasets NumberOfConnectivityIds, NumberOfPoints and NumberOfCells. Let n be the number of partitions which usually correspond to the number of the MPI ranks. NumberOfConnectivityIds has size n where NumberOfConnectivityIds[i] represents the size of the Connectivity array for partition i. NumberOfPoints and NumberOfCells are arrays of size n, where NumberOfPoints[i] and NumberOfCells[i] are the number of points and number of cells for partition i. The Points array contains the points of the VTK dataset. Offsets is an array of size ∑ (S(i) + 1), where S(i) is the number of cells in partition i, indicating the index in the Connectivity array where each cell’s points start. Connectivity stores the lists of point ids for each cell, and Types contain the cell information stored as described in vtkCellArray documentation. Data for each partition is appended in a HDF dataset for Points, Connectivity, Offsets, Types, PointData and CellData. We can compute the size of partition i using the following formulas:

However, it is not clear to me from this whether a point that is present on two processes has to be repeated in the Points array.

If it is repeated multiple times in the Points array, the mesh is effectively split into pieces, based on the partitioning interface.

If it is not duplicated, then the Connectivity-array should use global node indices.

@lgivord @Louis_Gombert

Hello @jorgensd,

In fact, both cases can happen. Points can be duplicated, or not (to save memory).

The Offset (which tells the reader where to start reading in the dataset) and NumberOfPoints (which tells it how many points to read) are then responsible for selecting the desired part of the Points dataset (and likewise for the connectivity).
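To illustrate the start/count idea, here is a minimal sketch of how a reader could pick out one partition's points. It assumes the partitions are simply appended one after another, so that the start of partition i is the sum of the counts of the partitions before it. The helper `partition_points` and the sample arrays are hypothetical and not part of any VTK API.

```python
import numpy as np

# Hypothetical per-partition counts, as they would appear in NumberOfPoints.
number_of_points = np.array([3, 3])

# Hypothetical Points dataset: the partitions are appended back to back,
# here with the shared points written once per partition.
points = np.array([
    [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],  # partition 0
    [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0],  # partition 1
])


def partition_points(points, number_of_points, i):
    """Return the rows of `points` that belong to partition i."""
    # Assumption: start = sum of the point counts of all earlier partitions.
    start = int(np.sum(number_of_points[:i]))
    stop = start + int(number_of_points[i])
    return points[start:stop]


part1 = partition_points(points, number_of_points, 1)
```

The same start/count slicing would apply to the Connectivity dataset, using NumberOfConnectivityIds instead of NumberOfPoints.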

So I believe it should be correct, but maybe I am missing something? :slightly_smiling_face:

If the points are duplicated across multiple processes, one has to adjust the global indices on all other processes. This makes it non-trivial to write data in parallel (it requires a global exscan), and makes the data more challenging to read back in at a later stage, as one would have to eliminate the duplicate points.
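For reference, the exscan mentioned here is just an exclusive prefix sum over the per-process counts. The sketch below emulates it serially with NumPy rather than calling MPI; `exclusive_scan` and `local_counts` are hypothetical names, and in a real parallel writer each rank would only receive its own entry of the result.

```python
import numpy as np


def exclusive_scan(counts):
    """Serial stand-in for an MPI exclusive scan with a sum reduction."""
    counts = np.asarray(counts, dtype=np.int64)
    out = np.zeros_like(counts)
    # Entry r is the sum of the counts of all ranks before r.
    out[1:] = np.cumsum(counts)[:-1]
    return out


# Hypothetical number of (duplicated) points written by each of three ranks.
local_counts = [3, 3, 3]

# Row at which each rank would start writing into the shared Points dataset.
write_offsets = exclusive_scan(local_counts)
```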

Thus I would go for the non-duplicated version, i.e. I have M global nodes, and the connectivity array is filled with their global indices. However, it is then unclear to me how to specify that a node is shared between multiple (let's say two) processes, as the only information we provide in the VTKHDF file is

We describe the split into partitions using HDF5 datasets NumberOfConnectivityIds, NumberOfPoints and NumberOfCells.

However, this doesn't indicate where each partition's range of points starts and ends.
The following is a minimal example of how a mesh is distributed in my code:

We have a 1x1 unit square consisting of two triangles, and in total four nodes.
Each process owns a single triangle, let's say
Proc 0 owns the cell with vertices [0,0], [1,0], [0,1]
Proc 1 owns the cell with vertices [1,0], [1,1], [0,1]

If we write the unique nodes to file, say in the following order
[[0,0],[1,0],[0,1],[1,1]]
we need to indicate that process 0 has to access the first three points, range [0,3) while process 1 wants to access the nodes in range [1, 4).

I am not sure how to specify this with VTKHDF.

To make the issue even clearer, one could add another process with a single cell
[-1,0], [0,0], [0,1], where we append [-1,0] at the end of the points array, meaning that process 2 would need point indices 0, 2 and 4 available (as those would appear in its connectivity array).

Would NumberOfPoints be three for each process?
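The example above can be written down as a small sketch that makes the problem concrete: with unique points and global indices, the set of points a process needs is not always a contiguous range. The variable names and the helper `is_contiguous` are hypothetical; the point order and cells are the ones described in this post.

```python
import numpy as np

# Unique points, in the order given above, with [-1,0] appended at the end.
points = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [-1, 0]], dtype=float)

# One triangle per process, expressed in global point indices.
connectivity = {
    0: [0, 1, 2],  # [0,0], [1,0], [0,1]
    1: [1, 3, 2],  # [1,0], [1,1], [0,1]
    2: [4, 0, 2],  # [-1,0], [0,0], [0,1]
}


def is_contiguous(ids):
    """True if the sorted, unique ids form a range [start, start + n)."""
    return int(ids[-1] - ids[0]) + 1 == len(ids)


# Sorted unique global indices that each process needs to read.
needed = {rank: np.unique(conn) for rank, conn in connectivity.items()}
contiguous = {rank: is_contiguous(ids) for rank, ids in needed.items()}
```

Processes 0 and 1 need the ranges [0,3) and [1,4), which a start and a count could describe, but process 2 needs indices 0, 2 and 4, which no single start and count can express.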

To have a complete mesh on each node, you'll have to duplicate points. In your example, the two triangles share two points, and those points are duplicated on the two nodes. This is also what you have to write to the VTKHDF file.
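As a rough sketch of what that could look like for the two-triangle example, the snippet below builds the per-partition arrays from a global mesh. It assumes that Connectivity and Offsets are expressed relative to each partition (local indices starting at 0) and that points are stored with three coordinates; `global_points` and `cells_per_rank` are hypothetical inputs, and nothing here touches HDF5 itself.

```python
import numpy as np

# Global mesh from the example: four unique points, one triangle per process.
global_points = np.array(
    [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float
)
cells_per_rank = [np.array([[0, 1, 2]]), np.array([[1, 3, 2]])]

points, connectivity, offsets = [], [], []
number_of_points, number_of_cells, number_of_connectivity_ids = [], [], []

for cells in cells_per_rank:
    # Points used by this partition, plus the global -> local renumbering.
    used, local = np.unique(cells, return_inverse=True)
    local = local.reshape(cells.shape)
    points.append(global_points[used])       # shared points get duplicated
    connectivity.append(local.ravel())        # partition-local indices (assumed)
    # One offset per cell plus a trailing entry, restarting at 0 (assumed).
    offsets.append(np.arange(cells.shape[0] + 1) * cells.shape[1])
    number_of_points.append(len(used))
    number_of_cells.append(cells.shape[0])
    number_of_connectivity_ids.append(cells.size)

# Data for each partition is appended, one partition after another.
points = np.concatenate(points)
connectivity = np.concatenate(connectivity)
offsets = np.concatenate(offsets)
```

With this layout NumberOfPoints would be [3, 3], so the Points array holds six rows even though the mesh has only four distinct points.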