The overlappingAMR data structure has been recently added to the VTKHDF specification, now is time for its cousin, the Hyper Tree Grid (HTG for short) to be specified as well.
HTG is a compact and memory-efficient tree-based AMR data structure, that has been around in VTK for more than a decade. Many HTG-specific algorithms have been created, using a cursor mechanism coupled with tree iteration to browse through it. HTGs can be stored in the VTK XML format using the .htg extension. The parallel .phtg implementation is currently in a non-functionning state.
This post suggests a specification for the up-and-coming VTKHDF format for HTG, which will allow efficient distributed writing and reading, temporal support, and composite structures, distributed in a single or in multiple files.
The general structure of the file would look like this
GROUP "VTKHDF"
ATTRIBUTE "Version" // Updated to (2,4)
ATTRIBUTE "Type" // "HyperTreeGrid" In this case
// HTG-specific properties
ATTRIBUTE "BranchFactor" // 2 or 3, cell division factor
ATTRIBUTE "Dimensions" // 3-Vector
ATTRIBUTE "InterfaceInterceptsName" // String referencing a cell data array
ATTRIBUTE "InterfaceNormalsName" // String referencing a cell data array
ATTRIBUTE "TransposedRootIndexing" // Bool, true if the indexing mode of the grid is inverted
// Grid point coordinates
DATASET "XCoordinates"
DATASET "YCoordinates"
DATASET "ZCoordinates"
// HTG Specific fields
DATASET "NumberOfTrees" // One entry for each distributed part
DATASET "DescriptorsSize" // One entry for each distributed part
DATASET "DepthPerTree" // size = sum(NumberOfTrees), maximum depth for each tree
DATASET "Descriptors" // Packed bit array, size = sum(DescriptorsSize)
DATASET "NumberOfCellsPerDepth" // size = sum(DepthPerTree), number of cells for each depth of each tree
DATASET "TreeIds" // size = sum(NumberOfTrees)
DATASET "Mask" // Packed bit array, size = sum(NumberOfCellsPerDepth)
GROUP "FieldData"
...
GROUP "CellData"
...
If you’re aware of the .htg v2 format, this is mostly a mapping of its content, with the addition of a few offset fields to support efficient distributed reading.
Misc. Notes:
“NumberOfTrees” has one value for each part of the distributed dataset, and is optional for non-distributed data or if the number of trees in each part is the same.
“DescriptorsSize” is required for distributed datasets and used as an offset array in the “Descriptors” bit set. Without it, each rank would need to process all of the previous trees to calculate the read offset for descriptors.
“TreeIds” must have all values 0->N, but in any order. This allows distributing trees across parts
“Mask” is optional, and omitted if no cell is masked.
“NumberOfCellsPerDepth” is renamed from “NumberOfVerticesPerDepth” in the XML format. It allows for faster reading when limiting reading depth. The reader can compute the number of cells to offset for a given tree using this value and “DepthPerTree”.
There is no “PointData” for the HTG structure, because we only consider cells.
For temporal, offsets for NumberOfTrees, Descriptors, NumberOfCellsPerDepth and NumberOfCells will be required in the “Steps” group, similarly to the other types of VTKHDF datasets. This way, the reader can easily pick up any time step without any pre-computation.
There are two datasets in the proposal, Descriptors and Mask that are described as packed bit arrays.
My experience with packed bit arrays (in my interpretation a datatype a bool 0/1 value with 1 bit storage space, rounded up to the nearest byte in length) is that different languages/frameworks work differently.
Would the following Python example give the desired encoding?
import h5py as h5
import numpy as np
bitlist = np.zeros(100, dtype=np.uint8)
with h5.File('bitfield.h5', 'w') as fh:
bitfield = np.packbits(bitlist)
fh.create_dataset('bitfield', data=bitfield)
100 bool elements in the “bitlist” (in which 0 = false, any other value = true) occupy 12.5 bytes and the dataset in the file therefore ends up being 13 elements of uint8 type.
Or is this not what you expect?
I know vtkBitArray is encoded such that each byte stores eight bool values, but not every programming language and framework has such a class/datatype natively. I think one of the great advantages of vtkhdf is that it is relatively easy to write custom writer functions/methods/classes/tools from applications in various languages (Python, C, C++, Fortran, …) without relying on the vtk library itself. The writing of these packed bit arrays needs to be possible without too much manual encoding work from all of these languages.
but I do not exactly know what features the library offer, and for instance what features are exposed and available from important packages like h5py.
I think your suggestion of manually packing the bitfield is sensible because you can achieve it easily in any language that allows bit-manipulation of integers.
Hello Maxime, the Dimensions attribute corresponds to the number of trees in each direction, so you’re right, it can have 1, 2 or 3 components whether we’re describing a 1D, 2D or 3D HTG. It is not necessarily a 3 components vector as I wrote initially.
Hello Louis, thank you for the answer. I have another question: NumberOfCellsPerDepth shouldn’t be named NumberOfCellsPerDepthPerTree ? Even if it seems too long, it is more accurate no ?
That’s right, NumberOfCellsPerDepth actually defines the number of Cells for each depth of each tree. I kept this name to mirror the name used in the XML HTG reader, but it can be changed to more accurately represent its content. NumberOfCellsPerTreeDepth sounds acceptable to you?
NumberOfCellsPerTreeDepth sounds better than what I proposed. Thank you for this HTG structure and answers. I look forward to use it when it will be available
Can this support something similar to ForceStaticMesh (ei ForceStaticTree) where a the cell attribute data can be changing in time with respect to a static tree such that visualization in VTK/paraview can optimize cycling through time steps without re-reading the underlying tree
Yes, static trees will also be supported, the same way static meshes work with the other data types supported by the VTKHDF specification; the fields in the Steps group specify a read offset for the other arrays for each time step, such as “Descriptors” in our case. If 2 time steps use the same read offset for an array, then the tree geometry can only be read once and reused. You can read more about time support in VTKHDF here: VTK File Formats - VTK documentation
Last April 2024 I asked about static geometry with time varying data here HTG time thread@Charles_Gueunet commented at the time “You can force the static mesh on a dataset, but nothing will really use this information.” Are downsteam filters better able to use static topologies in VTK 9.4+ compared to 10 months ago.