HyperTreeGrid support in VTKHDF

The overlappingAMR data structure was recently added to the VTKHDF specification; now it is time for its cousin, the Hyper Tree Grid (HTG for short), to be specified as well.

HTG is a compact, memory-efficient tree-based AMR data structure that has been around in VTK for more than a decade. Many HTG-specific algorithms have been created, using a cursor mechanism coupled with tree iteration to browse through it. HTGs can be stored in the VTK XML format using the .htg extension. The parallel .phtg implementation is currently in a non-functioning state.

This post proposes a specification for HTG in the up-and-coming VTKHDF format, which will allow efficient distributed writing and reading, temporal support, and composite structures, stored in a single file or in multiple files.

The general structure of the file would look like this:

GROUP "VTKHDF"
    ATTRIBUTE "Version"      // Updated to (2,4)
    ATTRIBUTE "Type"         // "HyperTreeGrid" In this case

    // HTG-specific properties
    ATTRIBUTE "BranchFactor"            // 2 or 3, cell division factor
    ATTRIBUTE "Dimensions"              // 3-Vector
    ATTRIBUTE "InterfaceInterceptsName" // String referencing a cell data array
    ATTRIBUTE "InterfaceNormalsName"    // String referencing a cell data array
    ATTRIBUTE "TransposedRootIndexing"  // Bool, true if the indexing mode of the grid is inverted

    // Grid point coordinates
    DATASET "XCoordinates"
    DATASET "YCoordinates"
    DATASET "ZCoordinates"

    // HTG Specific fields
    DATASET "NumberOfTrees"             // One entry for each distributed part
    DATASET "DescriptorsSize"           // One entry for each distributed part
    DATASET "DepthPerTree"              // size = sum(NumberOfTrees), maximum depth for each tree
    DATASET "Descriptors"               // Packed bit array, size = sum(DescriptorsSize)
    DATASET "NumberOfCellsPerDepth"     // size = sum(DepthPerTree), number of cells for each depth of each tree
    DATASET "TreeIds"                   // size = sum(NumberOfTrees)
    DATASET "Mask"                      // Packed bit array, size = sum(NumberOfCellsPerDepth)

    GROUP "FieldData"
        ...
    GROUP "CellData"
        ...

If you’re aware of the .htg v2 format, this is mostly a mapping of its content, with the addition of a few offset fields to support efficient distributed reading.
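For readers who want to experiment, here is a minimal h5py sketch of how a writer could produce the layout above. Everything in it is illustrative: the file name, the 2x2 grid of unrefined trees, and all placeholder values are assumptions made for the example, not part of the proposal.

```python
import h5py
import numpy as np

def write_htg_skeleton(path):
    """Sketch of the proposed VTKHDF HyperTreeGrid layout.

    All values are illustrative placeholders: a 2D grid of 2x2
    unrefined trees, written as a single (non-distributed) part.
    """
    with h5py.File(path, "w") as f:
        root = f.create_group("VTKHDF")
        root.attrs["Version"] = np.array([2, 4], dtype=np.int64)
        root.attrs["Type"] = np.bytes_("HyperTreeGrid")
        root.attrs["BranchFactor"] = np.int64(2)
        root.attrs["Dimensions"] = np.array([2, 2], dtype=np.int64)  # trees per direction
        root.attrs["TransposedRootIndexing"] = np.int8(0)

        # Grid coordinates delimiting the tree roots
        root.create_dataset("XCoordinates", data=np.linspace(0.0, 1.0, 3))
        root.create_dataset("YCoordinates", data=np.linspace(0.0, 1.0, 3))
        root.create_dataset("ZCoordinates", data=np.zeros(1))

        # One part holding 4 trees, none refined: the packed descriptor
        # bit stream is empty and each tree has a single root cell
        root.create_dataset("NumberOfTrees", data=np.array([4]))
        root.create_dataset("DescriptorsSize", data=np.array([0]))
        root.create_dataset("DepthPerTree", data=np.ones(4, dtype=np.int64))
        root.create_dataset("Descriptors", data=np.zeros(0, dtype=np.uint8))
        root.create_dataset("NumberOfCellsPerDepth", data=np.ones(4, dtype=np.int64))
        root.create_dataset("TreeIds", data=np.arange(4))

        root.create_group("FieldData")
        root.create_group("CellData")
```

A real writer would of course fill "Descriptors", "Mask" and "CellData" with actual refinement data; this only shows where each piece of the proposed layout lives.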

Misc. Notes:

  • “NumberOfTrees” has one value for each part of the distributed dataset, and is optional for non-distributed data or if the number of trees in each part is the same.
  • “DescriptorsSize” is required for distributed datasets and used as an offset array in the “Descriptors” bit set. Without it, each rank would need to process all of the previous trees to calculate the read offset for descriptors.
  • “TreeIds” must contain all values 0->N, but in any order. This allows distributing trees across parts.
  • “Mask” is optional, and omitted if no cell is masked.
  • “NumberOfCellsPerDepth” is renamed from “NumberOfVerticesPerDepth” in the XML format. It allows for faster reading when limiting reading depth. The reader can compute the number of cells to offset for a given tree using this value and “DepthPerTree”.
  • There is no “PointData” for the HTG structure, because we only consider cells.
  • For temporal data, offsets for NumberOfTrees, Descriptors, NumberOfCellsPerDepth and NumberOfCells will be required in the “Steps” group, similarly to the other types of VTKHDF datasets. This way, the reader can easily pick up any time step without any pre-computation.
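As an illustration of the offset-related points above, the per-part read offsets can be obtained with an exclusive prefix sum over “DescriptorsSize” and “NumberOfTrees”; the array values below are made up for the example:

```python
import numpy as np

# Hypothetical per-part values, as read from "DescriptorsSize" and
# "NumberOfTrees" in a file with 3 distributed parts (values made up).
descriptors_size = np.array([9, 9, 1])   # descriptor bits per part
number_of_trees = np.array([2, 1, 1])    # trees per part

# Exclusive prefix sums: each part's read offset, without decoding
# any data belonging to the previous parts.
descriptor_bit_offsets = np.concatenate(([0], np.cumsum(descriptors_size)[:-1]))
tree_offsets = np.concatenate(([0], np.cumsum(number_of_trees)[:-1]))

# Rank 2 starts reading "Descriptors" at bit descriptor_bit_offsets[2]
# and tree-indexed arrays ("DepthPerTree", "TreeIds") at tree_offsets[2]
```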

Any comment or suggestion is welcome!

cc @mwestphal @Charles_Gueunet @hakostra @lgivord

Nice writeup.

Could you clarify if distributed data is supported in this VTKHDF specification?

It totally is. The sentence you quoted is about the current XML PHTG data format, which I could not get to work properly.

There are two datasets in the proposal, Descriptors and Mask that are described as packed bit arrays.

My experience with packed bit arrays (in my interpretation, a datatype storing bool 0/1 values in 1 bit of storage space each, rounded up to the nearest byte in total length) is that different languages/frameworks handle them differently.

Would the following Python example give the desired encoding?

import h5py as h5
import numpy as np

# 100 boolean flags, stored one per byte before packing
bitlist = np.zeros(100, dtype=np.uint8)

with h5.File('bitfield.h5', 'w') as fh:
    # Pack 8 flags per byte, zero-padding the final byte
    bitfield = np.packbits(bitlist)
    fh.create_dataset('bitfield', data=bitfield)

100 bool elements in the “bitlist” (in which 0 = false, any other value = true) occupy 12.5 bytes and the dataset in the file therefore ends up being 13 elements of uint8 type.

Or is this not what you expect?

I know vtkBitArray is encoded such that each byte stores eight bool values, but not every programming language and framework has such a class/datatype natively. I think one of the great advantages of vtkhdf is that it is relatively easy to write custom writer functions/methods/classes/tools from applications in various languages (Python, C, C++, Fortran, …) without relying on the vtk library itself. The writing of these packed bit arrays needs to be possible without too much manual encoding work from all of these languages.

Hey Håkon, thanks for your feedback. I did not share it, but that is exactly what I’m doing in my test writer:

    descriptors = (
          # Tree 0
          1,  # Depth 1
            0, 0, 1, 0, 0, 0, 0, 0, # Depth 2
          # Tree 1
          1, # Depth 1
            0, 1, 0, 0, 0, 0, 0, 0, # Depth 2
          # Tree 3 : refined
          1,
          # Tree 4 : not refined
    )
    packed_descriptors = np.packbits(descriptors)
    root.create_dataset("Descriptors", data=packed_descriptors)

After a quick survey of bit packing techniques, using uint8 seems like a safe bet.
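One detail worth spelling out if uint8 packing becomes the convention: np.packbits is MSB-first within each byte by default, and np.unpackbits inverts it, padding with trailing zeros up to a byte boundary:

```python
import numpy as np

bits = np.array([1, 0, 0, 1, 0, 1], dtype=np.uint8)

packed = np.packbits(bits)        # MSB-first: the 6 bits become 0b10010100 == 148
unpacked = np.unpackbits(packed)  # 8 values: the original 6 bits + 2 padding zeros
```

Assuming the spec follows this MSB-first convention, a writer in another language must place the first descriptor bit in the most significant bit of the first byte to stay compatible.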

Seems like a reasonable approach. I asked because the HDF5 library has some mention of bitfields in the manual:

https://support.hdfgroup.org/documentation/hdf5/latest/_h5_t__u_g.html#subsubsec_datatype_other_bitfield

but I do not exactly know what features the library offers, and for instance which of them are exposed and available from important packages like h5py.

I think your suggestion of manually packing the bitfield is sensible because you can achieve it easily in any language that allows bit-manipulation of integers.
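To back that up, a minimal MSB-first packer needs nothing but integer bit operations, so the same few lines port directly to C, C++ or Fortran. This sketch mimics the default behaviour of np.packbits; it is an illustration, not part of the spec:

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values MSB-first into bytes,
    zero-padding the last byte (same convention as np.packbits)."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)

# 9 bits -> 2 bytes: 0b10010100 == 0x94, then a padding-only byte
assert pack_bits([1, 0, 0, 1, 0, 1, 0, 0, 0]) == b"\x94\x00"
```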

Thank you for this topic.

I have a question on the Dimensions attribute: shouldn’t it be 1, 2 or 3?

ATTRIBUTE "Dimensions"              // 3-Vector

We very rarely use 1 dimension, but my point is mainly that I don’t get what 3-Vector means.

Hello Maxime, the Dimensions attribute corresponds to the number of trees in each direction, so you’re right, it can have 1, 2 or 3 components depending on whether we’re describing a 1D, 2D or 3D HTG. It is not necessarily a 3-component vector as I wrote initially.

Hello Louis, thank you for the answer. I have another question: shouldn’t NumberOfCellsPerDepth be named NumberOfCellsPerDepthPerTree? Even if it seems too long, it is more accurate, no?

That’s right, NumberOfCellsPerDepth actually defines the number of cells for each depth of each tree. I kept this name to mirror the one used in the XML HTG reader, but it can be changed to represent its content more accurately. Does NumberOfCellsPerTreeDepth sound acceptable to you?

NumberOfCellsPerTreeDepth sounds better than what I proposed. Thank you for this HTG structure and the answers. I look forward to using it when it becomes available.

Can this support something similar to ForceStaticMesh (i.e. a ForceStaticTree), where the cell attribute data can change in time with respect to a static tree, such that visualization in VTK/ParaView can optimize cycling through time steps without re-reading the underlying tree?

Yes, static trees will also be supported, the same way static meshes work with the other data types supported by the VTKHDF specification; the fields in the Steps group specify a read offset into the other arrays for each time step, such as “Descriptors” in our case. If 2 time steps use the same read offset for an array, then the tree geometry only needs to be read once and can be reused. You can read more about time support in VTKHDF here: VTK File Formats - VTK documentation
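To sketch how a reader could exploit this (the offset array and function names below are hypothetical, not from the actual VTKHDF reader):

```python
# Hypothetical per-time-step read offsets into "Descriptors", as the
# "Steps" group could store them; names and values are illustrative.
descriptor_offsets = [0, 0, 0, 42]  # steps 0-2 share one tree, step 3 differs

def make_tree_loader(read_tree):
    """Wrap an expensive tree-decoding function so that the topology
    is only decoded again when a step's read offset actually changes."""
    cache = {}

    def load(step):
        offset = descriptor_offsets[step]
        if offset not in cache:
            cache[offset] = read_tree(offset)  # expensive: decode descriptors
        return cache[offset]

    return load
```

With the offsets above, cycling through all four steps would decode the tree only twice: once for the shared offset 0 and once for offset 42.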

Last April (2024) I asked about static geometry with time-varying data here: HTG time thread. @Charles_Gueunet commented at the time: “You can force the static mesh on a dataset, but nothing will really use this information.” Are downstream filters better able to use static topologies in VTK 9.4+ compared to 10 months ago?

Hi @omdaniel

VTKHDF natively supports static meshes in the sense that the same geometry will not be re-read when changing timesteps, which is already a good source of speedup.

No HTG filter uses this information to optimize its processing yet. This could be added in the future, though.
