HyperTreeGrid support in VTKHDF

The overlappingAMR data structure was recently added to the VTKHDF specification; now it is time for its cousin, the HyperTreeGrid (HTG for short), to be specified as well.

HTG is a compact, memory-efficient tree-based AMR data structure that has been part of VTK for more than a decade. Many HTG-specific algorithms have been created, using a cursor mechanism coupled with tree iteration to traverse it. HTGs can be stored in the VTK XML format using the .htg extension; the parallel .phtg implementation is currently in a non-functioning state.

This post proposes a specification for HTG in the up-and-coming VTKHDF format, which will allow efficient distributed writing and reading, temporal support, and composite structures, stored in a single file or split across multiple files.

The general structure of the file would look like this:

GROUP "VTKHDF"
    ATTRIBUTE "Version"      // Updated to (2,4)
    ATTRIBUTE "Type"         // "HyperTreeGrid" in this case

    // HTG-specific properties
    ATTRIBUTE "BranchFactor"            // 2 or 3, cell division factor
    ATTRIBUTE "Dimensions"              // 3-Vector
    ATTRIBUTE "InterfaceInterceptsName" // String referencing a cell data array
    ATTRIBUTE "InterfaceNormalsName"    // String referencing a cell data array
    ATTRIBUTE "TransposedRootIndexing"  // Bool, true if the indexing mode of the grid is inverted

    // Grid point coordinates
    DATASET "XCoordinates"
    DATASET "YCoordinates"
    DATASET "ZCoordinates"

    // HTG Specific fields
    DATASET "NumberOfTrees"             // One entry for each distributed part
    DATASET "DescriptorsSize"           // One entry for each distributed part
    DATASET "DepthPerTree"              // size = sum(NumberOfTrees), maximum depth for each tree
    DATASET "Descriptors"               // Packed bit array, size = sum(DescriptorsSize)
    DATASET "NumberOfCellsPerDepth"     // size = sum(DepthPerTree), number of cells for each depth of each tree
    DATASET "TreeIds"                   // size = sum(NumberOfTrees)
    DATASET "Mask"                      // Packed bit array, size = sum(NumberOfCellsPerDepth)

    GROUP "FieldData"
        ...
    GROUP "CellData"
        ...

If you’re familiar with the .htg v2 format, this is mostly a mapping of its content, with the addition of a few offset fields to support efficient distributed reading.
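As a concrete illustration, a minimal writer for this layout could look like the h5py sketch below. This is only a sketch under assumptions: the file name, coordinate values, and attribute dtypes are made up for a hypothetical 2×2 grid of unrefined trees, and the exact HDF5 types the eventual reader will expect are not settled by this proposal.

```python
import h5py
import numpy as np

# Hypothetical example: a single (non-distributed) part holding a
# 2x2 grid of root trees, none of which is refined.
with h5py.File("htg.vtkhdf", "w") as fh:
    root = fh.create_group("VTKHDF")
    root.attrs["Version"] = np.array([2, 4], dtype=np.int64)
    root.attrs["Type"] = np.bytes_("HyperTreeGrid")
    root.attrs["BranchFactor"] = np.int64(2)
    root.attrs["Dimensions"] = np.array([3, 3, 1], dtype=np.int64)  # points per axis
    root.attrs["TransposedRootIndexing"] = np.int8(0)

    # Grid point coordinates, one array per axis
    root.create_dataset("XCoordinates", data=np.array([0.0, 1.0, 2.0]))
    root.create_dataset("YCoordinates", data=np.array([0.0, 1.0, 2.0]))
    root.create_dataset("ZCoordinates", data=np.array([0.0]))

    # HTG-specific fields: 4 unrefined trees, so no descriptor bits
    root.create_dataset("NumberOfTrees", data=np.array([4], dtype=np.int64))
    root.create_dataset("DescriptorsSize", data=np.array([0], dtype=np.int64))
    root.create_dataset("DepthPerTree", data=np.full(4, 1, dtype=np.int64))
    root.create_dataset("Descriptors", data=np.empty(0, dtype=np.uint8))
    root.create_dataset("NumberOfCellsPerDepth", data=np.full(4, 1, dtype=np.int64))
    root.create_dataset("TreeIds", data=np.arange(4, dtype=np.int64))
```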

Misc. Notes:

  • “NumberOfTrees” has one value per part of the distributed dataset; it is optional for non-distributed data or when every part has the same number of trees.
  • “DescriptorsSize” is required for distributed datasets and is used as an offset array into the “Descriptors” bit set. Without it, each rank would need to process all of the previous trees to compute its read offset into the descriptors.
  • “TreeIds” must contain every value from 0 to N-1, in any order. This allows distributing trees across parts.
  • “Mask” is optional, and omitted if no cell is masked.
  • “NumberOfCellsPerDepth” is renamed from “NumberOfVerticesPerDepth” in the XML format. It allows for faster reading when limiting the reading depth: the reader can compute the number of cells to skip for a given tree using this value together with “DepthPerTree”.
  • There is no “PointData” for the HTG structure, because we only consider cells.
  • For temporal data, offsets for NumberOfTrees, Descriptors, NumberOfCellsPerDepth and NumberOfCells will be required in the “Steps” group, similarly to the other types of VTKHDF datasets. This way, the reader can pick any time step without pre-computation.
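To make the offset logic above concrete, here is a small sketch (with made-up part sizes) of how each rank could derive its read offsets from “DescriptorsSize” and “NumberOfTrees” using exclusive prefix sums, without touching any other rank’s data:

```python
import numpy as np

# Hypothetical metadata for a dataset split into 3 parts (ranks)
descriptors_size = np.array([9, 9, 1])   # bits of "Descriptors" per part
number_of_trees  = np.array([2, 1, 1])   # trees per part

# Exclusive prefix sums give each rank its starting offset directly
descriptor_bit_offset = np.concatenate(([0], np.cumsum(descriptors_size)[:-1]))
tree_offset           = np.concatenate(([0], np.cumsum(number_of_trees)[:-1]))

print(descriptor_bit_offset)  # [ 0  9 18]
print(tree_offset)            # [0 2 3]
```

Rank `i` then reads `descriptors_size[i]` bits starting at `descriptor_bit_offset[i]`, which is exactly what “DescriptorsSize” makes possible without scanning the previous trees.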

Any comment or suggestion is welcome!

cc @mwestphal @Charles_Gueunet @hakostra @lgivord

Nice writeup.

Could you clarify whether distributed data is supported in this VTKHDF specification?

It totally is. The sentence you quoted is about the current XML PHTG data format, which I could not get to work properly.

There are two datasets in the proposal, Descriptors and Mask that are described as packed bit arrays.

My experience with packed bit arrays (in my interpretation, a datatype holding bool 0/1 values with 1 bit of storage per value, rounded up to the nearest byte in total length) is that different languages/frameworks handle them differently.

Would the following Python example give the desired encoding?

import h5py as h5
import numpy as np

bitlist = np.zeros(100, dtype=np.uint8)

with h5.File('bitfield.h5', 'w') as fh:
    bitfield = np.packbits(bitlist)
    fh.create_dataset('bitfield', data=bitfield)

100 bool elements in the “bitlist” (in which 0 = false, any other value = true) occupy 12.5 bytes and the dataset in the file therefore ends up being 13 elements of uint8 type.

Or is this not what you expect?

I know vtkBitArray is encoded such that each byte stores eight bool values, but not every programming language or framework has such a class/datatype natively. I think one of the great advantages of VTKHDF is that it is relatively easy to write custom writer functions/methods/classes/tools from applications in various languages (Python, C, C++, Fortran, …) without relying on the VTK library itself. Writing these packed bit arrays needs to be possible without too much manual encoding work in all of these languages.

Hey Håkon, thanks for your feedback. I did not share it, but that is exactly what I’m doing in my test writer:

    import numpy as np

    # "root" is the "VTKHDF" h5py group created earlier in the writer
    descriptors = (
          # Tree 0
          1,  # Depth 1
            0, 0, 1, 0, 0, 0, 0, 0, # Depth 2
          # Tree 1
          1, # Depth 1
            0, 1, 0, 0, 0, 0, 0, 0, # Depth 2
          # Tree 3 : refined
          1,
          # Tree 4 : not refined
    )
    packed_descriptors = np.packbits(descriptors)
    root.create_dataset("Descriptors", data=packed_descriptors)

After a quick survey of bit-packing techniques, using uint8 seems like a safe bet.
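On the read side, the counterpart would be `np.unpackbits`, and its `count` argument is what makes the exact-bit-count datasets in the proposal useful: without knowing how many bits are meaningful (from “DescriptorsSize” or “NumberOfCellsPerDepth”), the reader cannot distinguish real trailing zeros from pad bits. A small round-trip sketch:

```python
import numpy as np

bits = np.array([1, 0, 0, 1, 0, 0, 0, 0, 0, 1], dtype=np.uint8)  # 10 bits

packed = np.packbits(bits)                          # MSB-first, padded to 2 bytes
restored = np.unpackbits(packed, count=len(bits))   # count= drops the 6 pad bits

assert np.array_equal(bits, restored)
print(packed)  # [144  64]
```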

Seems like a reasonable approach. I asked because the HDF5 library has some mention of bitfields in the manual:

https://support.hdfgroup.org/documentation/hdf5/latest/_h5_t__u_g.html#subsubsec_datatype_other_bitfield

but I do not know exactly what features the library offers, and for instance which of them are exposed and available from important packages like h5py.

I think your suggestion of manually packing the bitfield is sensible because you can achieve it easily in any language that allows bit-manipulation of integers.
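To back that up, the manual packing is indeed only a few lines even without numpy. A pure-Python sketch (the helper name `pack_bits` is made up here) that matches numpy’s default MSB-first `packbits` convention:

```python
def pack_bits(bits):
    """Pack an iterable of 0/1 values into bytes, MSB-first,
    matching numpy's default np.packbits bit order."""
    out = bytearray()
    for i, b in enumerate(bits):
        if i % 8 == 0:
            out.append(0)          # start a new byte every 8 bits
        if b:
            out[-1] |= 1 << (7 - i % 8)  # set the bit, MSB first
    return bytes(out)

print(pack_bits([1, 0, 0, 1, 0, 0, 0, 0, 0, 1]).hex())  # "9040"
```

The same loop translates directly to C, Fortran, or any language with integer bit operations.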
