HyperTreeGrid support in VTKHDF

hakostra · January 15, 2025, 9:14am

There are two datasets in the proposal, Descriptors and Mask, that are described as packed bit arrays.

My experience with packed bit arrays (which I interpret as bool 0/1 values stored in 1 bit each, with the total length rounded up to the nearest byte) is that different languages and frameworks handle them differently.

Would the following Python example give the desired encoding?

import h5py as h5
import numpy as np

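# 100 boolean flags, stored one per byte before packing
bitlist = np.zeros(100, dtype=np.uint8)

with h5.File('bitfield.h5', 'w') as fh:
    # Pack eight flags per byte; np.packbits zero-pads the last byte
    bitfield = np.packbits(bitlist)
    fh.create_dataset('bitfield', data=bitfield)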

100 bool elements in the “bitlist” (in which 0 = false, any other value = true) occupy 12.5 bytes, so the dataset in the file ends up as 13 elements of uint8 type, with the unused trailing bits set to zero.
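To make the question concrete, here is a small sketch of what np.packbits actually produces for a 100-element list with only the first and last flags set. It relies on NumPy's default bitorder='big' (first flag goes into the most significant bit of each byte). Whether that bit order is what the proposal expects is exactly what I am asking, so treat it as an assumption.

```python
import numpy as np

# 100 flags, with only the first and the last one set
bitlist = np.zeros(100, dtype=np.uint8)
bitlist[0] = 1
bitlist[99] = 1

# Default bitorder='big': the first flag of each group of eight
# lands in the most significant bit of the byte.
packed = np.packbits(bitlist)

print(packed.size)  # 13 bytes for 100 flags
print(packed[0])    # 128 (0b10000000): flag 0 is the top bit of byte 0
print(packed[12])   # 16  (0b00010000): flag 99 is the 4th bit of byte 12

# Reading back needs the original length, since the padding bits
# are indistinguishable from real "false" flags.
restored = np.unpackbits(packed, count=bitlist.size)
print(np.array_equal(restored, bitlist))
```

Note that the reader has to know the number of valid bits from somewhere else, because the 13 bytes alone do not say whether 97 or 104 flags were stored.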

Or is this not what you expect?

I know vtkBitArray is encoded such that each byte stores eight bool values, but not every programming language and framework has such a class/datatype natively. I think one of the great advantages of VTKHDF is that it is relatively easy to write custom writer functions/methods/classes/tools from applications in various languages (Python, C, C++, Fortran, …) without relying on the VTK library itself. Writing these packed bit arrays needs to be possible from all of these languages without too much manual encoding work.
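For languages without a packbits equivalent, the manual encoding would look roughly like the sketch below. It is written in plain Python without NumPy so that it maps more or less directly onto a loop in C or Fortran. The function name pack_bits_msb_first is made up for illustration, and the most-significant-bit-first order is an assumption chosen to match the NumPy default, not something I have confirmed from the proposal.

```python
def pack_bits_msb_first(flags):
    """Pack truthy/falsy values into bytes, eight flags per byte.

    Hypothetical helper; assumes the first flag of each group of eight
    goes into the most significant bit of the byte.
    """
    flags = list(flags)
    # Round the length up to whole bytes; bytearray starts zero-filled,
    # so unused trailing bits stay 0.
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


flags = [0] * 100
flags[0] = 1
flags[99] = 1
packed = pack_bits_msb_first(flags)

print(len(packed))  # 13
print(packed[0])    # 128
print(packed[12])   # 16
```

This is not much code, but every independent writer has to get the bit order and the padding right, so it would help if the specification stated both explicitly.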