There are two datasets in the proposal, Descriptors and Mask, that are described as packed bit arrays.
In my experience, packed bit arrays (as I interpret them: a datatype in which each bool 0/1 value takes up 1 bit of storage, with the total length rounded up to the nearest whole byte) are handled differently by different languages and frameworks.
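To make the ambiguity concrete: a packed array of n values needs ceil(n/8) bytes, but implementations can disagree on which bit within a byte holds the first value. The sketch below uses only NumPy's packbits and its bitorder parameter to show the same eight values packing to two different bytes depending on that choice. The helper name packed_length is hypothetical, and which bit order the proposal intends is exactly the open question.

```python
import numpy as np

def packed_length(n_values: int) -> int:
    """Bytes needed for n_values bits, rounded up to whole bytes (hypothetical helper)."""
    return (n_values + 7) // 8

# Only the first of eight values is true.
values = np.array([1, 0, 0, 0, 0, 0, 0, 0], dtype=np.uint8)

# MSB-first: value 0 goes into the highest bit (NumPy's default, bitorder='big').
msb_first = np.packbits(values, bitorder='big')
# LSB-first: value 0 goes into the lowest bit.
lsb_first = np.packbits(values, bitorder='little')

print(packed_length(100))          # 13
print(msb_first[0], lsb_first[0])  # 128 1
```

A reader that assumes the wrong order would silently see the first value as the eighth, so the bit order needs to be stated explicitly alongside the byte layout.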
Would the following Python example give the desired encoding?
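import h5py as h5
import numpy as np

# 100 bool values stored one per byte (0 = false, nonzero = true)
bitlist = np.zeros(100, dtype=np.uint8)

with h5.File('bitfield.h5', 'w') as fh:
    # pack 8 values per byte, first value in the most significant bit; 100 bits -> 13 bytes
    bitfield = np.packbits(bitlist)
    fh.create_dataset('bitfield', data=bitfield)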
The 100 bool elements in bitlist (in which 0 = false and any other value = true) occupy 12.5 bytes, so the dataset in the file ends up as 13 elements of type uint8.
Or is this not the encoding you expect?
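One consequence of the rounding is that the file holds 104 bits for 100 values, so a reader needs the original element count from elsewhere (for example the number of points or cells) to discard the 4 padding bits. A minimal round-trip sketch, reusing the dataset name 'bitfield' from the example above; the in-memory 'core' driver with backing_store=False and the every-third-value test pattern are just conveniences for the sketch, not part of the proposal.

```python
import h5py
import numpy as np

n_values = 100
# Arbitrary test pattern: every third value is true.
bitlist = (np.arange(n_values) % 3 == 0).astype(np.uint8)

# In-memory HDF5 file ('core' driver, nothing is written to disk).
with h5py.File('roundtrip.h5', 'w', driver='core', backing_store=False) as fh:
    fh.create_dataset('bitfield', data=np.packbits(bitlist))
    stored = fh['bitfield'][()]  # read back while the file is still open

print(stored.dtype, stored.shape)  # uint8 (13,)

# The stored bytes hold 104 bits; count= keeps only the first 100.
restored = np.unpackbits(stored, count=n_values)
print(np.array_equal(restored, bitlist))  # True
```

Without count=, np.unpackbits would return 104 values, the last 4 of which are zero padding.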
I know vtkBitArray is encoded such that each byte stores eight bool values, but not every programming language or framework provides such a class or datatype natively. I think one of the great advantages of VTKHDF is that it is relatively easy to write custom writers (functions, methods, classes, or tools) from applications in various languages (Python, C, C++, Fortran, …) without relying on the VTK library itself. Writing these packed bit arrays should therefore be possible from all of these languages without much manual encoding work.
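For what it's worth, the per-element logic is small enough to port without a dedicated bit array class: an integer division, a remainder and a shift. A minimal dependency-free sketch in plain Python, assuming MSB-first bit order (the same as NumPy's packbits default) and zero padding in the last byte; both are assumptions about what the proposal intends, and the function names are hypothetical.

```python
def pack_bits_msb_first(values):
    """Pack truthy/falsy values into bytes, 8 per byte (hypothetical helper).

    Value i goes into byte i // 8 at bit position 7 - (i % 8), so the first
    value of each byte is its most significant bit. Unused trailing bits of
    the last byte stay 0.
    """
    packed = bytearray((len(values) + 7) // 8)  # zero-initialized
    for i, v in enumerate(values):
        if v:  # 0 = false, any other value = true
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def unpack_bits_msb_first(packed, count):
    """Inverse of pack_bits_msb_first; count discards the padding bits."""
    return [(packed[i // 8] >> (7 - i % 8)) & 1 for i in range(count)]


bits = [1, 0, 1, 1, 0, 0, 0, 0, 1]  # 9 values -> 2 bytes
packed = pack_bits_msb_first(bits)
print(packed.hex())  # b080
print(unpack_bits_msb_first(packed, len(bits)) == bits)  # True
```

For the same input this produces the same bytes as np.packbits with its default bit order. In C the loop body becomes packed[i / 8] |= 0x80u >> (i % 8); on a zero-initialized buffer, and Fortran can do the same with IOR and ISHFT.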