I understand the attractiveness of keeping each level and dataset in a separate group, and then simply reusing the already defined data format inside these groups.
However, I'm afraid this will become problematic with disk access on high-latency systems, such as traditional supercomputer storage, spinning-disk arrays, network storage, etc. (in other words: everything but SSDs).
I work with overlapping AMRs with up to 250 000 grids, each grid typically 24x24x24 cells. This data is generated by simulations with up to 10 000 processes, and possibly more in the future. In this proposal, each dataset group would contain groups for cell and point data, and each variable would then be a very tiny dataset (24x24x24 cells/dataset x 4 bytes/cell = 55 296 bytes/dataset) inside these groups.
With 10 cell-data and 10 point-data arrays on 250 000 grids, there would be 5 million datasets inside the HDF5 file. Each dataset would be tiny, and disk access would essentially be random access. HDF5 features such as read caches cannot be utilized, since they work on a per-dataset basis. Additionally, because parallel access only works when all processes write to the same dataset, parallel writing of the file would not work.
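For illustration, here is roughly what writing such a layout looks like in h5py, with toy sizes and hypothetical group names (the proposal's exact naming is not reproduced here); the point is the sheer number of groups and datasets created:

```python
import numpy as np
import h5py

num_levels = 3
grids_on_level = [5, 20, 50]          # toy sizes; the real case has up to 250 000 grids
cell_array_names = [f"Array{i}" for i in range(10)]

# One group per grid, one tiny dataset per variable: with 20 variables and
# 250 000 grids this becomes 5 million datasets of ~55 kB each.
with h5py.File("amr_per_grid.h5", "w") as f:
    for level in range(num_levels):
        lvl = f.create_group(f"Level{level}")
        for g in range(grids_on_level[level]):
            cd = lvl.create_group(f"DataSet{g}").create_group("CellData")
            for name in cell_array_names:
                cd.create_dataset(name, data=np.zeros((24, 24, 24), np.float32))
```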
I have a very strong gut feeling that this file format will choke on metadata I/O and collective coordination overhead.
I could instead imagine a structure closer to that of the unstructured format, where, within each level, the arrays of all grids are kept in the same dataset, and indices into this 1-D data array are stored or computed. Something like:
GROUP "VTKHDF"
ATTRIBUTE "Version" (maybe incremented to 2 because we add the "Type" attribute)
ATTRIBUTE "Type" ("OverlappingAMR" in this case, but "ImageData" and "UnstructuredGrid" are also valid)
ATTRIBUTE "Origin"
ATTRIBUTE "GridDescription"
GROUP "Level0" (one group per level)
ATTRIBUTE "Spacing"
DATASET "AMRBox" (N-by-6 dataset, 6 columns of AMRBox for N DataSet's)
GROUP "PointData" (Similar to that of an vtkUnstructuredGrid)
DATASET "Array1" (1-D layout of all ImageData)
DATASET "Array2" (1-D layout of all ImageData)
...
GROUP "Celldata"
...
GROUP "Level1"
ATTRIBUTE "Spacing"
DATASET "AMRBox" (M-by-6 dataset, 6 columns of AMRBox for M DataSet's)
GROUP "PointData" (Similar to that of an vtkUnstructuredGrid)
DATASET "Array1" (1-D layout of all ImageData)
DATASET "Array2" (1-D layout of all ImageData)
...
GROUP "Celldata"
...
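A minimal h5py sketch of this layout, with toy sizes and purely illustrative attribute values (the exact attribute encodings would of course follow whatever the VTKHDF specification settles on):

```python
import numpy as np
import h5py

# Two toy grids on one level; cell counts follow from the inclusive
# AMRBox index ranges (ilo, ihi, jlo, jhi, klo, khi).
amr_box = np.array([[ 0, 23, 0, 23, 0, 23],
                    [24, 47, 0, 23, 0, 23]])
cells = (amr_box[:, 1] - amr_box[:, 0] + 1) \
      * (amr_box[:, 3] - amr_box[:, 2] + 1) \
      * (amr_box[:, 5] - amr_box[:, 4] + 1)

with h5py.File("amr_flat.h5", "w") as f:
    root = f.create_group("VTKHDF")
    root.attrs["Version"] = (2, 0)
    root.attrs["Type"] = "OverlappingAMR"
    lvl = root.create_group("Level0")
    lvl.attrs["Spacing"] = (1.0, 1.0, 1.0)
    lvl.create_dataset("AMRBox", data=amr_box)
    cd = lvl.create_group("CellData")
    # one 1-D dataset holds "Array1" for all grids of the level, concatenated
    cd.create_dataset("Array1", data=np.zeros(int(cells.sum()), np.float32))
```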
If partial arrays are to be supported, one needs additional information per dataset and array (an offset and a length into the 1-D storage for each array). Otherwise I believe the AMRBox gives enough information to compute the offset at which each array starts, as sketched below. The advantage is that this data format scales well: the number of datasets and groups is more or less constant and independent of the problem size. Reading the complete dataset can be done by reading a small number of arrays that are stored sequentially on disk. From a writing perspective it is also trivial to write in parallel: assuming each MPI rank owns a portion of the datasets, collective coordination is only needed once per array.
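As a sketch of that offset computation (assuming the arrays are concatenated in AMRBox row order with no padding, and reading the file written above):

```python
import numpy as np
import h5py

with h5py.File("amr_flat.h5", "r") as f:
    lvl = f["VTKHDF/Level0"]
    box = lvl["AMRBox"][...]
    cells = (box[:, 1] - box[:, 0] + 1) \
          * (box[:, 3] - box[:, 2] + 1) \
          * (box[:, 5] - box[:, 4] + 1)
    offsets = np.concatenate(([0], np.cumsum(cells)))  # prefix sum over grids

    i = 1                                    # read grid 1 only, nothing else
    flat = lvl["CellData/Array1"][offsets[i]:offsets[i + 1]]
    # axis ordering of the 1-D layout is an assumption here
    grid = flat.reshape(box[i, 1] - box[i, 0] + 1,
                        box[i, 3] - box[i, 2] + 1,
                        box[i, 5] - box[i, 4] + 1)
```

The same offsets would serve a parallel writer: each MPI rank writes its grids as hyperslab selections into the shared 1-D datasets (in h5py via driver="mpio", given a parallel HDF5 build).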
(I run into a small terminology problem here, because HDF5 datasets and VTK datasets are not necessarily the same thing in this discussion. Please ask if something is unclear.)