Overlapping AMR support in vtkHDF

Hello,

This is a proposal to add Overlapping AMR support to the vtkHDF format.

AMR stands for Adaptive Mesh Refinement. An AMR dataset is essentially a collection of image data grouped by level, where each level is a refinement of the previous one.

Here is the proposed pseudo-hdf structure, inspired by the .vth file format:

GROUP "VTKHDF"
  ATTRIBUTE "Version" (maybe incremented to 2 because we add the "Type" attribute)
  ATTRIBUTE "Type" ("OverlappingAMR" in this case, but "ImageData" and "UnstructuredGrid" are also valid)
  ATTRIBUTE "Origin"
  ATTRIBUTE "GridDescription"

  GROUP "Level0" (one group per level)
      ATTRIBUTE "Spacing"
      GROUP "DataSet0" 
          ATTRIBUTE "Index"
          ATTRIBUTE "AMRBox"
          GROUP "ImageData" (this group definition complies with the vtkHDF ImageData definition)
      GROUP "DataSet1"
          ATTRIBUTE "Index"
          ATTRIBUTE "AMRBox"
          GROUP "ImageData"
      GROUP "DataSet2"
      ...
 
   GROUP "Level1"
      ATTRIBUTE "Spacing"
      GROUP "DataSet10"
         ...
      GROUP "DataSet11"
         ...
      GROUP "DataSet12"
         ...

The Type attribute of the root group makes it easy to identify which type of dataset the HDF structure describes. The current implementation looks for a specific attribute to distinguish between ImageData and UnstructuredGrid. This is OK at the moment, but it may become problematic when adding more dataset types. So I propose to state the dataset type explicitly in the specification.
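For illustration, this is roughly how a reader could dispatch on the proposed attribute with h5py (only a sketch of the idea, not the actual VTK reader; the file name is made up):

import h5py

# Sketch: dispatch on the proposed root "Type" attribute.
with h5py.File("example.hdf", "r") as f:
    root = f["VTKHDF"]
    dataset_type = root.attrs["Type"]
    if isinstance(dataset_type, bytes):
        dataset_type = dataset_type.decode()
    if dataset_type == "OverlappingAMR":
        origin = root.attrs["Origin"]   # origin of the whole AMR
        # ... iterate over the "Level*" groups ...
    elif dataset_type in ("ImageData", "UnstructuredGrid"):
        pass  # handled by the existing vtkHDF readers
    else:
        raise ValueError(f"Unsupported vtkHDF Type: {dataset_type}")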

Any comments and suggestions are welcome!

Thanks,
François


You only talk about Overlapping AMR, but is it really different for non-overlapping AMR? I think both can be supported with essentially the same definition.

Yes, probably. The ATTRIBUTE "Type" would then be used to differentiate the two structures.

I understand the attractiveness of keeping each level and dataset as a separate group, and then simply reusing the already-defined data format inside these groups.

However, I’m afraid that this will become problematic with disk access on high-latency systems, such as traditional supercomputer systems, spinning disk arrays, network storage, etc. (in other words: everything but SSDs).

I work with overlapping AMRs with up to 250 000 grids, where each grid is typically 24x24x24 cells. This data is generated by simulations running on up to 10 000 processes, possibly more in the future. In this proposal, inside each dataset group there would be groups for cell and point data, and each variable would then be a very tiny dataset (24x24x24 cells/dataset x 4 bytes/cell = 55296 bytes/dataset) inside these groups.

If we have 10 cell and 10 point data arrays on 250 000 grids, there would be 5 million datasets inside the HDF5 file. Each dataset would be tiny, and the disk access would essentially be random access. HDF5 features such as read caches cannot be utilized, since they work on a per-dataset basis. Additionally, because parallel access only works when all processes write to the same dataset, parallel writing of the file would not work.

I have a very strong gut feeling that this file format would choke on metadata IO and collective coordination overhead.

I can think of a structure closer to that of the unstructured format, where one keeps the arrays of each level in the same datasets and then stores a list of indices into this 1-D data array. Something like:

GROUP "VTKHDF"
  ATTRIBUTE "Version" (maybe incremented to 2 because we add the "Type" attribute)
  ATTRIBUTE "Type" ("OverlappingAMR" in this case, but "ImageData" and "UnstructuredGrid" are also valid)
  ATTRIBUTE "Origin"
  ATTRIBUTE "GridDescription"

  GROUP "Level0" (one group per level)
    ATTRIBUTE "Spacing"
    DATASET "AMRBox" (N-by-6 dataset, 6 columns of AMRBox for N DataSet's)
    GROUP "PointData" (Similar to that of an vtkUnstructuredGrid)
      DATASET "Array1" (1-D layout of all ImageData)
      DATASET "Array2" (1-D layout of all ImageData)
      ...
    GROUP "Celldata"
      ...

  GROUP "Level1"
    ATTRIBUTE "Spacing"
    DATASET "AMRBox" (M-by-6 dataset, 6 columns of AMRBox for M DataSet's)
    GROUP "PointData" (Similar to that of an vtkUnstructuredGrid)
      DATASET "Array1" (1-D layout of all ImageData)
      DATASET "Array2" (1-D layout of all ImageData)
      ...
    GROUP "Celldata"
      ...
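To make the layout concrete, here is a minimal sketch of how one level could be written with h5py and numpy (the function name and the exact AMRBox dtype are my own choices, not part of the proposal):

import h5py
import numpy as np

def write_level(root, level_index, spacing, amr_boxes, point_arrays, cell_arrays):
    # amr_boxes:    (N, 6) array, one AMRBox per grid on this level
    # point_arrays: dict of array name -> list of per-grid numpy arrays
    # cell_arrays:  same, for cell-centered data
    level = root.create_group(f"Level{level_index}")
    level.attrs["Spacing"] = np.asarray(spacing, dtype="f8")
    level.create_dataset("AMRBox", data=np.asarray(amr_boxes, dtype="i8"))
    for group_name, arrays in (("PointData", point_arrays), ("CellData", cell_arrays)):
        group = level.create_group(group_name)
        for name, per_grid in arrays.items():
            # Concatenate all grids of this level into one 1-D dataset.
            group.create_dataset(name, data=np.concatenate([a.ravel() for a in per_grid]))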

If partial arrays are to be supported, one needs additional information per dataset and array (offset and length in the 1-D storage for each array). Otherwise I believe the AMRBox gives enough information to compute the offset of the start of each array to read. The advantage is that the data format scales well: the number of datasets and groups is more or less constant and independent of the problem size. Reading the complete dataset can be done by reading a low number of arrays that are stored in sequential order on disk. From a writing perspective it is also trivial to write in parallel: assuming each MPI rank has a portion of the datasets, the only collective coordination needed is per array.
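As a sketch of that offset computation (assuming the AMRBox columns are (ilo, ihi, jlo, jhi, klo, khi) with inclusive cell indices, C-order flattening of each grid, and the grids stored in the 1-D arrays in the same order as the rows of the AMRBox dataset; point arrays would need one extra point per direction):

import h5py
import numpy as np

def cells_in_box(box):
    ilo, ihi, jlo, jhi, klo, khi = box
    return (ihi - ilo + 1) * (jhi - jlo + 1) * (khi - klo + 1)

def read_cell_array(level_group, array_name, grid_index):
    # Compute the start offset of every grid from the AMRBox dataset alone.
    boxes = level_group["AMRBox"][...]
    counts = np.array([cells_in_box(b) for b in boxes])
    offsets = np.concatenate(([0], np.cumsum(counts)))
    start, stop = offsets[grid_index], offsets[grid_index + 1]
    flat = level_group["CellData"][array_name][start:stop]
    ilo, ihi, jlo, jhi, klo, khi = boxes[grid_index]
    # Reshape the 1-D slab back to 3-D; the (k, j, i) ordering must match the writer.
    return flat.reshape(khi - klo + 1, jhi - jlo + 1, ihi - ilo + 1)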

(I run into a small terminology problem here, because HDF5 datasets and VTK datasets are not necessarily the same thing in this discussion. Please ask if something is not clear.)

Thanks for your feedback; it makes sense that lots of small datasets will be slower to read than a few big ones. The whole point of using HDF5 is performance, so your input is very valuable.

In your proposal, we should add extra information to the AMRBox dataset corresponding to the underlying image data: WholeExtent, Origin, Spacing, Direction and Piece Extent if applicable. In HDF5 these would be attributes under a group.

Hence we may need a GROUP "AMRBoxes" containing one GROUP "AMRBox" per grid, with all the required data as attributes. Unless you think that could also become a bottleneck?

@danlipsa any thought?

@hakostra Thanks for the feedback. Given your constraints it looks like your proposal is the way to go.

So, you’ll never need to partition a dataset for your data? You always read/write a whole dataset.
Would that be the case for any AMR data? Partitioning a dataset would create additional complications for the reader and writer, as it would have to worry about 1-D to 3-D conversion.

How do you store your data now? Can you share a small example we could use for testing?

Just to note that the current VTK XML AMR format stores each box in its own .vti, similar to Francois' original proposal. So that would not work very well with your data, as you would get a very large XML file and many small .vti files. See TestAMRXMLIO.

We have a CFD code “MGLET” that solves incompressible flow and acoustics on staggered Cartesian grids using immersed boundary methods. The simulation grids are built up of small Cartesian grids in an octree structure to refine details of the geometry and flow. Each grid is typically 24x24x24 cells - we found that this is a sweet spot between performance and the ability to do efficient local grid refinement.

Internally MGLET stores the data with 2 ghost layers on each grid, so a 24^3 grid is stored as 28^3, and so on. This is written out to an HDF5 file in a special way. The file is complete in the sense that the flow solver can fully restart from a saved state, but it is not very useful for postprocessing directly.

We have a postprocessing tool that takes this restart file and fills an OverlappingAMR structure depending on the user's needs. This tool also handles things such as the staggered velocities that are defined on the faces of the cells, and properly interpolates them onto cell centers or vertices so that VTK can visualize them. It also strips off any ghost cells and can compute certain derived quantities that require deep knowledge of details the VTK library does not have (often related to staggering). Already at this stage users can choose to only process certain regions in space or only certain levels to reduce the amount of data that is produced.

The output of this postprocessing tool is currently a VTH dataset (a .vth file plus a folder with lots of small .vti files) that the user can open in ParaView. The tool can additionally generate simple 2-D and 3-D images such as cutplanes and isosurfaces, but that is not interesting for this discussion since it does not rely on any data files on disk (no intermediate datasets are written, only the resulting image).

Our problem is that, due to the small grids used in the code, we end up with tens of thousands of .vti files, as you note in your last paragraph. These are a pain to have lying around. The biggest case I have computed has 250 000 grids and therefore results in 250 000 vti files. This is not desirable. On typical classic supercomputers we often face problems related to quotas on the number of files, which can be around 1 million. A few datasets like this and we are out of quota… Therefore it would be very useful to have an alternative AMR format that overcomes the limitations of the XML-based one when the number of blocks/grids becomes too large.

I will soon follow up with an example.

For the purpose of discussing the file structure I created a small 2-way converter in Python, capable of converting VTH->HDF5->VTH.

The file format proposal itself has some inherent limitations:

  1. For each array, all grids on the same level must use the same datatype. This is inherent in the file design since they share the same HDF5 dataset.
  2. Arrays cannot be partially defined on a subset of the grids within one level in this proposal. Arrays can be defined on some levels only, though (i.e. each level must be complete).

I believe pt. 1 is quite difficult to get around in my proposal; in the original proposal by @Francois_Mazen this limitation would not be present. Point 2 can be avoided given that some additional metadata is stored per grid to indicate which grids carry the array and which do not. It should not be difficult to implement if this is a common usage scenario.

In this suggestion, the file format and data layout are more or less purely given by the collection of AMRBoxes associated with the AMR. As long as you have the AMRBox collection, each individual grid is properly defined from it, in terms of grid spacing, origin and dimensions. Therefore, as soon as you know the AMRBox layout, you can compute all data structures and describe the complete file layout for both writing and reading an arbitrary part of the whole AMR, without actually having the data at hand. I do not know if there are situations where the origin, spacing and dimensions must be stored explicitly on a per-grid basis, and whether they are allowed to differ from what you can compute from the AMRBox. Please correct me if I'm wrong on this topic…
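As a sketch of that derivation (assuming the root Origin is the origin of the whole AMR and the AMRBox columns are (ilo, ihi, jlo, jhi, klo, khi) in cell indices on the level in question):

import numpy as np

def grid_geometry(global_origin, level_spacing, box):
    # Derive origin, spacing and dimensions of one grid from its AMRBox; the
    # column convention of the box is an assumption, not part of the proposal.
    ilo, ihi, jlo, jhi, klo, khi = box
    spacing = np.asarray(level_spacing, dtype=float)
    origin = np.asarray(global_origin, dtype=float) + spacing * np.array([ilo, jlo, klo])
    cell_dims = np.array([ihi - ilo + 1, jhi - jlo + 1, khi - klo + 1])
    point_dims = cell_dims + 1
    return origin, spacing, cell_dims, point_dims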

The actual implementation attached to this post carries further limitations that are not from the file design but purely from my own laziness:

  1. Only the first grid on the first level is interrogated to find which data arrays to write out. All arrays must be present on all grids on all levels.
  2. Arrays with character datatype are silently skipped (vtk numpy_support does not handle character arrays very well).
  3. Arrays with zero elements are silently skipped.
  4. The implementation creates a memory buffer for each data array, fills this buffer completely and flushes it to disk once per complete array and level for maximum performance. Other implementations might do this in another way with the same file layout.
  5. I have no non-overlapping AMR datasets at hand and do not use this format myself, hence it is not tested. I do not know how it differs from overlapping AMR and cannot say what it would take to make it work.

None of these should be difficult to implement or work around. I did this now so we can discuss the file format, since design choices made now will result in various penalties later in the form of lost generality, lost performance, etc.

See small example datasets here (one 3-D and one 2-D) in their original VTH file format:
https://www.jottacloud.com/s/261a1a7c91992f5403bb859679f5c9038b5

amrhdf.py (18.5 KB)

I don’t think that Extent, Origin and Spacing are necessary: they can be derived from the top-level Origin, the per-level Spacing and the AMRBox information already proposed. Direction is really not appropriate for this type of AMR mesh.

Hakon’s design is great. This has been adopted by big AMR codes such as Chombo due to the performance issues mentioned by Hakon. Note that when reading in parallel, all ranks will need to have access to the AMRBox datasets to figure out offsets into the image data arrays. We should make sure that some sort of collective operation is done (either ourselves in the reader via MPI or using the HDF5 parallel capabilities) to avoid performance issues.
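One possible shape for that, sketched with mpi4py and h5py (the file name and the broadcast-from-rank-0 strategy are just one option; collective reads via the HDF5 mpio driver would be another):

from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD

# Rank 0 reads the small AMRBox datasets once and broadcasts them, so every
# rank can compute its offsets without all ranks hammering the file metadata.
boxes_per_level = None
if comm.Get_rank() == 0:
    with h5py.File("amr.hdf", "r") as f:  # example file name
        root = f["VTKHDF"]
        level_names = sorted((n for n in root if n.startswith("Level")),
                             key=lambda n: int(n[len("Level"):]))
        boxes_per_level = {n: root[n]["AMRBox"][...] for n in level_names}
boxes_per_level = comm.bcast(boxes_per_level, root=0)

# Each rank now computes its own offsets into the 1-D data arrays and reads
# only its portion, e.g. via h5py.File("amr.hdf", "r", driver="mpio", comm=comm).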