VTKHDF Unstructured Grid - Grid in Separate File

Hello all. I am using the VTKHDF unstructured grid format to output data from a CFD model written in Fortran. We currently want to output a separate file per timestep. The issue we are running into is that duplicating the grid data in each file is prohibitive (~400 GB required for the simulation output, compared to ~3 GB if the grid data is not repeated).

Unfortunately, the native HDF5 feature for linking to external files will not work for this application, as we are using MPIO collective writes with variable hyperslab sizes (this requires chunking the data, which is incompatible with the external file feature).

Would it be possible to add an attribute within the VTKHDF header that points to an external file containing the grid information? In the attached example, I have output at 11 timesteps (beam_detector_SM3D*) that share a common mesh (beam_detector_MESH.vtkhdf). The SM3D files contain an attribute in the /VTKHDF header called "ExternalFile" which holds the filename of the grid file.

vtkresults.zip (2.0 MB)
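For reference, this is what the attribute looks like if written with h5py (our actual writer is in Fortran; this is just a sketch, and the "ExternalFile" attribute is only my proposal, not part of the current spec):

import h5py as h5

# Proposed (not part of the current VTKHDF spec): each per-timestep file
# carries a string attribute naming the shared mesh file.
with h5.File('beam_detector_SM3D_0000000000.vtkhdf', 'r+') as f:
    f['VTKHDF'].attrs['ExternalFile'] = 'beam_detector_MESH.vtkhdf'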

Hi @johodges

The VTKHDF implementation and format have been evolving quite a lot recently, especially in the context of writing temporal files with static meshes.

Which version of VTK are you using? Is using master a possibility for you?

FYI @lgivord

@mwestphal I am currently writing the VTKHDF files with my own implementation using the HDF5 library in Fortran. To my knowledge, there is no existing VTK Fortran module that supports MPIO collective writes. However, I can use the master VTK library for importing the files into ParaView.
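For context, the write pattern is roughly the following (a minimal mpi4py/h5py sketch of a collective write with a different hyperslab size on each rank, not our actual Fortran code; it requires a parallel build of h5py):

from mpi4py import MPI
import h5py as h5
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank owns a different number of points (variable hyperslab sizes).
local_n = 100 + rank
counts = comm.allgather(local_n)
offset = sum(counts[:rank])
total = sum(counts)

with h5.File('points.h5', 'w', driver='mpio', comm=comm) as f:
    dset = f.create_dataset('Points', shape=(total, 3), dtype='f8')
    # Collective MPI-IO transfer: all ranks participate, each writes its own slab.
    with dset.collective:
        dset[offset:offset + local_n, :] = np.random.rand(local_n, 3)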

Regarding the feature request for this thread, I can change to the temporal format if that will be the main supported format for parallel read in the future. Is that a better solution than using the separate file per time-step and adding the ability to import the grid from a separate file?

Hello @johodges,

The VTKHDF file format was initially designed to store all data, including time-dependent data, in a single file; it supports fixed chunking and soft links.

So yes, in your case it would be difficult, with the current spec, to write each step to a separate file while avoiding duplication.

Adding a new attribute to store that could be a solution, but we first need to check whether there is a better way to handle it, as it will add some complexity for those who want to support the full spec.

Regarding the feature request for this thread, I can change to the temporal format if that will be the main supported format for parallel read in the future. Is that a better solution than using the separate file per time-step and adding the ability to import the grid from a separate file?

Definitely yes, and I believe it should work as-is if you rely on VTK master for your post-processing.

Thanks for the information. I will try the temporal data out later this week. FYI, I do not know if this will be a general solution for our application, as I expect these files will get very large (hundreds of GB) with the HPC simulations we are running. I will provide an update once I have implemented the writer for the temporal data (probably sometime next week).


I believe what you want to achieve is possible through standard HDF5 library tools and concepts. I am a bit uncertain about exactly what you want to achieve, but I'll try to give you some examples to start with.

I will first assume you want a single file per timestep that can be read by VTK and/or ParaView, without using the temporal data features of vtkhdf. The data itself is in a file for a specific timestep (beam_detector_SM3D_0000000000.vtkhdf), but the mesh is in an external file and is shared between different files/timesteps.

This is quite easy to achieve. All you need is a few external links in your HDF5 files. If you extract the files from your example into a common folder and run the following Python script:

import h5py as h5

meshfile = "beam_detector_MESH.vtkhdf"

#for file in glob.glob('beam_detector_SM3D_*.vtkhdf'):
for itstep in [0, 16, 24, 32, 48, 56, 64, 72, 88, 96, 100]:
    file = f'beam_detector_SM3D_{itstep:010d}.vtkhdf'
    with h5.File(file, 'r+') as f:
        vtkhdf = f['VTKHDF']

        # Remove the copies of the mesh metadata stored in this file
        del vtkhdf['NumberOfCells']
        del vtkhdf['NumberOfConnectivityIds']
        del vtkhdf['NumberOfPoints']

        # Link every mesh dataset to the shared mesh file instead
        vtkhdf['Connectivity'] = h5.ExternalLink(meshfile, '/VTKHDF/Connectivity')
        vtkhdf['NumberOfCells'] = h5.ExternalLink(meshfile, '/VTKHDF/NumberOfCells')
        vtkhdf['NumberOfConnectivityIds'] = h5.ExternalLink(meshfile, '/VTKHDF/NumberOfConnectivityIds')
        vtkhdf['NumberOfPoints'] = h5.ExternalLink(meshfile, '/VTKHDF/NumberOfPoints')
        vtkhdf['Offsets'] = h5.ExternalLink(meshfile, '/VTKHDF/Offsets')
        vtkhdf['Points'] = h5.ExternalLink(meshfile, '/VTKHDF/Points')
        vtkhdf['Types'] = h5.ExternalLink(meshfile, '/VTKHDF/Types')

your files with the actual data (beam_detector_SM3D_0000000000.vtkhdf, etc.) will open successfully in ParaView with the correct mesh from the beam_detector_MESH.vtkhdf file. Each file is a fully valid static vtkhdf file, and this works independently of how the underlying data is organized (chunked, compressed, etc.).
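If you want to sanity-check the result before opening it in ParaView, you can inspect one of the modified files with a few lines of h5py (file and dataset names as in the script above):

import h5py as h5

# Verify that the mesh datasets are now external links into the shared mesh
# file rather than local copies.
with h5.File('beam_detector_SM3D_0000000000.vtkhdf', 'r') as f:
    link = f['VTKHDF'].get('Points', getlink=True)
    print(type(link).__name__, link.filename, link.path)
    # -> ExternalLink beam_detector_MESH.vtkhdf /VTKHDF/Points
    print(f['VTKHDF/Points'].shape)  # the link resolves transparently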

You can also extend this, using the same file containers for storage of the actual heavy data, to make a fully valid temporal vtkhdf file. To achieve this we use the virtual dataset feature of the HDF5 library to assemble a virtual dataset where different selections point into datasets in different files. Then we also need to add a Steps group with the temporal information, but this is easy since the mesh is static and only one array is time-dependent.

Create a second Python script in the same folder as your example files:

import h5py as h5
import numpy as np

meshfile = "beam_detector_MESH.vtkhdf"
timesteps = [0, 16, 24, 32, 48, 56, 64, 72, 88, 96, 100]

with h5.File("beam_detector_MASTER.vtkhdf", 'w') as master:
    master.create_group('VTKHDF')
    vtkhdf = master['VTKHDF']
    typeattr = "UnstructuredGrid".encode('ascii')
    typeattr_dtype = h5.string_dtype('ascii', len(typeattr))
    vtkhdf.attrs.create("Type", typeattr, dtype=typeattr_dtype)
    vtkhdf.attrs.create("Version", (2, 0))

    pointdata = vtkhdf.create_group("PointData")
    steps = vtkhdf.create_group("Steps")
    pdoffsets = steps.create_group("PointDataOffsets")

    # Add external links for mesh
    vtkhdf['Connectivity'] = h5.ExternalLink(meshfile, '/VTKHDF/Connectivity')
    vtkhdf['NumberOfCells'] = h5.ExternalLink(meshfile, '/VTKHDF/NumberOfCells')
    vtkhdf['NumberOfConnectivityIds'] = h5.ExternalLink(meshfile, '/VTKHDF/NumberOfConnectivityIds')
    vtkhdf['NumberOfPoints'] = h5.ExternalLink(meshfile, '/VTKHDF/NumberOfPoints')
    vtkhdf['Offsets'] = h5.ExternalLink(meshfile, '/VTKHDF/Offsets')
    vtkhdf['Points'] = h5.ExternalLink(meshfile, '/VTKHDF/Points')
    vtkhdf['Types'] = h5.ExternalLink(meshfile, '/VTKHDF/Types')

    # Find shape and type of data arrays
    itstep = timesteps[0]
    file = f'beam_detector_SM3D_{itstep:010d}.vtkhdf'
    with h5.File(file, 'r') as f:
        dset = f['/VTKHDF/PointData/MY SMOKE DENSITY']
        npoints = dset.shape[0]
        dtype = dset.dtype

    # Create virtual dataset layout
    nsteps = len(timesteps)
    layout = h5.VirtualLayout(shape=(npoints*nsteps, ), dtype=dtype)

    # Add data to layout
    for idx, itstep in enumerate(timesteps):
        file = f'beam_detector_SM3D_{itstep:010d}.vtkhdf'

        vsource = h5.VirtualSource(file, '/VTKHDF/PointData/MY SMOKE DENSITY', shape=(npoints, ))
        layout[idx*npoints:(idx+1)*npoints] = vsource

    pointdata.create_virtual_dataset("MY SMOKE DENSITY", layout, fillvalue=0)

    # Add offsets for each step
    pdoffsets.create_dataset("MY SMOKE DENSITY", data=[i*npoints for i in range(nsteps)])

    # Step metadata: number of steps and the time value of each step
    steps.attrs['NSteps'] = nsteps
    steps.create_dataset("Values", data=np.array(timesteps, dtype=np.float64))

    # Offsets into the (static) mesh arrays are zero for every step
    steps.create_dataset("PartOffsets", shape=(nsteps, ), fillvalue=0, dtype=np.int64)
    steps.create_dataset("NumberOfParts", shape=(nsteps, ), fillvalue=1, dtype=np.int64)
    steps.create_dataset("PointOffsets", shape=(nsteps, ), fillvalue=0, dtype=np.int64)
    steps.create_dataset("CellOffsets", shape=(nsteps, 1), fillvalue=0, dtype=np.int64)
    steps.create_dataset("ConnectivityIdOffsets", shape=(nsteps, 1), fillvalue=0, dtype=np.int64)

and run it. The resulting file beam_detector_MASTER.vtkhdf is now a fully valid temporal vtkhdf file, with the mesh from the beam_detector_MESH.vtkhdf file and the data from each of the timestep files (beam_detector_SM3D_0000000000.vtkhdf, etc.).
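To double-check the assembled file, you can read it back with h5py (group and dataset names as created above):

import h5py as h5

# The point data array is a virtual dataset spanning all timesteps, and the
# Steps group holds the time values.
with h5.File('beam_detector_MASTER.vtkhdf', 'r') as f:
    dset = f['VTKHDF/PointData/MY SMOKE DENSITY']
    steps = f['VTKHDF/Steps']
    print(dset.is_virtual, dset.shape)   # True, (npoints * nsteps,)
    print(steps.attrs['NSteps'], steps['Values'][:])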

Using this method, it is not a requirement that the files containing the individual timestep data are vtkhdf-compatible files; as long as they contain the data in some form or another, it will work just fine.
