VTKHDF proposal: Support splitting datasets in multiple files

Hi @Jacques-Bernard

I admit that I don’t understand how the mechanism in place today guarantees a given level of reading performance. Much depends on the size of the simulation data and the choice of chunk size.

Reading performance can indeed vary depending on the chosen chunk size. Virtual Datasets still use chunking underneath, so they should not impact performance for large simulations, where it matters the most.
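To make the chunk-size point concrete, here is a small illustrative sketch (plain Python, not VTK or HDF5 code; the function name and element size are assumptions) of why chunk size drives read cost: HDF5 fetches whole chunks, so a request that straddles several chunks pulls more bytes from disk than were asked for.

```python
def bytes_read(offset, count, chunk_elems, elem_size=8):
    """Estimate bytes fetched from disk when reading `count` elements
    starting at `offset` from a 1-D dataset stored in chunks of
    `chunk_elems` elements, given that whole chunks are read."""
    first_chunk = offset // chunk_elems
    last_chunk = (offset + count - 1) // chunk_elems
    n_chunks = last_chunk - first_chunk + 1
    return n_chunks * chunk_elems * elem_size

# A read of 1000 float64 elements (8000 bytes) starting at offset 100
# straddles 5 chunks of 256 elements, so 10240 bytes come off disk.
print(bytes_read(offset=100, count=1000, chunk_elems=256))
```

The same arithmetic applies whether the chunks sit in one file or behind a Virtual Dataset, which is why the indirection itself should not dominate read time for large data.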

Re-reads (even offset ones) are avoided by activating the cache mechanism.

Correct, vtkHDFReader uses a cache for partitioned data when merge mode is disabled.
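As a rough sketch of the caching idea (illustrative only, not the actual vtkHDFReader implementation; class and parameter names are invented), repeated requests for the same partition at the same timestep can be served from memory so the expensive file read happens once:

```python
class PartitionCache:
    """Memoize per-(partition, timestep) reads so that asking for the
    same data twice does not touch the file a second time."""

    def __init__(self, read_fn):
        self._read_fn = read_fn  # performs the real (expensive) file read
        self._cache = {}

    def get(self, partition, timestep):
        key = (partition, timestep)
        if key not in self._cache:
            self._cache[key] = self._read_fn(partition, timestep)
        return self._cache[key]
```

With merge mode disabled, each partition stays a separate entry, which is what makes a per-partition cache key like this workable.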

write in parallel on different files to output a first simulation time (1);

This proposal allows writing multiple files, corresponding to different partitions, in parallel from different processes. Only the “main” file needs to be written serially, aggregating data from the other files using Virtual Datasets.
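Conceptually, the main file’s Virtual Dataset is just an index mapping from the aggregated view onto the per-partition source files. A stdlib-only sketch of that mapping (illustrative, not the HDF5 VDS API; function names are assumptions):

```python
def virtual_map(partition_sizes):
    """Return a locator that maps a global element index in the
    aggregated view onto (source file index, local index), given how
    many elements each partition file contributes, in order."""
    def locate(global_idx):
        for file_idx, size in enumerate(partition_sizes):
            if global_idx < size:
                return (file_idx, global_idx)
            global_idx -= size
        raise IndexError("index past end of aggregated dataset")
    return locate

# Three partition files written by three processes:
locate = virtual_map([100, 250, 50])
# Element 120 of the aggregated view lives in file 1 at local offset 20.
print(locate(120))
```

Because the mapping is pure metadata, writing it in the serial “main” file is cheap; the bulk data stays in the files each process wrote in parallel.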

then complete these same files by writing the following simulation times

When using the “UseExternalPartitions” option, the partitions of each vtkPartitionedDataSet are written to different files. With “UseExternalTimesteps”, you can separate the data for each timestep into different files. So when you activate “UseExternalPartitions” but not “UseExternalTimesteps”, each timestep appends data to the same per-partition file.
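The interaction of the two options can be sketched as a file-naming rule (the naming scheme below is purely hypothetical, for illustration; it is not the VTKHDF specification):

```python
def partition_filename(base, partition, timestep,
                       external_partitions, external_timesteps):
    """Hypothetical naming sketch: which file a (partition, timestep)
    pair lands in under the two proposed options."""
    name = base
    if external_partitions:
        name += f"_part{partition}"
    if external_timesteps:
        name += f"_t{timestep}"
    return name + ".vtkhdf"

# With only UseExternalPartitions on, every timestep of partition 2
# resolves to the same file, so timesteps are appended to it.
print(partition_filename("sim", 2, 0, True, False))
print(partition_filename("sim", 2, 5, True, False))
```

With both options on, each (partition, timestep) pair gets its own file; with both off, everything goes into the single main file.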