VTKHDF output is way bigger than VTM files. Is this expected?

I have two scripts that operate on the same set of data, which consists of 160,000 PolyData objects created using PyVista. Initially, I used pv.MultiBlock and its .save() method, which employs vtkXMLMultiBlockDataWriter. This resulted in a VTM file accompanied by a folder containing 160,000 VTP files, and I found that ParaView performed extremely slowly when handling that many files.
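For context, the first script was essentially the following (a sketch; I'm assuming here that `objects` is a list of (name, mesh) pairs, as in the second script further down):

import pyvista as pv

# Collect the named PolyData objects into a MultiBlock; saving with a
# .vtm extension goes through vtkXMLMultiBlockDataWriter and writes
# one VTP file per block next to the .vtm index file.
blocks = pv.MultiBlock()
for name, mesh in objects:
    blocks[name] = mesh
blocks.save("test.vtm")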

Consequently, I wanted to try writing everything to a single file while still maintaining the separation between the PolyData objects. I attempted this using the VTKHDF format, with the following script:

from vtkmodules.vtkCommonDataModel import vtkCompositeDataSet, vtkMultiBlockDataSet
from vtkmodules.vtkIOHDF import vtkHDFWriter

# Gather all PolyData objects into one multiblock, naming each block
# so the separation between objects is preserved.
multi_block = vtkMultiBlockDataSet()
for block_number, (name, mesh) in enumerate(objects):
    multi_block.SetBlock(block_number, mesh)
    multi_block.GetMetaData(block_number).Set(vtkCompositeDataSet.NAME(), name)

# Write the whole multiblock to a single VTKHDF file.
writer = vtkHDFWriter()
writer.SetFileName("test.vtkhdf")
writer.SetInputData(multi_block)
writer.Write()

To my surprise, the original VTM file and the associated VTP files occupied 1.29 GB in total (each VTP file contains a simple object with a few faces), whereas the VTKHDF file is 290 GB. Is this expected behavior? What might I be doing incorrectly?

Hello, can you try setting the chunk size to a lower value when writing? See the vtkHDFWriter class reference.

Compression may also help
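Since HDF5 allocates storage in whole chunks, a dataset smaller than one chunk still reserves a full chunk on disk, and with 160,000 tiny blocks that likely explains the blow-up. Both settings are plain setters on the writer, so the fix is something like this (the values are only a starting point and need tuning for your data):

writer = vtkHDFWriter()
writer.SetFileName("test.vtkhdf")
writer.SetInputData(multi_block)
writer.SetChunkSize(1000)      # far below the default; aim for roughly the size of one block
writer.SetCompressionLevel(9)  # zlib level 0-9; 0 disables compression
writer.Write()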

Thanks, it worked!

It’s not 100% clear how to choose that number, but setting the chunk size to 200 and the compression level to 9 reduced the size to 9 GB.
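For anyone finding this later, the only change to the script above was adding these two calls before Write() (method names as per the linked class reference):

writer.SetChunkSize(200)
writer.SetCompressionLevel(9)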

Nice! You can read more about HDF5 chunking here: Chunking in HDF5