Currently, .vtm (multiblock), .vtpd (partitioned dataset) and .vtpc (partitioned dataset collection) only support file pointers.
For example:
<?xml version="1.0"?>
<VTKFile type="vtkPartitionedDataSet" version="1.0" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
<vtkPartitionedDataSet>
<DataSet index="0" file="testxmlpartds/testxmlpartds_0.vti"/>
<DataSet index="1" file="testxmlpartds/testxmlpartds_1.vti"/>
</vtkPartitionedDataSet>
</VTKFile>
This .vtpd file does not contain actual data, only metadata about the partitioned dataset and pointers to each partition, using the <DataSet file="relative/path/to/file.ext"/> XML syntax.
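To make the pointer mechanism concrete, here is a minimal Python sketch that lists the files such a .vtpd file points to. It uses only the standard library, not VTK. The function name `referenced_files` and the base directory `/data/run1` are made up for illustration, and it assumes relative paths are resolved against the directory containing the .vtpd file.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Same structure as the example above, with the VTKFile attributes shortened.
VTPD_TEXT = """<?xml version="1.0"?>
<VTKFile type="vtkPartitionedDataSet" version="1.0">
  <vtkPartitionedDataSet>
    <DataSet index="0" file="testxmlpartds/testxmlpartds_0.vti"/>
    <DataSet index="1" file="testxmlpartds/testxmlpartds_1.vti"/>
  </vtkPartitionedDataSet>
</VTKFile>"""


def referenced_files(xml_text, base_dir):
    """Return the partition paths found in the file="" attributes.

    Assumption: relative paths are resolved against base_dir, the
    directory that contains the .vtpd file.
    """
    root = ET.fromstring(xml_text)
    return [
        PurePosixPath(base_dir) / ds.get("file")
        for ds in root.iter("DataSet")
        if ds.get("file") is not None
    ]


if __name__ == "__main__":
    for path in referenced_files(VTPD_TEXT, "/data/run1"):
        print(path)
```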
This is very useful in many use cases: individual files are easy to inspect when needed, and the layout is well suited to scaling up, reusing shared files, and distributed reading in an HPC context.
However, there are two cases where it can cause issues:
- When sharing a dataset, one needs to think about creating an archive containing all the needed files, and then actually create it. Beginners often forget this.
- When reading a dataset split across many files in serial, the overhead of opening and closing each file can add up to a significant cost.
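As an illustration of the first issue, the sketch below shows the manual step that is easy to forget today: collecting the metadata file and every file it points to into one archive. The helper name `bundle` is hypothetical, not an existing VTK tool. It assumes pointers are relative to the metadata file, and it does not follow nested composite files.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def bundle(meta_path, archive_path):
    """Write a .vtm/.vtpd/.vtpc file and the files it points to into a zip.

    Hypothetical helper: it only follows file="" attributes found directly
    in meta_path and does not recurse into nested composite files.
    """
    meta_path = Path(meta_path)
    root = ET.parse(meta_path).getroot()
    rel_paths = [ds.get("file") for ds in root.iter("DataSet") if ds.get("file")]
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.write(meta_path, arcname=meta_path.name)
        for rel in rel_paths:
            # Keep the relative layout so the pointers still resolve
            # after the archive is extracted.
            archive.write(meta_path.parent / rel, arcname=rel)
    return rel_paths
```

With inlined data, the metadata file would be the only file to share, so this step would disappear.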
One solution to consider for these issues is to add the possibility to inline data directly in the .vtm/.vtpc/.vtpd file, like this:
<?xml version="1.0"?>
<VTKFile type="vtkPartitionedDataSet" version="1.0" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
<vtkPartitionedDataSet>
<DataSet index="0" inlined="true">
<ImageData WholeExtent="0 10 0 10 0 5" Origin="0 0 0" Spacing="1 1 1">
<Piece Extent="0 10 0 10 0 5" >
<PointData Scalars="RTData">
<DataArray type="Float32" Name="RTData" format="appended" RangeMin="-16.577068329" RangeMax="260" offset="0" />
</PointData>
<CellData>
</CellData>
</Piece>
</ImageData>
</DataSet>
<DataSet index="1" inlined="true">
...
</DataSet>
</vtkPartitionedDataSet>
<AppendedData encoding="base64">
_AQAAAACAAABYCwAAcAoAAA==...
</AppendedData>
<AppendedData encoding="base64">
...
</AppendedData>
</VTKFile>
Note the appended binary data, in a separate XML block for each inlined dataset.
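To give an idea of what a reader would have to do with this syntax, here is a rough Python sketch that tells inlined entries apart from file pointers. The `inlined` attribute comes from the proposal above and does not exist in VTK today. Mixing inlined and pointer entries in the same file is an extra assumption of this sketch, and `classify` is a made-up name.

```python
import xml.etree.ElementTree as ET

# Proposed syntax (not supported by VTK today), trimmed to the structure only.
PROPOSED_TEXT = """<?xml version="1.0"?>
<VTKFile type="vtkPartitionedDataSet" version="1.0">
  <vtkPartitionedDataSet>
    <DataSet index="0" inlined="true">
      <ImageData WholeExtent="0 10 0 10 0 5" Origin="0 0 0" Spacing="1 1 1"/>
    </DataSet>
    <DataSet index="1" file="testxmlpartds/testxmlpartds_1.vti"/>
  </vtkPartitionedDataSet>
</VTKFile>"""


def classify(xml_text):
    """Map each DataSet index to ("inlined", child tag) or ("file", path)."""
    result = {}
    for ds in ET.fromstring(xml_text).iter("DataSet"):
        index = int(ds.get("index"))
        if ds.get("inlined") == "true":
            # The first child element names the inlined dataset type.
            child = next(iter(ds), None)
            result[index] = ("inlined", child.tag if child is not None else None)
        else:
            result[index] = ("file", ds.get("file"))
    return result


if __name__ == "__main__":
    print(classify(PROPOSED_TEXT))
```

A real reader would then hand the inlined element to the matching dataset reader together with its appended data block. How blocks are matched to datasets is left open here.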
Of course, this file would not be optimized for distributed reading but this is not the objective here.
One could argue that the same logic could be applied to .pvtx files, but these files are dedicated to distributed datasets and use a different syntax, parsed in a different part of the VTK XML code, so it does not seem very useful and is outside the scope of this proposal.
Please share your thoughts.