About inlined multiblock and partitioned dataset XML data format

Currently, .vtm (multiblock), .vtpd (partitioned dataset) and .vtpc (partitioned dataset collection) files only support file pointers to their datasets.

For example:

<?xml version="1.0"?>
<VTKFile type="vtkPartitionedDataSet" version="1.0" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <vtkPartitionedDataSet>
    <DataSet index="0" file="testxmlpartds/testxmlpartds_0.vti"/>
    <DataSet index="1" file="testxmlpartds/testxmlpartds_1.vti"/>
  </vtkPartitionedDataSet>
</VTKFile>

This .vtpd file does not contain any actual data, only metadata about the partitioned dataset and a pointer to each partition, using the <DataSet file="relative/path/to/file.ext"/> XML syntax.

This is very useful in many use cases: individual files are easy to inspect when needed, and the layout scales up well, allows shared files to be reused, and suits distributed reading in an HPC context.

However, there are two cases where it can cause issues:

  1. When sharing files, one needs to (remember to, and then) create an archive containing all the needed files. In practice, beginners often forget this.

  2. When reading a dataset split across many files in serial, the overhead of opening and closing each file adds up and can become significant.
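For the first case, the archive has to contain the meta-file plus every file it points to. Below is a minimal Python sketch of that bundling step, assuming every pointer is a DataSet element with a relative file attribute (as in the .vtpd above; .vtm and .vtpc nest their DataSet elements deeper, which tree.iter also finds). The helper names referenced_files and bundle are hypothetical.

```python
import os
import xml.etree.ElementTree as ET
import zipfile


def referenced_files(meta_path):
    """Relative paths found in file="..." attributes of DataSet elements
    (at any depth) in a .vtm/.vtpd/.vtpc meta-file, without duplicates."""
    tree = ET.parse(meta_path)
    paths = (ds.get("file") for ds in tree.iter("DataSet"))
    # dict.fromkeys keeps the first occurrence of each path, in order,
    # so a file shared by several datasets is listed only once.
    return list(dict.fromkeys(p for p in paths if p))


def bundle(meta_path, zip_path):
    """Write the meta-file and every file it points to into one zip archive,
    keeping the relative layout so that the extracted copy opens as is."""
    base = os.path.dirname(os.path.abspath(meta_path))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(meta_path, arcname=os.path.basename(meta_path))
        for rel in referenced_files(meta_path):
            zf.write(os.path.join(base, rel), arcname=rel)
```

Usage would be something like bundle("testxmlpartds.vtpd", "testxmlpartds.zip"). Tooling like this only helps users who know it exists, which is part of the motivation for a single self-contained file.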

One solution to consider is to allow data to be inlined directly in the .vtm/.vtpc/.vtpd file, like this:

<?xml version="1.0"?>
<VTKFile type="vtkPartitionedDataSet" version="1.0" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <vtkPartitionedDataSet>
    <DataSet index="0" inlined="true">
      <ImageData WholeExtent="0 10 0 10 0 5" Origin="0 0 0" Spacing="1 1 1">
        <Piece Extent="0 10 0 10 0 5">
          <PointData Scalars="RTData">
            <DataArray type="Float32" Name="RTData" format="appended" RangeMin="-16.577068329" RangeMax="260" offset="0"/>
          </PointData>
          <CellData>
          </CellData>
        </Piece>
      </ImageData>
    </DataSet>
    <DataSet index="1" inlined="true">
    ...
    </DataSet>
  </vtkPartitionedDataSet>
  <AppendedData encoding="base64">
   _AQAAAACAAABYCwAAcAoAAA==...
  </AppendedData>
  <AppendedData encoding="base64">
   ...
  </AppendedData>
</VTKFile>

Note that the appended binary data is stored in a separate XML block for each inlined dataset.
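To make the appended part more concrete: with compressor="vtkZLibDataCompressor" and header_type="UInt32", the data of each DataArray in the appended section starts with a small header giving the number of compressed blocks, the uncompressed block size, the uncompressed size of the last block (0 if it is full) and the compressed size of each block. That is the layout described in the VTK file-format documentation as far as I know; treat the sketch below as an illustration and double-check it for other header types. The function and key names are made up.

```python
import base64
import struct


def decode_compressed_header(b64_header, header_type="UInt32"):
    """Decode the base64-encoded block header of one compressed DataArray.

    Assumed layout (byte_order="LittleEndian"), one integer of header_type each:
      [number of blocks][uncompressed block size][uncompressed size of the
      last block, 0 if it is full][compressed size of block 1]...[block n]
    """
    fmt = {"UInt32": "I", "UInt64": "Q"}[header_type]
    raw = base64.b64decode(b64_header)
    (nblocks,) = struct.unpack_from("<" + fmt, raw, 0)
    fields = struct.unpack_from("<%d%s" % (3 + nblocks, fmt), raw, 0)
    block_size, last_size = fields[1], fields[2]
    if nblocks == 0:
        total = 0
    elif last_size == 0:
        total = nblocks * block_size
    else:
        total = (nblocks - 1) * block_size + last_size
    return {
        "nblocks": nblocks,
        "block_size": block_size,
        "last_block_size": last_size,
        "compressed_sizes": list(fields[3:]),
        "uncompressed_bytes": total,
    }


header = decode_compressed_header("AQAAAACAAABYCwAAcAoAAA==")
```

Decoding "AQAAAACAAABYCwAAcAoAAA==" (the text after the leading "_" marker in the example) gives one block, a 32768-byte block size, a 2904-byte last block and 2672 compressed bytes. 2904 bytes is 726 Float32 values, i.e. the 11 x 11 x 6 points of WholeExtent="0 10 0 10 0 5", so that first AppendedData block belongs entirely to DataSet index="0".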

Of course, such a file would not be optimized for distributed reading, but that is not the objective here.

One could argue that the same logic could be applied to .pvtx files, but these files are dedicated to distributed datasets and use a different syntax, parsed by a different part of the VTK XML code, so inlining them seems less useful and is outside the scope of this proposal.

Please share your thoughts.


The current approach forces us to create as many files as there are meshes, plus the multiblock/partitioned dataset file itself.
Some simulations use many small meshes (many thousands), each representing one part of the simulation, so that during visualization and analysis they can easily be isolated with the Multiblock Inspector.
Take the case of an HPC simulation which, after post-processing, produces a single results file that is nevertheless split by building element: walls, doors, windows, rooms…
In an HPC context, this explosion in the number of files, and therefore of inodes, causes problems.

We readily admit that when our codes (or those of our collaborators) output VTK XML, they do so in ASCII. This does not require linking with VTK or building a VTK representation before saving.
Of course, we appreciate being able to rewrite the files in binary through ParaView.

This is why we would be very interested in being able to write the content of our simulation in a single VTK XML file in ASCII.
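Writing such an ASCII file indeed needs nothing from VTK. Here is a minimal sketch of what an in-house writer might emit for one ImageData mesh, assuming a single Float32 point array with x varying fastest. The function name vti_ascii_string and its defaults are illustrative, not taken from any existing code.

```python
def vti_ascii_string(dims, values, name="RTData",
                     origin=(0.0, 0.0, 0.0), spacing=(1.0, 1.0, 1.0)):
    """Return a minimal ASCII .vti document (VTK XML ImageData) as a string.

    dims   -- number of points along x, y, z
    values -- one point value per point, x varying fastest
    """
    nx, ny, nz = dims
    if len(values) != nx * ny * nz:
        raise ValueError("expected one value per point")
    extent = f"0 {nx - 1} 0 {ny - 1} 0 {nz - 1}"
    # ".9g" keeps enough significant digits to round-trip a Float32 value.
    data = " ".join(format(float(v), ".9g") for v in values)
    o = " ".join(str(c) for c in origin)
    s = " ".join(str(c) for c in spacing)
    return (
        '<?xml version="1.0"?>\n'
        '<VTKFile type="ImageData" version="1.0" byte_order="LittleEndian">\n'
        f'  <ImageData WholeExtent="{extent}" Origin="{o}" Spacing="{s}">\n'
        f'    <Piece Extent="{extent}">\n'
        f'      <PointData Scalars="{name}">\n'
        f'        <DataArray type="Float32" Name="{name}" format="ascii">\n'
        f'          {data}\n'
        '        </DataArray>\n'
        '      </PointData>\n'
        '    </Piece>\n'
        '  </ImageData>\n'
        '</VTKFile>\n'
    )
```

Saving is then just writing the returned string to a .vti file. Producing thousands of such small files is exactly what leads to the inode problem described above.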

We had instead imagined replacing the line:

    <DataSet index="0" file="testxmlpartds/testxmlpartds_0.vti"/>

with the contents of that file, and doing the same for the next entry:

    <DataSet index="1" file="testxmlpartds/testxmlpartds_1.vti"/>

Unlike in your suggestion, no AppendedData tag would then appear at the top level of the file.
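That transformation can be sketched in Python for ASCII (or inline-encoded) leaf files. A leaf with an AppendedData section cannot simply be pasted in place, so the sketch refuses it. The helper name inline_meta_file and the inlined="true" marker are hypothetical (the marker is borrowed from the first post), and current VTK readers would not read the result.

```python
import os
import xml.etree.ElementTree as ET


def inline_meta_file(meta_path):
    """Hypothetical transformation: replace every <DataSet file="..."/> pointer
    of a .vtm/.vtpd/.vtpc meta-file with the dataset element (<ImageData>,
    <UnstructuredGrid>, ...) of the file it references, and return the new
    XML as a string. Current VTK readers do not accept this layout."""
    base = os.path.dirname(os.path.abspath(meta_path))
    tree = ET.parse(meta_path)
    # Materialize the list first so the tree is not modified while iterating.
    for ds in list(tree.iter("DataSet")):
        rel = ds.get("file")
        if rel is None:
            continue
        leaf = ET.parse(os.path.join(base, rel)).getroot()  # the leaf's <VTKFile>
        if leaf.find("AppendedData") is not None:
            raise ValueError(f"{rel} uses appended data and cannot be pasted in place")
        del ds.attrib["file"]
        ds.set("inlined", "true")  # hypothetical marker, borrowed from the first post
        ds.extend(list(leaf))      # moves <ImageData> (etc.) under <DataSet>
    return ET.tostring(tree.getroot(), encoding="unicode")
```

The ValueError branch is where the two positions in this thread differ: in ASCII mode everything fits inside each DataSet, while binary appended data has to live somewhere else.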

What you describe is exactly how it would look in ASCII mode. The appended section is needed in binary mode.

Couldn’t that stay within the description of each mesh (formerly in a .vti)?

Take a look at a binary .vti file: it already contains an appended section.

Precisely… if we copy the content in place of the file-name link, the appended data should not end up outside that dataset's section.

I don't think we want to mix the binary appended part into the XML part, though.

In general, I would not try to monkey with the XML readers/writers, but would rather investigate whether we could leverage other formats (ADIOS, Fides, …) to achieve that merging goal while keeping it efficient.
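On not mixing the binary appended part into the XML part: raw bytes are, in general, not well-formed XML. A single zero byte, or a byte sequence that is not valid UTF-8, is enough for a conforming parser to reject the document. That is presumably why raw appended data normally sits after the XML tree rather than inside it. The sketch below uses Python's standard xml.etree parser, not VTK's, and hypothetical helper names, to compare a raw payload with a base64 one inside a DataSet element.

```python
import base64
import struct
import xml.etree.ElementTree as ET

OPEN = b'<VTKFile><DataSet><AppendedData encoding="%s">_'
CLOSE = b"</AppendedData></DataSet></VTKFile>"


def element_with_raw_bytes(values):
    """Little-endian Float32 bytes written raw inside an element of the tree."""
    payload = struct.pack("<%df" % len(values), *values)
    return OPEN % b"raw" + payload + CLOSE


def element_with_base64(values):
    """The same values, base64-encoded (only XML-safe characters)."""
    payload = base64.b64encode(struct.pack("<%df" % len(values), *values))
    return OPEN % b"base64" + payload + CLOSE


def is_well_formed(document):
    """Return True if a standard XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# 0.0 packs to four zero bytes, which are not allowed anywhere in XML.
print(is_well_formed(element_with_raw_bytes([0.0, 1.0])))
print(is_well_formed(element_with_base64([0.0, 1.0])))
```

Base64 keeps the document parseable, at the cost of about 33% more bytes than the raw data, so inlining binary data inside each DataSet means giving up raw encoding.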