VTK XML format and data type specific variants

utkarshayachit · June 23, 2020, 11:39am

Does anyone know why separate VTK-XML file extensions exist? For legacy VTK files, we have .vtk, which suffices to write all supported VTK data types. For VTK-XML, we have different variants: .vtp for polydata, .vtu for unstructured grids, etc. Is there any reason why that was done? It would be easier to just have a .vtx (and maybe .pvtkx, the parallel variant) file extension that supports all known VTK dataset types. Is there something I am missing?

lassoan · June 23, 2020, 7:55pm

For vastly different data types, such as mesh and image, using a different file extension makes a lot of sense.

For example, I want to use a different default viewer for meshes and images, but file association is based on file extension.

Using different file extensions also allows simple, early filtering. Most applications only support certain types of VTK files (surface mesh, volumetric mesh, or image). You don’t want to let users select (or, even worse, upload) files and then display an error message saying that, sorry, I cannot work with this kind of VTK file.
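To make the early-filtering point concrete, here is a minimal Python sketch of what an extension-based filter in a file selector does. The extension groups and the `filter_selectable` helper are illustrative assumptions, not part of any VTK or GUI toolkit API.

```python
from pathlib import Path

# Hypothetical extension groups an application might offer in its file dialog.
SURFACE_MESH_EXTS = {".vtp", ".stl", ".ply", ".obj"}
VOLUME_MESH_EXTS = {".vtu"}
IMAGE_EXTS = {".vti", ".nrrd", ".mha", ".png", ".tif"}


def filter_selectable(filenames, accepted_exts):
    """Keep only files whose lower-cased last extension is accepted.

    This runs before anything is opened or uploaded, so unsupported
    data never reaches a reader.
    """
    return [f for f in filenames if Path(f).suffix.lower() in accepted_exts]


files = ["liver.vtp", "ct.vti", "tets.vtu", "legacy.vtk"]
# An image-only tool can hide meshes up front.
print(filter_selectable(files, IMAGE_EXTS))  # ['ct.vti']
# legacy.vtk never matches: a .vtk file can hold any dataset type, so the
# extension alone cannot say whether it is a mesh or an image.
```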

We often run into trouble when users report problems with .vtk files, because we don’t know what kind of data they have, and they don’t know the difference between a surface mesh, an unstructured grid, and a structured grid. There are no such problems with .vtX files.

mwestphal · November 12, 2020, 9:51am

I agree with @lassoan here, but in a way the .pvd format is already a way to fix this, is it not?
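For reference, a .pvd file is a small XML index (root element `<VTKFile type="Collection">`) that points at data-type-specific files, so a single entry-point extension can wrap a mix of .vtp, .vtu, .vti, etc. A minimal sketch that builds one with Python's standard library; `make_pvd` and the file names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def make_pvd(entries):
    """Build a ParaView collection (.pvd) document as a string.

    entries: list of (timestep, part, filename) tuples. Each referenced
    file keeps its own type-specific extension (.vtp, .vti, ...).
    """
    root = ET.Element("VTKFile", type="Collection", version="0.1",
                      byte_order="LittleEndian")
    collection = ET.SubElement(root, "Collection")
    for timestep, part, filename in entries:
        ET.SubElement(collection, "DataSet",
                      timestep=str(timestep), part=str(part), file=filename)
    return ET.tostring(root, encoding="unicode")


# Hypothetical files: a surface mesh and an image in one collection.
print(make_pvd([(0, 0, "surface.vtp"), (0, 1, "volume.vti")]))
```

Note that the .pvd only indexes the typed files; the individual datasets still carry their own extensions.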

Alexandre_Minot · July 14, 2021, 2:01pm

I agree with Utkarsh. Actually, I think it should be .xml. Extensions should reflect the format on disk, not what the data represents. When you export a color map in ParaView, it shows up as a .json, not .paraviewcolormap. The same discussion can be had for CGNS files. ParaView (and a lot of other people) write CGNS files as .cgns. I think it should be .hdf or .adf, one of the two on-disk implementations of CGNS. That’s the first thing you need to know to select a reader.

mwestphal · July 14, 2021, 2:04pm

Definitely not.

You do not want to pick the right reader every time. One extension should be openable with one reader in VTK.

utkarshayachit · July 14, 2021, 2:05pm

My vote would be for a single extension, e.g. .vtx.

pieper · July 14, 2021, 2:07pm

I’m with @lassoan on this - the “.vtk” extension is widely used in some communities to mean polydata, so people are surprised to see image data in it.

The compound extension idea is working well, I think. There are many good examples like .tar.gz, .nii.gz, and in Slicer we have made good use of .seg.nrrd and we’ll probably use that pattern extensively going forward.

So here I’d vote for something like .pd.vtx.

lassoan · July 14, 2021, 2:50pm

@Alexandre_Minot @utkarshayachit What is the problem you want to solve? Is it too tedious to set up file associations? Or hard to set up filters in file selectors? Would you really want to use .vtx for all data types and file formats that VTK recognizes (stl, ply, jpg, tiff, …)?

You can, of course, strip away the standard VTK file extension, invent your own file extension, and use it in your software. You don’t need to change anything in VTK for this. I just don’t see the point in using the same file extension for storing vastly different types of data just because they happen to use the same container. For example, even though HDF5, zip, XML, and JSON are used as containers for many file formats, each of those formats has its own file extension.

Composite file extensions would help, but you would still run into trouble in some important use cases, because often only the last component of the file extension is recognized. For example, you could not associate one application with mesh files and another with image files. You also could not help the user select the correct file type in file selectors: your application may expect an image input, but the user could choose a mesh, because you cannot filter based on file extension.
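Python's `pathlib` illustrates the pitfall with compound extensions: `suffix` returns only the last component, so a tool keyed on it sees `.seg.nrrd` as plain `.nrrd`, and the `.pd.vtx` proposed above as plain `.vtx`. `suffixes` keeps the whole chain, but only if the software asks for it. The `extensions` helper is just for this illustration.

```python
from pathlib import Path


def extensions(name):
    """Return (last suffix, full compound extension) for a file name."""
    p = Path(name)
    # p.suffix: only the last component, which is roughly what simple
    # file associations and file-dialog filters match on.
    # p.suffixes: every dot-separated component after the stem, in order.
    return p.suffix, "".join(p.suffixes)


for name in ["scan.nii.gz", "heart.seg.nrrd", "mesh.pd.vtx"]:
    last, full = extensions(name)
    print(f"{name}: last={last} full={full}")
# scan.nii.gz: last=.gz full=.nii.gz
# heart.seg.nrrd: last=.nrrd full=.seg.nrrd
# mesh.pd.vtx: last=.vtx full=.pd.vtx
```

One caveat of `suffixes`: any dot in the stem itself (e.g. `patient.v2.seg.nrrd`) also becomes part of the "compound extension", so software that relies on compound extensions needs a list of known ones to match against.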

utkarshayachit · July 14, 2021, 4:10pm

What is the problem you want to solve?

Very good point. I see it as twofold:

Unified file extension: using different extensions based on data type makes it harder to put together generic pipeline scripts that can handle all types of input data. Currently, such scripts need some runtime component that changes the filename based on the input data type, which is unnecessary and unique to the VTK-XML readers/writers.
Unified reader/writer implementation: currently, we have a convoluted hierarchy of XML readers and writers. Consequently, developers have to create the right type of XML reader or writer based on the data type. Having a single vtkXMLDataReader/vtkXMLDataWriter would not only simplify the implementation but also make it easier to use. I know there’s a GenericXMLReader/Writer, but that doesn’t address the fact that the implementation is still convoluted, split among too many subclasses, and hard to debug, develop, and modify.
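A minimal sketch of the runtime component described in the first point: before a script can build its output filename, it has to map the data type of its input to the matching VTK-XML extension. The table lists VTK's standard serial XML extensions; `xml_output_name` is a hypothetical helper, and the data class name is passed in as a string here, whereas a real script would obtain it from the pipeline at run time.

```python
# Standard serial VTK-XML extensions, keyed by VTK data class name.
VTK_XML_EXTENSIONS = {
    "vtkPolyData": ".vtp",
    "vtkUnstructuredGrid": ".vtu",
    "vtkImageData": ".vti",
    "vtkStructuredGrid": ".vts",
    "vtkRectilinearGrid": ".vtr",
    "vtkMultiBlockDataSet": ".vtm",
}


def xml_output_name(stem, data_class_name):
    """Hypothetical helper: append the VTK-XML extension for this data type.

    Raises KeyError for data types not listed in the table above.
    """
    return stem + VTK_XML_EXTENSIONS[data_class_name]


print(xml_output_name("../foobar", "vtkPolyData"))   # ../foobar.vtp
print(xml_output_name("../foobar", "vtkImageData"))  # ../foobar.vti
```

With a single VTK-XML extension, this lookup (and the need to know the data type before choosing a filename) would disappear from the script.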

lassoan · July 14, 2021, 4:52pm

A script usually takes input file names from command-line arguments or files. How does it matter what the filename is?

Let’s say your script accepts png, tif, ply, stl, vtp, and vti file extensions now. Would you like the script to only accept vtx extension? Or the script would take png, tif, ply, stl, and vtx?

I can understand this requirement and indeed VTK is missing a generic file reader. Currently, we have to manually instantiate a different class for reading stl, ply, png, mha, nrrd, jpg, tif, … files. It would be nice to have a single reader that would inspect the input filename (and maybe content) and would instantiate the most appropriate reader and use it to read the file.
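A minimal sketch of the filename-based part of such a generic reader: a dispatch table from lower-cased extension to reader class. The class names are existing VTK readers, but the table and `reader_class_for` are illustrative assumptions; a real implementation would instantiate the class (for example `getattr(vtk, name)()` after `import vtk`) and might also inspect the file content.

```python
from pathlib import Path

# Extension -> VTK reader class name (a small, illustrative subset).
READERS_BY_EXTENSION = {
    ".stl": "vtkSTLReader",
    ".ply": "vtkPLYReader",
    ".png": "vtkPNGReader",
    ".jpg": "vtkJPEGReader",
    ".jpeg": "vtkJPEGReader",
    ".tif": "vtkTIFFReader",
    ".tiff": "vtkTIFFReader",
    ".mha": "vtkMetaImageReader",
    ".mhd": "vtkMetaImageReader",
    ".nrrd": "vtkNrrdReader",
    ".vtp": "vtkXMLPolyDataReader",
    ".vtu": "vtkXMLUnstructuredGridReader",
    ".vti": "vtkXMLImageDataReader",
}


def reader_class_for(filename):
    """Return the reader class name registered for the file's extension."""
    ext = Path(filename).suffix.lower()
    try:
        return READERS_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"no reader registered for {ext!r} ({filename})") from None


print(reader_class_for("Brain.NRRD"))  # vtkNrrdReader
```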

Note that this requirement is not related to the discussion above, because even if we merge a few VTK file formats, we still have dozens of other file extensions that we want to read with a single class.

The implementation is not that trivial. It is easy to implement a simplistic general reader/writer, which can be used in tests and examples and maybe some simple workflows. However, for real-world, complex, robust solutions you would need infrastructure for registering custom readers and writers, arbitration logic to choose the most appropriate reader or writer when there are multiple candidates, a way for the user to provide additional metadata or reading/writing options that are specific to a file format, a GUI for users to provide these data, meaningful error/warning messages, etc. Implementing all of this in a completely generic way in VTK would be very hard. It is much simpler for everyone if this complex logic is implemented at the application level, or maybe in a VTK remote module.
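To give a sense of the registration and arbitration part, here is a hedged sketch: each registered reader supplies a `can_read` check on the filename and the first bytes of the file, and the highest-priority candidate wins. `ReaderRegistry` and its methods are hypothetical; the reader names are only used as labels. Options, GUI, and error reporting are left out, which is exactly the part described above as hard.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class _Entry:
    # Only priority takes part in ordering; name and check are payload.
    priority: int
    name: str = field(compare=False)
    can_read: Callable[[str, bytes], bool] = field(compare=False)


class ReaderRegistry:
    """Hypothetical registry: readers declare what they accept, the registry arbitrates."""

    def __init__(self):
        self._entries: List[_Entry] = []

    def register(self, name, can_read, priority=0):
        self._entries.append(_Entry(priority, name, can_read))

    def candidates(self, filename, header):
        """Names of all readers that accept the file, highest priority first.

        Ties keep registration order (sorted() is stable, also with reverse=True).
        """
        accepted = [e for e in self._entries if e.can_read(filename, header)]
        return [e.name for e in sorted(accepted, reverse=True)]

    def choose(self, filename, header):
        found = self.candidates(filename, header)
        if not found:
            raise ValueError(f"no registered reader can read {filename!r}")
        return found[0]


registry = ReaderRegistry()
# Fallback for any VTK-XML file, recognized by its root element.
registry.register("vtkXMLGenericDataObjectReader",
                  lambda fn, head: b"<VTKFile" in head)
# Preferred, more specific reader for .vtp files.
registry.register("vtkXMLPolyDataReader",
                  lambda fn, head: fn.lower().endswith(".vtp") and b"<VTKFile" in head,
                  priority=10)

print(registry.choose("surface.vtp", b'<VTKFile type="PolyData" version="1.0">'))
# vtkXMLPolyDataReader
print(registry.choose("grid.vtu", b'<VTKFile type="UnstructuredGrid" version="1.0">'))
# vtkXMLGenericDataObjectReader
```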

utkarshayachit · July 14, 2021, 5:42pm

Scripts are often written like so:

# ... set up the pipeline ...

SaveDataSet(producer, "../foobar.vtk")

Here, for VTK-XML output, the SaveDataSet call has to change depending on the data type.

Let’s say your script accepts png, tif, ply, stl, vtp, and vti file extensions now. Would you like the script to only accept vtx extension? Or the script would take png, tif, ply, stl, and vtx?

I think this is a bad comparison. TIFF, PNG, etc. are entirely different file formats. If the user wants to write a file in PNG format, one can easily specify so in the script.

SaveDataSet(producer, "..../filename.png")

Now, if the user wants to write the data in VTK-XML format, this is not sufficient. The user has to determine the extension from the input data type and then generate the filename.

We are not talking about randomly combining file extensions here. We’re saying that for all files in the VTK-XML format, it makes sense to have a single extension rather than different ones based on the data type. This split is unique to the VTK-XML formats, and my original question was aimed at understanding why it was done. From the discussion, I am concluding there’s really no rhyme or reason for that decision.

Again, that is not the core point here. It’s simply limited to VTK-XML file formats.

lassoan · July 14, 2021, 7:01pm

Thank you, it is getting a bit more clear what you are trying to achieve.

You can already write a reader and a writer class that delegates data reading/writing to the appropriate class (same way as vtkDataSetReader). No need to change anything in VTK or ask any VTK users or developers about it.
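vtkDataSetReader picks the concrete legacy reader from the dataset type recorded in the file header, and VTK-XML files carry the same information in the `type` attribute of the root `<VTKFile>` element (e.g. `type="PolyData"`). A delegating reader therefore does not need to rely on the extension at all. A minimal sketch, assuming only the five serial dataset types below; the regex-based sniffing and `xml_reader_for` are illustrative, and VTK's vtkXMLGenericDataObjectReader covers similar ground.

```python
import re

# Root-element type attribute -> VTK reader class name (serial datasets only).
XML_READERS_BY_TYPE = {
    "PolyData": "vtkXMLPolyDataReader",
    "UnstructuredGrid": "vtkXMLUnstructuredGridReader",
    "ImageData": "vtkXMLImageDataReader",
    "StructuredGrid": "vtkXMLStructuredGridReader",
    "RectilinearGrid": "vtkXMLRectilinearGridReader",
}

# The type attribute inside the <VTKFile ...> start tag. \btype does not
# match inside header_type, because '_' is a word character.
_VTKFILE_TYPE = re.compile(rb'<VTKFile\b[^>]*?\btype\s*=\s*"([^"]+)"')


def xml_reader_for(header: bytes) -> str:
    """Pick a reader from the first bytes of a VTK-XML file.

    Only the header is scanned, because appended raw binary data later
    in the file is not well-formed XML.
    """
    match = _VTKFILE_TYPE.search(header)
    if match is None:
        raise ValueError("not a VTK-XML file: no <VTKFile type=...> found")
    data_type = match.group(1).decode("ascii")
    try:
        return XML_READERS_BY_TYPE[data_type]
    except KeyError:
        raise ValueError(f"unsupported VTK-XML data type {data_type!r}") from None


def xml_reader_for_file(path, nbytes=4096):
    """Read the start of the file and dispatch on its content, not its extension."""
    with open(path, "rb") as f:
        return xml_reader_for(f.read(nbytes))
```

Usage: `xml_reader_for_file("anything.vtx")` would return `"vtkXMLPolyDataReader"` for a polydata file, whatever its extension.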

So, I guess the question is if the community would find these generic reader/writer classes useful (if the small conveniences they provide for some use cases outweigh the small inconveniences they introduce in other use cases). If many people say yes then it makes sense to add the classes to VTK, otherwise it is better to keep them outside. Maybe having it in a remote module would be a good tradeoff - being easily accessible for developers but without making any long-term commitments.