Introducing a new data comparison utility in VTK

We are thrilled to announce a new utility in the VTK (Visualization Toolkit) library that simplifies the task of comparing various VTK entities, including data objects, field data, and abstract arrays. This utility is particularly beneficial for developers who are writing tests or need to validate data integrity across different computational stages.

As of now, this utility is in the merge request stage, and we are actively seeking feedback from the VTK community. We are open to suggestions for enhancements or additional features that could make this utility even more powerful and useful. Your insights could play a crucial role in refining this tool, so we encourage you to participate in the review process. Feel free to check out the merge request and contribute your ideas.

Features

  • Multi-Entity Support: The utility can compare vtkDataObject, vtkFieldData, and vtkAbstractArray instances.
  • Tolerance-Based Matching: The utility allows for numerical comparisons with a specified tolerance factor, making it robust for floating-point data comparisons.
  • Detailed Logging: Utilizes vtkLog for detailed error reporting, making it easier to identify discrepancies.
  • Ease of Use: The utility is designed to be straightforward to use, making the task of writing tests or validating data much simpler.

Usage

The utility offers a set of functions such as CompareDataObjects, CompareFieldData, CompareAbstractArray, ComparePoints, and CompareCells. These functions take in the entities to be compared along with an optional tolerance factor for numerical comparisons.

#include <vtkTestHelper.h>

// ...
// Some code instantiating objects we want to compare

bool result = vtkTestHelper::CompareDataObjects(dataObject1, dataObject2, toleranceFactor);
result &= vtkTestHelper::ComparePoints(dataObject1, dataObject2, toleranceFactor);
result &= vtkTestHelper::CompareCells(dataObject1, dataObject2, toleranceFactor);
result &= vtkTestHelper::CompareFieldData(fieldData1, fieldData2, toleranceFactor);
result &= vtkTestHelper::CompareAbstractArray(array1, array2, toleranceFactor);

Benefits for Testing

This utility significantly eases the process of writing tests for VTK-based applications. By providing a robust and comprehensive comparison mechanism, it allows developers to focus on the logic of their tests rather than the intricacies of data validation.

Is every single implementation covered? Or just the main ones, vtkPolyData, vtkUnstructuredGrid, …?

Tolerance-Based Matching

Is it a simple computation, e.g. X - Y <= Tol, or something fancier?

What would be nice is to be able to get a metric instead of just a boolean result. But it may be out of scope.

Is every single implementation covered? Or just the main ones, vtkPolyData, vtkUnstructuredGrid, …?

The following input types are supported: vtkImageData, vtkUniformGrid, vtkRectilinearGrid, vtkPointSet, vtkStructuredGrid, vtkExplicitStructuredGrid, vtkPolyData, vtkUnstructuredGrid, vtkHyperTreeGrid, vtkTable. There is no support for vtkGraph. Not all types follow the same code path. For example, for vtkImageData the utility doesn’t check cell types or point positions, but rather the extent, spacing, origin and orientation.

Is it a simple computation, e.g. X - Y <= Tol, or something fancier?

Given 2 input vectors X and Y, what is checked is ||X - Y||^2 <= epsilon * (||X||^2 + ||Y||^2) * Tol, where epsilon is the smallest floating point number such that 1.0 + epsilon != 1.0. Tol should be greater than 1.0.

For integral vectors, the tolerance is disregarded; we look for strict equality.

Edit: Point Data and Cell Data are also checked. The formula above works for any dimension.
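
To make the criterion above concrete, here is a minimal standalone sketch of the check in plain C++. It is not the code from the merge request: the helper names NearlyEqual and StrictlyEqual are hypothetical, and it assumes that std::numeric_limits<double>::epsilon() is the epsilon meant in the formula.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of VTK): returns true when
// ||x - y||^2 <= epsilon * (||x||^2 + ||y||^2) * tol.
// Assumption: epsilon is std::numeric_limits<double>::epsilon().
bool NearlyEqual(const std::vector<double>& x, const std::vector<double>& y, double tol)
{
  if (x.size() != y.size())
  {
    return false; // vectors of different dimension can never match
  }
  double diff2 = 0.0;  // ||x - y||^2
  double normX2 = 0.0; // ||x||^2
  double normY2 = 0.0; // ||y||^2
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    const double d = x[i] - y[i];
    diff2 += d * d;
    normX2 += x[i] * x[i];
    normY2 += y[i] * y[i];
  }
  const double epsilon = std::numeric_limits<double>::epsilon();
  return diff2 <= epsilon * (normX2 + normY2) * tol;
}

// Hypothetical helper for integral data: the tolerance is ignored
// and strict equality is required.
bool StrictlyEqual(const std::vector<int>& x, const std::vector<int>& y)
{
  return x == y;
}
```

Because the bound scales with the squared norms of both inputs, the check is relative rather than absolute: the same Tol can be used for coordinates of very different magnitudes.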

What would be nice is to be able to get a metric instead of just a boolean result. But it may be out of scope

Probably out of scope. It would be hard to define a proper metric given the extent of what is checked. What is the metric when the topology and geometry are correct but a vtkVoxel has been replaced by a vtkHexahedron?

Very useful! Great work…

The link provided to the merge request, “https://gitlab.kitware.com/vtk/vtk/-/merge_requests/1063”, is incorrect. It should be “https://gitlab.kitware.com/vtk/vtk/-/merge_requests/10633” (note the extra “3”).

As far as metrics go, I would probably use this by writing a utility program that indicates true/false for each component of a data object and provides a table of results. And with a little work you might be able to capture the first few “items” that are different for each comparison function (point coords, cell types), whether certain data arrays are missing or named differently, etc. to give hints as to what is different.

I’m also guessing that some data objects can be marked unequal but in actuality they are renumbered (point ids, cell ordering is shuffled). I’ve seen this happen in a few multithreaded filters where the order of execution changes from run to run (I try to correct this when I see it but I’m guessing it still exists).

The utility is invariant to point / cell shuffling.
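
To illustrate what invariance to shuffling means, here is a small standalone sketch in plain C++. It is only one possible approach under simplifying assumptions (exact coordinate equality, points only), and not necessarily how the utility does it; the name SamePointsAnyOrder is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <vector>

using Point = std::array<double, 3>;

// Hypothetical helper (not part of VTK): returns true when both inputs hold
// the same points, regardless of the order in which they are stored.
// Assumption: coordinates are compared exactly, with no tolerance.
bool SamePointsAnyOrder(std::vector<Point> a, std::vector<Point> b)
{
  if (a.size() != b.size())
  {
    return false;
  }
  // Sorting both copies lexicographically puts them in a canonical order,
  // so any renumbering of the points is cancelled out.
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}
```

The inputs are taken by value so the caller's point ordering is left untouched.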

When the utility catches an error, there is a comprehensive error logging stack, letting you know at what point the discrepancy was caught. It goes as far as outputting the name of the array causing the failure and at which position in the array. Do you think it is sufficient or would a table of error codes be more useful?

One idea: maybe allow setting a stride/sampling strategy when comparing data arrays, to speed up the computation while still getting a good overview.

It actually would be trivial to add an API taking as input a vtkIdList* for testing arrays. It would literally be one line in the function’s body in the .cxx. Stride without an input vtkIdList* would be doable with a little work, however we would need to think about how to overload to not induce ambiguities when invoking.
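
As a rough sketch of that idea in plain C++ (std::vector stands in for the VTK array and vtkIdList types, and both function names are hypothetical), the stride case can be reduced to the id-list case by generating the ids first. Giving the stride helper its own name, rather than overloading, sidesteps the ambiguity question. Exact equality is used here for brevity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: compares two arrays only at the given indices.
// Out-of-range ids are treated as a mismatch.
bool CompareAtIds(const std::vector<double>& a, const std::vector<double>& b,
  const std::vector<std::size_t>& ids)
{
  for (std::size_t id : ids)
  {
    if (id >= a.size() || id >= b.size() || a[id] != b[id])
    {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: builds the ids 0, stride, 2 * stride, ... below count.
// A stride of 0 yields an empty list.
std::vector<std::size_t> MakeStrideIds(std::size_t count, std::size_t stride)
{
  std::vector<std::size_t> ids;
  if (stride == 0)
  {
    return ids;
  }
  for (std::size_t i = 0; i < count; i += stride)
  {
    ids.push_back(i);
  }
  return ids;
}
```

A sampled comparison would then read CompareAtIds(a, b, MakeStrideIds(a.size(), 10)), with the caveat that any difference located between sampled positions goes undetected.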

Quick question Yohann, are these comparison functions available from Python? With a quick search I didn’t see any Python examples. I think we should add some…

Currently, this comparison tool is not available from Python. I’ve chatted with Yohann and we have two, non-exclusive, paths forward. 1) Create a filter that takes two or more inputs and compares selected portions of the input data objects against one another; and/or 2) simply wrap the comparison functions into Python so they can be used as utility functions etc.

Comments? Preferences?