vtkImageDifference does not detect image differences

vtkImageDifference seems to detect zero or far fewer differences than VTK 9.1. I ran into this issue while migrating from VTK 9.1 to VTK 9.2: all image tests seemed to succeed, except for one that expected differences rather than no or small differences.

The issue has been known for some months already, but I think it is good for the community to be aware of it in case they use the filter, as we do, for comparing images in their test suite. Until the issue is resolved, a workaround is to revert the filter's implementation to the VTK 9.1 version in your code base, I suppose.

I agree this is important; anyone using vtkImageDifference should call SetThreshold(0) to be able to use it correctly.

WDYT @will.schroeder ?

Mathieu, I think your suggestion is fine for now - I assume that you mean to set the threshold for this one test? (If all tests, then of course we are going to have many other tests fail.) The problem is that the inherent fuzziness of this comparison, coupled with variations in how different platforms render, means we are playing a game of whack-a-mole.

I think that we’ve relied too much on image results for a lot of things myself. It’d be much more reliable to test the data directly (where possible/feasible). Sure, it’s not easy, but testing “does it look right from this one specific camera angle” for I/O tests has never been what I’d call “reliable” or “robust” anyways. For filters, you could test things like “when clipping with a plane, are all points in the output on one side of it?” and such.

As for image difference being “fuzzy”, this has, I suspect, hidden many test failures over the years. I think the intended case was to allow for slight font rendering differences or anti-aliasing, but it has probably masked things like edge rendering changes and the like as well.

While VTK/ParaView have good coverage, I think their strategies could use some improvement. Yes, it’s a long project, but those are the kinds of things I think about too, so…

I ran our test suite with the tolerance set to zero and I’m looking at 200 image tests that now fail, often because of a very small color difference :frowning_face:

@will.schroeder @andreasbuykx : Let me clarify the issue because I feel like it is unclear for now.

What happens is that the value provided as a threshold through SetThreshold does not correspond to the value one would get from GetThresholdedError; it is closer to threshold*threshold.

Let me even be more precise.

Three cases:

  1. imageA and imageB are identical down to the pixel
  2. imageA and imageB are almost the same: small differences, but considered valid
  3. imageA and imageB are visually different (as in the linked issue)

If you want to know the actual computed difference between the images, you need to use SetThreshold(0) and then call GetThresholdedError(). Let’s say that for our three cases it returns:

  1. 0
  2. 31
  3. 210

If you want cases 1 and 2 to be a “success” and case 3 to be a “failure”, one would expect that setting the threshold to 50 would be a reasonable approach, except it is not!

Indeed, in the way the threshold is used in the computation, it is closer to being squared; in this case it is as if GetThresholdedError returned 250, which makes case 3 a success.

I think this behavior is undocumented and very counter-intuitive, and it should be fixed. I took a quick look at the implementation but couldn’t figure it out in a reasonable timeframe.

But there is a very simple workaround!

Just always use SetThreshold(0), then use if (GetThresholdedError() > 50) to determine whether the test is a success or a failure.

This is the F3D code for checking against baselines (f3d/image.cxx at master · f3d-app/f3d · GitHub):

  vtkNew<vtkImageDifference> imDiff;
  // handle threshold outside of vtkImageDifference:
  // https://gitlab.kitware.com/vtk/vtk/-/issues/18152
  error = imDiff->GetThresholdedError();

  return error <= threshold;

I hope it clarifies the issue.

Wow, I hadn’t fully appreciated how whacked out this is, thanks for the explanation! Yes, this needs to be cleaned up, or at least documented in a sensible way.

My main problem is that the result of the algorithm changed from 9.1 to 9.2.

If the change was intentional, then I’ll have to go over all (several hundred) tests and adjust the tolerances that we use to decide whether the error reported by the algorithm is acceptable.

If the change was an error I will have to either wait for a bug fix, roll back this algorithm to the 9.1 version, or do the workaround (but then I still have to go over the tolerances, because the reported errors have changed).

We use the same images for both Windows and Linux machines and define machine-dependent tolerances, so the effort is doubled. So, if there is any way at all to get back the 9.1 errors, I would be very happy.

There is also this related issue, which found that the image difference is not symmetric: the order of the input images matters.

The fix has been merged into VTK master; vtkImageDifference now behaves exactly like the VTK 9.1 version.

Any project using VTK to compare images in CI since last year may be impacted by this change.

ParaView was impacted, ~20 tests and baselines needed to be updated.