Handling allocation errors in Python

I ran into what I think might be a bug/regression in vtkFeatureEdges, where the filter tries and fails to allocate a very large array.

The following code:

from vtkmodules.vtkFiltersCore import vtkFeatureEdges
from vtkmodules.vtkFiltersSources import vtkSphereSource

N_CELLS = 2_000_000

source = vtkSphereSource()
source.LatLongTessellationOn()
source.SetEndPhi(90)
source.SetThetaResolution(1000)
source.SetPhiResolution(N_CELLS // 1000 + 1)
source.Update()
print("Cell count:", source.GetOutput().GetNumberOfCells())

feature_edges = vtkFeatureEdges()
feature_edges.ExtractAllEdgeTypesOff()
feature_edges.ColoringOff()
feature_edges.BoundaryEdgesOn()

feature_edges.SetInputConnection(source.GetOutputPort())
feature_edges.Update()
print("Boundary edges:", feature_edges.GetOutput().GetNumberOfCells())

crashes (exit code -1073740791, i.e. 0xC0000409) on my system (Windows) with the following output:

Cell count: 2000000
2021-12-21 13:52:58.971 (   0.334s) [                ]vtkGenericDataArray.txx:389    ERR| vtkTypeInt64Array (000002064F960CD0): Unable to allocate 20000000000 elements of size 8 bytes. 

Now this might be a bug in VTK – I have created an issue for it – but my main question is: how can I handle this error in Python? I can observe error events (on the vtkOutputWindow) but I don’t think I can do anything there to return to the calling Python code. Does anyone have any suggestions on how to handle such errors in a Python application?

@Martijn I ran into this problem a few weeks ago as well.

Look at this issue.
It references the same problem, and the proposed solution works for me.
Also look at PVGeo, where this error handling seems to have been implemented originally.
It is a workaround, and it would be nice to have it in VTK :slight_smile:
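For reference, the PVGeo-style observer looks roughly like this (a sketch, not PVGeo's exact code; the CallDataType attribute tells VTK's Python wrapping to pass the error message string as an extra argument to the callback):

```python
class ErrorObserver:
    """Collects VTK error messages so calling code can inspect them."""

    def __init__(self):
        self.error_occurred = False
        self.message = None
        # Ask VTK's Python observer machinery to pass the error text
        # as the third argument of __call__.
        self.CallDataType = "string0"

    def __call__(self, obj, event, message):
        self.error_occurred = True
        self.message = message
```

You would attach it with something like `feature_edges.AddObserver("ErrorEvent", observer)` before calling `Update()`, then check `observer.error_occurred` afterwards. This only helps for errors the filter can actually survive.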

I’ve tried something like that, but the problem is that the call to Update() never returns – the process just exits…
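Since the whole process dies, one purely Python-level workaround (my own suggestion, not something VTK provides) is to run the risky `Update()` in a child process and inspect its exit code; the parent survives even a hard crash:

```python
import subprocess
import sys

def run_isolated(code):
    """Run Python code in a child process; survive even a hard crash."""
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True)
    if result.returncode != 0:
        # The child died (e.g. from an uncatchable bad_alloc);
        # the parent can report this and carry on.
        return None, result.returncode
    return result.stdout, 0
```

In practice the code string (or a separate script) would contain the vtkFeatureEdges pipeline from the original post. The obvious costs are process startup time and having to serialize any results back to the parent.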

When the allocation fails, VTK generates an error event, which you can catch, but immediately afterwards it throws a bad_alloc exception, which you cannot catch from Python.

So you can use the error event to give the user some information about what went wrong, but as soon as your error handler returns, Python will exit and there’s nothing you can do about it (other than attaching a debugger to the process).

To my mind, this brings up two separate issues:

  1. Efficient usage of memory in the feature edges filter: it’s likely this could be improved so that you could work on bigger data. However, at some point you are going to hit the wall, as described in #2 below.

  2. What to do when allocation fails: e.g., according to the output, your application needed to allocate 20 billion 64-bit ints. I’m guessing your system memory is 64 GB or less. Tell me, how would you fix this, or what do you expect VTK to do? Basically you’ve asked VTK to help you hang yourself, which it obligingly did :slight_smile: Sure, it could catch the allocation error, but then what?
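For scale, the failed allocation in the error message works out to far more than any typical workstation has:

```python
elements = 20_000_000_000   # from the vtkTypeInt64Array error message
bytes_each = 8              # 64-bit integers
total = elements * bytes_each
# 160 billion bytes, i.e. about 149 GiB -- well beyond a 64 GB machine
print(total, total / 1024**3)
```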

We can and have been making VTK more memory efficient (not to mention faster). But at some point you run into the limits of your hardware. I’m not sure what we can do about this in a general way.

In Slicer we catch exceptions in C++ and bring up a dialog box warning that the system is technically in an undefined state and they may wish to save their work and restart. I think it’s helpful not to completely bail out of the app just because one allocation failed. It would be even better if VTK could provide some protection so failed allocations didn’t leave the system in a bad state, but of course that’s hard to guarantee.

Thanks Steve for the tips. Can you provide the details of an implementation that you think might work for you?

In the VTK executive, the CallAlgorithm() method is a choke point where a try/catch could be added to respond to exceptions generated by the algorithms.

Hi Will - In Slicer we handle this case in the code below, while processing events at the Qt level, so that, if we can tell roughly what happened, we try to give some helpful suggestions. But since these are exceptions we have to assume that the underlying code is in a bad state, possibly with dangling pointers, so we encourage users to save what they can and restart.

It would be better for users if, instead of getting an exception, we got an indication of the state of the system. For example, VTK (or other libraries’) algorithms could set a code or invoke an event that meant “operation failed due to lack of memory but the state is not corrupted”; then at the application level we could just tell people to try their operation again with different parameters. This could be opt-in at the VTK filter level, so that if you didn’t get the event you would assume that the pipeline update has crashed and is corrupted, but a lot of filters could probably detect the issue and report back that they are in a clean state. Maybe even a superclass could implement generic logic.
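To make the proposal concrete, here is a pure-Python sketch of what the application side could look like. The event name CleanFailureEvent is invented for this sketch; nothing like it exists in VTK today:

```python
# Hypothetical event name -- invented for this sketch, not a VTK API.
CLEAN_FAILURE_EVENT = "CleanFailureEvent"

class PipelineStatus:
    """Tracks whether a failed update left the pipeline in a clean state."""

    def __init__(self):
        self.clean_failure = False

    def observe(self, obj, event):
        # A filter would invoke this event only when it can guarantee
        # its internal state is still consistent after the failure.
        if event == CLEAN_FAILURE_EVENT:
            self.clean_failure = True

    def advice(self, update_succeeded):
        if update_succeeded:
            return "ok"
        if self.clean_failure:
            return "Operation failed; try again with different parameters."
        return "Pipeline state may be corrupted; save your work and restart."
```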

What to do when allocation fails: e.g., according to the output, your application needed to allocate 20 billion 64-bit ints. I’m guessing your system memory is 64 Gbytes or less. Tell me, how would you fix this, or what do you expect VTK to do?

I guess you are right that allocation errors in general are difficult to handle, so a good approach would be to test whether the input is too large before running the algorithm.
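A minimal pre-check could look like the sketch below. Estimating the required allocation from a filter’s parameters is filter-specific, so the element count is simply an input here, and the default budget is an assumed value rather than a queried one:

```python
def check_allocation(n_elements, element_size=8, budget_bytes=8 * 1024**3):
    """Raise before the algorithm runs if an allocation would be too big.

    budget_bytes defaults to an assumed 8 GiB; a real application would
    query the actually available memory instead.
    """
    needed = n_elements * element_size
    if needed > budget_bytes:
        raise MemoryError(
            f"would need {needed} bytes, budget is {budget_bytes}")
    return needed
```

Raising a MemoryError from Python before `Update()` is recoverable in the usual way, unlike the C++ bad_alloc.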

Note that in this case the huge allocation was unexpected (https://gitlab.kitware.com/vtk/vtk/-/issues/18420) and I was wondering if/how I could do something better than just crash. Trying to prevent allocation errors in combination with David’s suggestion of notifying the user (and then crashing) when an allocation fails seems a reasonable solution.

It’s possible to pre-determine the output size, although it is difficult and can have a significant performance impact, since the size depends on the data and the filter parameters. Effectively you’d have to run the algorithm twice: once to determine the output size, and then once to generate the output. (This is what many parallel/threaded algorithms do anyway, so in theory this could be formalized to include output-size estimation.) There are also side effects to take into account, like building locators or topological structures (e.g., BuildLinks()). It’s doable, but a very, very large effort to undertake. And to my mind it still does not satisfactorily address the fundamental question of what to do when the data is simply too large.
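The two-pass approach described above, reduced to a generic sketch (plain Python, not VTK code):

```python
def filter_two_pass(values, keep):
    # Pass 1: run the "algorithm" only to count outputs, so the
    # allocation size is known exactly before any memory is committed.
    n = sum(1 for v in values if keep(v))
    out = [None] * n  # single, exactly-sized allocation

    # Pass 2: run again, filling the pre-allocated output.
    i = 0
    for v in values:
        if keep(v):
            out[i] = v
            i += 1
    return out
```

The trade-off is exactly the one noted above: the predicate (or, in VTK, the filter’s per-cell work) runs twice, and any side effects of the first pass have to be accounted for.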