vtkImageStencil hangs when run on an AWS EC2 instance, but runs fast locally with identical input data

Hello,

I have some code that uses the vtkImageStencil algorithm to create some 3d binary masks from the intersection between a 3d mesh and a 3d medical image.

I run this code locally and it works exactly as expected, and takes less than 100 ms to run.

However, I am having issues when I try to run the same code remotely on AWS.

When I run the code on an EC2 instance, the code gets to the vtkImageStencil algorithm and seemingly hangs forever. I have run htop and see that CPU is at 100%, with plenty of RAM to spare.
Tweaking Mesa GL options doesn't seem to fix anything.

No error messages are produced. The input data is confirmed to be valid and identical in both cases.

Does the vtkImageStencil algorithm require a GPU / hardware acceleration? I can't think of any other reason why it would zip locally and take 100 years remotely.

Furthermore, when I try to run this code on AWS Fargate, the process is simply killed when vtkImageStencil attempts to execute, with no error messages. I suspect this might be because the algorithm is trying to access restricted / non-existent hardware resources?

Any help would be greatly appreciated. Kind of run into a dead end here.

Does the VTK build use threads? Can you try to disable SMP for that algorithm? vtkImageStencil::SetEnableSMP(false)

@spyridon97 @Charles_Gueunet

Thank you for your reply.

I am not building from source; I am using the Python wrappers from the standard Python wheel on PyPI.

I tried your suggestion to SetEnableSMP(false) on the vtkImageStencil but the result is unfortunately the same.

The vtkImageStencil class makes no use of the GPU whatsoever. When I run into hangs (in pretty much any program), the first thing I do is attach gdb to the process and get a stack trace. Are you able to do this on AWS?
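If attaching gdb is awkward on a remote machine, Python's standard-library faulthandler module can be a lighter first step. The sketch below is only an illustration: run_pipeline is a hypothetical placeholder for whatever code calls vtkImageStencil, and the 60-second timeout is an arbitrary choice. Note that faulthandler prints the Python-level traceback, not the C++ frames, so it shows which Python call crashed or stalled but not where inside VTK.

```python
import faulthandler

# Print the Python traceback of all threads if the process receives a
# fatal signal such as SIGSEGV, instead of dying silently.
faulthandler.enable()

# For an apparent hang: dump every thread's traceback to stderr if the
# code below is still running after 60 seconds (arbitrary timeout).
faulthandler.dump_traceback_later(60, exit=False)


def run_pipeline():
    """Hypothetical placeholder for the code that runs vtkImageStencil."""
    pass


run_pipeline()

# The suspect call returned, so cancel the pending traceback dump.
faulthandler.cancel_dump_traceback_later()
```

The same handler can be switched on without touching the code by running the script with python -X faulthandler.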

Are you connecting vtkImageStencil with SetInputData()/SetStencilData() or with SetInputConnection()/SetStencilConnection()? If you’re using the latter, then maybe the freeze is actually happening somewhere upstream in the VTK pipeline.

What version of VTK are you using? 9.2.6?

I am using version 9.2.6 of VTK.
I tried both the Connection / Data approaches and both produce the same issue.

I ran my code with gdb and it produced the following output.

[New Thread 0x7fff96fbd640 (LWP 70242)]
[New Thread 0x7fff965bc640 (LWP 70243)]
[New Thread 0x7fff95bbb640 (LWP 70244)]
[Thread 0x7fff95bbb640 (LWP 70244) exited]
[Thread 0x7fff965bc640 (LWP 70243) exited]

Thread 26 "python3.7" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff96fbd640 (LWP 70242)]
0x00007fffb2bdcea8 in void vtkImageStencilExecute<unsigned short>(vtkImageStencil*, vtkImageData*, unsigned short*, vtkImageData*, unsigned short*, vtkImageData*, unsigned short*, int*, int, vtkInformation*) () from /home/ec2-user/.local/lib/python3.7/site-packages/vtkmodules/libvtkImagingStencil-9.2.so
(gdb) bt
#0  0x00007fffb2bdcea8 in void vtkImageStencilExecute<unsigned short>(vtkImageStencil*, vtkImageData*, unsigned short*, vtkImageData*, unsigned short*, vtkImageData*, unsigned short*, int*, int, vtkInformation*) () from /home/ec2-user/.local/lib/python3.7/site-packages/vtkmodules/libvtkImagingStencil-9.2.so
#1  0x00007fffb2bd6c84 in vtkImageStencil::ThreadedRequestData(vtkInformation*, vtkInformationVector**, vtkInformationVector*, vtkImageData***, vtkImageData**, int*, int) ()
   from /home/ec2-user/.local/lib/python3.7/site-packages/vtkmodules/libvtkImagingStencil-9.2.so
#2  0x00007fffd09beb22 in vtkThreadedImageAlgorithmThreadedExecute(void*) ()
   from /home/ec2-user/.local/lib/python3.7/site-packages/vtkmodules/libvtkCommonExecutionModel-9.2.so
#3  0x00007ffff7c9f802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#4  0x00007ffff7c3f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

So it appears the error is in vtkImageStencilExecute, but looking at the source code for that function, it is not obvious to me why there would be an issue running on EC2 vs. a local MacBook.

Any ideas?

In addition to SetEnableSMP(False), you can also try SetNumberOfThreads(1). If that doesn’t change the behavior, you can try vtkMultiThreader.SetGlobalMaximumNumberOfThreads(1).

The algorithm has to do a small amount of memory allocation per thread, and it’s possible that the memory allocation is failing. Is your image a multi-component image, or is it a simple greyscale image?

Thank you for your help David.

I have managed to identify the issue and I am recording it here for posterity.

The core issue is that one of the inputs to the vtkImageStencil was a vtkImageData that I was seeding improperly with a numpy array. I was creating the blank images that I would draw the stencil onto as shown here:

    blank_img = vtk.vtkImageData()
    blank_img.DeepCopy(input_img)
    count = blank_img.GetNumberOfPoints()
    scalar_array = np.ones(count, dtype=np.uint16) * foreground_val
    vtk_array = vtk.vtkTypeUInt16Array()
    vtk_array.SetArray(scalar_array, count, 1)
    blank_img.GetPointData().SetScalars(vtk_array)

This worked fine on my MacBook. I assume that is because numpy array memory is contiguous on my MacBook but not on the EC2 Linux servers. So this vtkImageData would cause vtkImageStencil to segfault on EC2 Linux.

Changing the code to this snippet below fixed the issue.

    from vtk.util.numpy_support import numpy_to_vtk  # at the top of the module

    blank_img = vtk.vtkImageData()
    blank_img.DeepCopy(input_img)
    count = blank_img.GetNumberOfPoints()
    scalar_array = np.ones(count, dtype=np.uint16) * foreground_val
    blank_img.GetPointData().SetScalars(numpy_to_vtk(scalar_array))
    return blank_img

So it had nothing to do with the GPU, rendering, or threading at all. It was an input data issue, but one that was very hard for me to detect, as the vtkImageData objects appear to be identical on the surface and the code did work as expected on one platform.

Glad that you sorted it out.

It’s worth noting that the following code is tricky even for contiguous arrays:

    vtk_array.SetArray(numpy_array, n, m)

The vtk_array is borrowing a reference to the numpy_array’s buffer. So when the numpy_array goes out of scope, the vtk_array is left with a dangling pointer. Dangling pointers are a real nuisance, because sometimes the program works as expected, sometimes it segfaults, and sometimes it runs but gives incorrect results.