I am developing a Python VTK application and am noticing some unusual behavior in the unittest-based tests I am building alongside it. Tests defined in separate suites appear to interfere with each other, and my theory is that the underlying VTK functionality uses some global or otherwise shared state, so my tests are not as isolated as I’d like. Ultimately I’m looking for tips and guidance on how to set up Python unit tests that use VTK in a safe and isolated manner.
I have a couple of anecdotal examples. First, I noticed that unit tests which pass when run once will sometimes fail when run multiple times in the same Python interpreter. After a dozen or so executions of the same test, I receive a runtime failure that looks like this (Python VTK 9.5.1 on Mac):
Through much testing I discovered that this error can be avoided 100% of the time by turning off Python garbage collection with import gc; gc.disable() at the top of my testing script. The effectiveness of this approach has led me to believe that something unsafe happens when the Python script creates multiple full rendering pipelines and those Python objects call into the underlying C++ libraries to destruct. Disabling the garbage collector bypasses the issue.
Another example: I have two suites, one of which uses self.render_window.OffScreenRenderingOn() and the other does not. If both are run in the same Python instance, the offscreen rendering test actually renders to the screen when it shouldn’t. This only occurs when both tests are in the same file.
I can work on getting a standalone example of this to share but in the meantime I’m curious if anyone out there has run into this or has any tips on isolation and memory safety when running python VTK based unit tests. Thanks!
For the runtime failures, do the stack traces always show InvokeEvent? I suspect that events are being invoked after the observer has destructed. This would explain why disabling gc helps. If you can confirm that InvokeEvent is always involved, I can dig deeper to find the cause.
Regarding the offscreen rendering, I’m not very familiar with that part of the VTK code, but I can imagine that it might involve some global state. VTK’s own test suite always runs each test in its own process.
When I first ran into this issue I ran it by a couple of AIs, which advised me to manage the teardown of VTK-managed objects myself. This led me to create a tear_down method that attempts to destruct everything I use in my top-level Python class. It looks like this:
def tear_down(self):
    if self.controller.timer_id:
        self.interactor.DestroyTimer(self.controller.timer_id)
    for camera in self.controller.cameras:
        self.controller.cameras[camera].RemoveAllObservers()
    self.interactor.RemoveAllObservers()
    for renderer in self.renderers:
        self.renderers[renderer].RemoveAllObservers()
    # Mark for deletion
    del self.interactor
    self.interactor = None
    for renderer in self.renderers:
        del renderer
Calling this function, however, appeared to make the problem worse and resulted in a second error signature (in addition to the first one) that looks like this:
Backing this out seemed to completely remove this error signature - from this I concluded that 1. I clearly don’t know the safe way to deconstruct VTK objects in python and 2. AI will not be taking over the world with software anytime soon
VTK’s own test suite always runs each test in its own process.
Thanks for this info. I could do the same on the Python side if I absolutely had to.
The tear-down should not be necessary. In any case, the del loop in your tear_down code has no effect, because del renderer only unbinds the loop variable, and deleting an attribute immediately before assigning to it doesn’t serve any purpose either.
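To illustrate the point about the del loop, here is a small pure-Python sketch with no VTK involved. FakeRenderer is a hypothetical stand-in class, and the weakref probe relies on CPython's reference counting to observe when the object is actually released.

```python
import weakref


class FakeRenderer:
    """Hypothetical stand-in for a VTK renderer, for illustration only."""


renderers = {"main": FakeRenderer(), "inset": FakeRenderer()}
probe = weakref.ref(renderers["main"])  # lets us check whether the object is alive

# Iterating a dict yields its keys; `del key` only unbinds the loop variable.
for key in renderers:
    del key

still_alive = probe() is not None and len(renderers) == 2

# Dropping the container's references is what actually releases the objects.
renderers.clear()
released = probe() is None  # CPython frees the object once its refcount hits 0

print(still_alive, released)
```

Likewise, self.interactor = None on its own already drops the attribute's reference, so a del immediately before it adds nothing.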
Now that I know it’s a timer event, that narrows down where I have to look for the bug. Can you give any hints on what kind of observer it is? Mainly, I’m wondering if the AddObserver was called from Python, or if it was called from within VTK itself.
This project is intended to be open source under the NASA organization on GitHub, but I’m running into some obstacles in the official software release process, which is slowing it down. That said, I can probably hack something together in a temporary area on GitHub to provide a state of the code where I can reliably replicate this. Let me know if that would be useful for you in your efforts. Thanks!
I’ve made a reproducer. It’s not minimal, but it’s something I can use as a starting point for debugging. Interestingly, it only crashes on macOS; I haven’t been able to get it to crash on Linux.
See the code in the expandable section below. Closing the window that it creates will sporadically cause a crash, causing a stack trace like the one you posted above. And disabling Python gc stops the crash.
Edit: I’ve found a fix that I can merge before the VTK 9.6 release. For now, you can work around the problem by ensuring that you call interactor.DestroyTimer(timerId) for every timer that you create, before the interactor goes out of scope and destructs. Don’t call RemoveAllObservers(), because VTK sometimes adds its own observers for its own purposes.
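One way to apply this workaround consistently is to record every timer ID as it is created and destroy them all in one place. The TimerTracker class below is a hypothetical helper, not part of VTK; it only assumes the interactor exposes CreateRepeatingTimer(duration) and DestroyTimer(timerId), as vtkRenderWindowInteractor does.

```python
class TimerTracker:
    """Remember timer IDs created on an interactor and destroy them on demand.

    Hypothetical helper: `interactor` is expected to behave like
    vtkRenderWindowInteractor (CreateRepeatingTimer / DestroyTimer).
    """

    def __init__(self, interactor):
        self._interactor = interactor
        self._timer_ids = []

    def create_repeating_timer(self, duration_ms):
        # Record the ID so it can be destroyed later.
        timer_id = self._interactor.CreateRepeatingTimer(duration_ms)
        self._timer_ids.append(timer_id)
        return timer_id

    def destroy_all(self):
        # Destroy every timer we created; no observers are touched.
        while self._timer_ids:
            self._interactor.DestroyTimer(self._timer_ids.pop())


# Hypothetical usage inside a unittest.TestCase.setUp:
#     self.timers = TimerTracker(self.interactor)
#     self.addCleanup(self.timers.destroy_all)
```

Registering destroy_all with addCleanup means the timers are destroyed even when the test fails. Note that the tracker holds a reference to the interactor, so the interactor stays alive at least until the tracker itself is released.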
Update: I backed out gc.disable() and added just interactor.DestroyTimer(timerId) as a “tear down” mechanism for my top-level class. This definitely helped, as I struggled to replicate the InvokeEvent error for quite some time. I will note that after 116 retries of the test I did get a single segmentation fault, but I couldn’t capture the stack trace because that setup wasn’t instrumented. I ended up moving forward by turning all my unit tests back on, and by keeping gc.disable() at the top of my test script I have now executed a few thousand tests without a single memory-related issue.
So there may still be an issue, but it’s much rarer with your suggestions implemented and completely avoidable with Python garbage collection disabled. I’ll be happy to re-test when the newest VTK release is available and provide more detail then if I can still reproduce the error.
Thanks David, this really helped move my project forward!
That’s good to hear. If you’re curious about the fix, it’s !12636 and it has been merged into the master branch.
It would be nice if you could eventually remove the gc.disable(), because if there are any remaining memory issues, I’d rather see them fixed, instead of hidden.
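Until the gc.disable() can be dropped entirely, one possible stopgap is to make it switchable, so the suite can be run with the collector enabled to surface any remaining crashes. This is only a sketch: the function configure_gc and the environment variable VTK_TESTS_KEEP_GC are made-up names, not anything VTK or unittest defines.

```python
import gc
import os


def configure_gc(environ=os.environ):
    """Disable the cyclic collector unless VTK_TESTS_KEEP_GC=1 is set.

    VTK_TESTS_KEEP_GC is a hypothetical variable name chosen for this sketch.
    Returns True if the collector is enabled afterwards.
    """
    if environ.get("VTK_TESTS_KEEP_GC") == "1":
        gc.enable()
    else:
        gc.disable()
    return gc.isenabled()


# Hypothetical usage at the top of the test script:
#     configure_gc()
# and, to hunt for remaining memory issues:
#     VTK_TESTS_KEEP_GC=1 python -m unittest
```

gc.disable() only turns off the cyclic collector; reference counting still frees objects as usual, which is why the setting hides problems rather than removing them.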