About proper distributed AbortCheck

1. Context

A few years ago, @Stephen_Crowell @Christos_Tsolakis and @berk.geveci led an effort to improve the abort mechanism in VTK. That effort was not fully integrated, but great points and design work were made here: https://gitlab.kitware.com/vtk/vtk/-/issues/18463 (a must read).

I'm picking this up again; specifically, I'm looking into the possibility of aborting a filter while it is running in a distributed pipeline.

2. The CheckAbort problem

With the current implementation of CheckAbort(), the idea is to magically set the AbortExecute flag on all distributed processes at the same time. Then, whenever each process calls CheckAbort() at its own pace, it will abort and return in error.

Since distributed processing in VTK is implemented using MPI, this is simply not possible: there is no way to talk to a distributed process while it is engaged in communication with another process, sending messages back and forth.

Moreover, even in a scenario where we could set a flag on all processes at the same time, there are still cases where it would not work, e.g.:

tick 0: proc A is processing, proc B is processing
tick 1: proc A is waiting on an MPI receive, proc B is doing some heavy processing
tick 2: all abort flags are set on all processes; proc A is still waiting on an MPI receive, proc B is still doing some heavy processing
tick 3: proc A is still waiting on an MPI receive; proc B finishes processing and checks the abort flag
tick 4: proc A is still waiting on an MPI receive; proc B aborts everything and never sends with MPI
tick 5: proc A is still waiting on an MPI receive for a message that will never come: deadlock

So clearly, the current CheckAbort() mechanism doesn't fit distributed computing; we need a way to synchronize these checks.

3. Two main use cases

There are two use cases to account for: filters that use MPI, and filters that do not.

For MPI-using filters, there is no magic possibility: the CheckAbort calls MUST be synchronous and akin to an MPI_Barrier call. All processes wait for each other and then reduce the abort state, so that everyone aborts at the same time if any process was aborted.

For non-MPI-using filters, there can be more leniency: the CheckAbort and the reduction of the abort state can be started at different points on different processes; they will ultimately wait for each other and then abort together.

But there is one problem: processes don't wait around. A single process with no data to process (because of a clip at the beginning of the pipeline, for example) would just keep going and finish updating fully while the other processes are still clipping!

There is no other choice than to synchronize at the end of the processing of each filter, before moving on to the next filter.

This is not ideal in a task-based distributed computing system, but in practice, with the VTK pipeline, a process with little work in one filter tends to have little work in all filters, so there is no balancing to do.

4. Actual Implementation

So we need two implementations, a synchronous one and a lazy one, and these will be triggered by calls from within the filter implementation. The lazy implementation also MUST not depend on anything but the Common modules in VTK, because we cannot add a dependency on ParallelCore.

So this must be handled using events.

Adding vtkAlgorithm::CheckAbortAndInvoke, which invokes a CHECK_ABORT event and then actually calls CheckAbort, is pretty easy, as long as the need for synchronous calls in MPI filters is documented.

Adding vtkAlgorithm::CheckAbortDone(), which invokes a new CHECK_ABORT_DONE event, will then take care of synchronizing the end of each filter.

5. What about the inter-process communication?

This is where it is still a bit murky to me.
I could easily observe these events from the ParaView server layer and react to them to do the actual abort state reduction/synchronization, but I feel like this should be provided by VTK somehow, and I don't know exactly what that could look like.

A possibility could be an AbortObserver that would be added to vtkAlgorithm in a generic version; ParallelMPI would factory-provide a specialized version responsible for reducing the abort state. This is more or less how vtkProgressObserver is implemented.

Let me know what you think!


@mwestphal Have you thought about filters that run other filters internally? Could that cause deadlocks in the same way? Especially when processing distributed datasets that may not have cells on every rank?

Hi @dcthomp

Good question!

I thought about it and it seems fine; let me elaborate.

There are two ways to integrate a filter inside another filter: either through SetInputData, or through SetInputConnection.

SetInputData is straightforward: the CheckAbortEvent system will not be listening to this internal filter, which means it will behave as a simple CheckAbort, which will just not abort at all because the flag is not set.
Of course, the high-level filter will need to perform a proper CheckAbort to be abortable, as explained in my message above.
Actually forwarding the CheckAbortEvent and aborting the lower-level filter would be possible, but it would require some manual code, and the proposal above doesn't provide tools for that specifically.

SetInputConnection is more complex, because SetAbortExecuteAndUpdateTime not only sets the abort flag on the filter but also on downstream filters, as explained in Stephen's issue linked above.
This means the internal filter will abort as well. In a distributed context, it may mean that only rank 0 will abort unless the abort flag reduction takes place. For the abort flag reduction to take place, the filter will need to implement proper AbortCheck forwarding, both out AND in.
But if that is not implemented, it only means the other ranks won't abort right away, which is not a big deal UNLESS this is an MPI filter that expects communication to happen.

In any case, this is a very niche use case that is also currently not supported any better: with the classic CheckAbort, we would have the same behavior, just no tooling at all to reduce the abort flag. With my proposal, there is a little bit of tooling for filter developers to use to try to make this use case work.

I hope that answers your question!


@mwestphal Thanks, I think that covers most of my concerns. If there was a way to detect that an internal filter was configured with SetInputConnection() to the “external” filter’s input, that would be nice (because we could at least warn that MPI communication could cause deadlocks during aborts in that case). But I don’t think there is a way to do that. But if the documentation doesn’t already say not to use SetInputConnection() inside RequestData() implementations, it should.

It should not; this is a valid workflow, albeit rarely used.

I would like to mitigate:

I do not see a case where it is the only good way to go, but I can see unwanted side effects. So IMO we should discourage this pattern. But that may be a discussion for another place.