Introduce `vtkForEach` filter

Charles_Gueunet · October 9, 2023, 9:11am

This post presents a new vtkForEach filter. It lives in the Common/ExecutionModel module and is a filter of a new kind, operating at the pipeline level. Its general purpose is to create a loop over a part of the pipeline (a sub-pipeline). It is heavily inspired by the ttkForEach filter.


Left: example of vtkForEach applied to temporal data with a harmonic field evolving over time. In the sub-pipeline, we extract a slice and compute a contour on that slice, which produces sinusoidal lines. The result is aggregated into a partitioned dataset collection, shown on the right and colored by block id.

Overview

The goal of vtkForEach is to provide the input of the sub-pipeline at each iteration. This requires defining a loop criterion on the input data object, which is done by a dedicated class implementing the vtkExecutionRange interface. For example, we provide vtkTimeRange, which loops over the discrete time steps of the input data object.

On the other side of the sub-pipeline, vtkEndFor receives the result of each sub-pipeline execution. It uses a strategy implementing the vtkExecutionAggregator interface to reduce / aggregate all these data objects. For now, vtkAggregateToPartitionedDataSetCollection is provided, which inserts each sub-pipeline result as a partition of a vtkPartitionedDataSetCollection.

Once the result has been processed, vtkEndFor tells the VTK pipeline that a new REQUEST_UPDATE_EXTENT / REQUEST_DATA pass is needed, using the vtkStreamingDemandDrivenPipeline::CONTINUE_EXECUTING() key. This repeats until the vtkForEach filter signals the end of the iterations.
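To make the control flow concrete, here is a minimal sketch in plain C++ that does not use the VTK API. `LoopModel` is a hypothetical stand-in. The boolean returned by `ExecuteOnePass()` plays the role of the CONTINUE_EXECUTING() key, which the end-of-loop filter keeps setting while the range still has iterations left.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, VTK-free model of the vtkForEach / vtkEndFor loop.
struct LoopModel {
  int size = 0;                         // iterations provided by the range (ForEach side)
  int iteration = 0;                    // current iteration index
  std::vector<std::string> aggregated;  // results collected on the EndFor side

  // Stand-in for one REQUEST_UPDATE_EXTENT / REQUEST_DATA pass.
  // Returns true when another pass is needed, i.e. when the end filter
  // would set CONTINUE_EXECUTING().
  bool ExecuteOnePass(const std::function<std::string(int)>& subPipeline) {
    aggregated.push_back(subPipeline(iteration));
    ++iteration;
    return iteration < size;
  }

  // Stand-in for the executive: re-run the passes while the flag is set.
  // Assumption: an empty range (size <= 0) produces no result at all.
  void Update(const std::function<std::string(int)>& subPipeline) {
    iteration = 0;
    aggregated.clear();
    if (size <= 0) {
      return;
    }
    bool continueExecuting = true;
    while (continueExecuting) {
      continueExecuting = ExecuteOnePass(subPipeline);
    }
  }
};
```

Calling `Update()` with `size = 3` runs the stand-in sub-pipeline three times and collects three results. This is roughly what vtkEndFor hands to its aggregator.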

In a way, this pair of filters can be seen as an externalization of the CONTINUE_EXECUTING mechanism. They make it easy to apply this key to filters that do not support it natively, or to build a specific sub-pipeline to loop over. With a little development, it is even possible to iterate over or aggregate data in custom ways, simply by implementing one of the following two interfaces:

vtkExecutionRange

Interface description:

An execution range is similar to a vtkAlgorithm: it receives a data object with associated pipeline information and provides a new object with new information. For this reason, its interface is a subset of the vtkAlgorithm one. It contains the four Request* methods, connected to the corresponding pipeline passes received by vtkForEach. One minor difference is the additional iteration input parameter of RequestUpdateExtent() and RequestData(), used to extract the right subpart of the input. For example, vtkTimeRange overrides RequestUpdateExtent() to set the UPDATE_TIME_STEP() key according to the current iteration, in order to loop over time.
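A minimal sketch of the idea follows, in plain C++ and assuming a heavily simplified model. `PipelineInfo`, `ExecutionRangeSketch` and `TimeRangeSketch` are hypothetical names, not the VTK classes. Only `Size()` and an iteration-aware `RequestUpdateExtent()` are modeled. The time range maps iteration i to the i-th time step. When no time steps exist, it falls back to a single pass-through iteration, as discussed later in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the pipeline information seen by a range
// (not vtkInformation): only the two time-related entries we need.
struct PipelineInfo {
  std::vector<double> timeSteps;  // plays the role of TIME_STEPS()
  double updateTimeStep = 0.0;    // plays the role of UPDATE_TIME_STEP()
  bool hasUpdateTimeStep = false;
};

// Hypothetical, simplified execution-range interface.
class ExecutionRangeSketch {
public:
  virtual ~ExecutionRangeSketch() = default;
  // How many iterations the loop needs for this input.
  virtual std::size_t Size(const PipelineInfo& input) const = 0;
  // Upstream request for a given iteration index.
  virtual void RequestUpdateExtent(PipelineInfo& request, std::size_t iteration) const = 0;
};

// One iteration per discrete time step.
class TimeRangeSketch : public ExecutionRangeSketch {
public:
  std::size_t Size(const PipelineInfo& input) const override {
    // No time steps: behave as a single pass-through iteration.
    return input.timeSteps.empty() ? 1 : input.timeSteps.size();
  }

  void RequestUpdateExtent(PipelineInfo& request, std::size_t iteration) const override {
    // Only request a time when the input actually provides time steps.
    if (iteration < request.timeSteps.size()) {
      request.updateTimeStep = request.timeSteps[iteration];
      request.hasUpdateTimeStep = true;
    }
  }
};
```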

Ideas:

Some other execution ranges that could be implemented:

Range over blocks of a composite data set
Range over rows of a table
Range over point / cell attributes
Repeat N times (for benchmarks?)
…
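Two of these ideas can be sketched the same way, again as hypothetical, VTK-free code. The first is a range over the rows of a table (here just a vector of rows) that hands one row to the sub-pipeline per iteration. The second is a repeat-N-times range that passes its input through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal table: each row is a vector of column values.
using Row = std::vector<double>;
using Table = std::vector<Row>;

// Hypothetical "range over rows of a table".
struct RowRangeSketch {
  // One iteration per row.
  std::size_t Size(const Table& input) const { return input.size(); }

  // The sub-pipeline receives a one-row table for this iteration.
  Table RequestData(const Table& input, std::size_t iteration) const {
    assert(iteration < input.size());
    return Table{input[iteration]};
  }
};

// Hypothetical "repeat N times" range: only the iteration count matters,
// the input goes through untouched (e.g. to time a sub-pipeline).
struct RepeatRangeSketch {
  std::size_t count = 1;
  std::size_t Size(const Table&) const { return count; }
  Table RequestData(const Table& input, std::size_t) const { return input; }
};
```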

vtkExecutionAggregator

Interface description:

An execution aggregator has two main methods. The first one, Aggregate(vtkDataObject* input), is called at the end of each sub-pipeline execution with the corresponding data object. In the existing vtkAggregateToPartitionedDataSetCollection, this method appends the input to an internal vtkPartitionedDataSetCollection. The second method, GetOutputDataObject(), retrieves the result once all the iterations are done. If a reduction operation is needed, it should be implemented here.
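The two-method contract can be sketched as follows. This is a hypothetical, VTK-free model: `DataSet` and `Collection` are plain vectors, not VTK types. One aggregator keeps each result as its own partition, in the spirit of vtkAggregateToPartitionedDataSetCollection. The other defers an element-wise sum to `GetOutputDataObject()` to show where a reduction would go.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: a "dataset" is a vector of values and a
// "collection" holds one partition per sub-pipeline result.
using DataSet = std::vector<double>;
using Collection = std::vector<DataSet>;

// Hypothetical aggregator interface mirroring the two methods above.
class AggregatorSketch {
public:
  virtual ~AggregatorSketch() = default;
  virtual void Aggregate(const DataSet& input) = 0;    // once per iteration
  virtual Collection GetOutputDataObject() const = 0;  // after the last one
};

// Keeps every iteration's result as a separate partition.
class AppendToCollectionSketch : public AggregatorSketch {
public:
  void Aggregate(const DataSet& input) override { partitions_.push_back(input); }
  Collection GetOutputDataObject() const override { return partitions_; }

private:
  Collection partitions_;
};

// Reduction: Aggregate() only stores inputs; the element-wise sum is
// computed in GetOutputDataObject().
class SumReductionSketch : public AggregatorSketch {
public:
  void Aggregate(const DataSet& input) override { inputs_.push_back(input); }

  Collection GetOutputDataObject() const override {
    DataSet sum;
    for (const DataSet& d : inputs_) {
      if (sum.size() < d.size()) {
        sum.resize(d.size(), 0.0);  // shorter inputs count as zeros
      }
      for (std::size_t i = 0; i < d.size(); ++i) {
        sum[i] += d[i];
      }
    }
    return Collection{sum};
  }

private:
  Collection inputs_;
};
```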

Ideas:

Some other aggregation strategies could be implemented:

Append all datasets into an unstructured mesh
Resample all datasets into an image data
Discard aggregator (do nothing, just used to do clever stuff in the loop)
With a few changes, generate a temporal data object
…
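For the first idea, the main subtlety is renumbering cell connectivity. Below is a hypothetical sketch on a minimal triangle mesh, not vtkUnstructuredGrid. It shows that each appended mesh's point ids must be shifted by the number of points already accumulated.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal triangle mesh.
struct Mesh {
  std::vector<std::array<double, 3>> points;
  std::vector<std::array<std::size_t, 3>> triangles;  // indices into points
};

// Hypothetical "append into one mesh" aggregator.
class AppendMeshSketch {
public:
  void Aggregate(const Mesh& input) {
    // Ids of the incoming mesh start after the points already stored.
    const std::size_t offset = merged_.points.size();
    merged_.points.insert(merged_.points.end(), input.points.begin(), input.points.end());
    for (auto tri : input.triangles) {  // copy, then shift the ids
      for (auto& id : tri) {
        id += offset;
      }
      merged_.triangles.push_back(tri);
    }
  }

  const Mesh& GetOutputDataObject() const { return merged_; }

private:
  Mesh merged_;
};
```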

If you feel like implementing a new range or aggregator, we would be glad to help you do so!

Thanks @JonasLukasczyk for the initial work

Yohann_Bearzi · October 9, 2023, 1:27pm

This is a very interesting concept. After taking a quick glance at the code, I have a couple of questions:

AFAICT vtkForEach only supports filters with 1 input port and 1 input per port, as well as 1 output. Would it be hard to generalize it a bit and have a general behavior regardless of the number of inputs / outputs?
I see a potential issue with Catalyst with vtkTimeRange. You can have the situation where you do not have a TIME_STEPS key defined in situ, even if the sim is temporal. Currently vtkTimeRange would take such cases as a regular non-temporal pipeline. There is a key that ParaView sets when running Catalyst: NO_PRIOR_TEMPORAL_ACCESS. When the key is set, what would happen if we set vtkTimeRange::Size() to return the maximum integer? Would it still work correctly? When the key equals NO_PRIOR_TEMPORAL_ACCESS_RESET, the user asked to reset the filter, so we could probably find a way to make it act on Size() so vtkForEach would know to reset?

Charles_Gueunet · October 9, 2023, 1:43pm

AFAICT vtkForEach only supports filters with 1 input port and 1 input per port, as well as 1 output. Would it be hard to generalize it a bit and have a general behavior regardless of the number of inputs / outputs?

The vtkForEach itself is indeed a 1:1 filter. However, it can be connected to a filter having several inputs, and that filter should behave accordingly. Depending on the range mechanism, the upstream pipeline may or may not need an update, which affects the other inputs of the filter inside the sub-pipeline accordingly. For example, the time range should behave as expected, with consistent times.

I see a potential issue with Catalyst with vtkTimeRange. You can have the situation where you do not have a TIME_STEPS key defined in situ, even if the sim is temporal. Currently vtkTimeRange would take such cases as a regular non-temporal pipeline. There is a key that ParaView sets when running Catalyst: NO_PRIOR_TEMPORAL_ACCESS. When the key is set, what would happen if we set vtkTimeRange::Size() to return the maximum integer? Would it still work correctly? When the key equals NO_PRIOR_TEMPORAL_ACCESS_RESET, the user asked to reset the filter, so we could probably find a way to make it act on Size() so vtkForEach would know to reset?

I am not sure I fully understand the question. From my understanding, without time steps NumberOfTimeSteps will be set to 1, so this range will have a pass-through effect. I do not know the reset key you are mentioning, but I agree with the idea that vtkTimeRange should behave correctly even in Catalyst.

Yohann_Bearzi · October 9, 2023, 1:50pm

I am not sure I fully understand the question. From my understanding, without time steps NumberOfTimeSteps will be set to 1, so this range will have a pass-through effect. I do not know the reset key you are mentioning, but I agree with the idea that vtkTimeRange should behave correctly even in Catalyst.

If NumberOfTimeSteps equals 1, then vtkForEach::IsIterating would return false after the first time step, right?

In situ, you often don’t know a priori how many time steps you are going to produce. When this is the case, TIME_STEPS is not defined at all. It is not a big deal for most filters, but if I am not mistaken, it would impact vtkForEach with vtkTimeRange.

Charles_Gueunet · October 9, 2023, 1:58pm

If NumberOfTimeSteps equals 1, then vtkForEach::IsIterating would return false after the first time step, right?

Yes, which simply means that the sub-pipeline won’t re-execute (no CONTINUE_EXECUTING set). However, if the input changes because the simulation updates, everything will run again smoothly.

In this case, I would say we do NOT want vtkTimeRange to try to loop over the simulation time. As I understand it, vtkForEach here simply acts as a pass-through, like any non-temporal pipeline.
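A simplified model of the behavior discussed here may help. It assumes the usual demand-driven rule that an upstream filter only re-executes when the request or its input has changed. `CachedSource` is hypothetical and is not the actual VTK executive logic. Repeated requests for the same time are served from cache. Bumping the input's modification counter, as a new simulation step would, triggers a new execution.

```cpp
#include <cassert>

// Hypothetical model of demand-driven caching.
struct CachedSource {
  unsigned long inputMTime = 0;     // bumped when the simulation provides new data
  unsigned long lastExecMTime = 0;  // input MTime seen at the last execution
  double lastTime = 0.0;            // time requested at the last execution
  bool executedOnce = false;
  int executions = 0;               // counts RequestData-like executions

  void Update(double requestedTime) {
    const bool stale = !executedOnce || requestedTime != lastTime ||
                       inputMTime != lastExecMTime;
    if (!stale) {
      return;  // cached output reused, nothing re-executes
    }
    ++executions;
    lastTime = requestedTime;
    lastExecMTime = inputMTime;
    executedOnce = true;
  }
};
```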

Yohann_Bearzi · October 9, 2023, 2:15pm

Oh I see, I didn’t look at it that way. You are right about the looping mechanism; we wouldn’t want to change Size() etc.

However, TimeValues would always be { 0 }, hence we would always request the same UPDATE_TIME_STEP in RequestUpdateExtent, and RequestData would not be called more than once? Would we need to increment a counter in our in situ instance to force a pipeline update at each call to RequestUpdateExtent?

hollowsunhc · February 1, 2024, 4:01pm

What an excellent idea!

Is it possible to implement the filter within a pipeline without requiring extensive modifications to the pipeline browser?

One execution range idea I have:

A list of path queries?