Preventing duplication of algorithm output data for timeseries, using vtkDataObject::DATA_TIME_STEP

kvankooten · May 20, 2019, 5:06pm

I’ve come across an issue that has cropped up several times when I had to convert in-memory VTK class instances to an external graphics format that mirrors them. For such a conversion, I generally use vtkDataObject::DATA_TIME_STEP to determine the timestep for which a given algorithm output (such as a vtkPolyData) was generated. In ParaView (PV), for example, if the animation timeline is played on a scene containing a static box and a timeseries dataset, the box's DATA_TIME_STEP stays at 0, while the timeseries data's DATA_TIME_STEP follows the timeline value. This makes it possible to retain the converted data per timestep instead of redoing the conversion at every timestep change. The same principle works for caching in PV, as vtkPVCacheKeeper shows.
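A minimal sketch of that per-timestep reuse, written in plain Python so it runs without VTK. `ConversionCache` and the string "meshes" are hypothetical stand-ins for a real converter; in actual code the key would be the double stored under vtkDataObject::DATA_TIME_STEP in the output's information.

```python
from typing import Callable, Dict, Hashable


class ConversionCache:
    """Hypothetical helper: caches converted (external-format) data per key.

    Here the key stands in for the value of vtkDataObject::DATA_TIME_STEP.
    """

    def __init__(self, convert: Callable[[Hashable], object]):
        self._convert = convert              # the expensive VTK -> external conversion
        self._entries: Dict[Hashable, object] = {}
        self.conversions = 0                 # how often convert() actually ran

    def get(self, key: Hashable) -> object:
        if key not in self._entries:         # convert only on first visit
            self._entries[key] = self._convert(key)
            self.conversions += 1
        return self._entries[key]


# Timeline played in steps of 0.5 over [0, 2], then scrubbed back to 0.
timeline = [0.0, 0.5, 1.0, 1.5, 2.0, 0.0]

box_cache = ConversionCache(lambda t: "box mesh")
series_cache = ConversionCache(lambda t: f"series mesh @ {t}")

for t in timeline:
    box_cache.get(0.0)      # the static box keeps DATA_TIME_STEP at 0
    series_cache.get(t)     # the timeseries reports the timeline value

print(box_cache.conversions, series_cache.conversions)  # 1 5
```

The box is converted once, and returning to time 0 reuses the earlier timeseries conversion instead of redoing it.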

There is one issue with this: algorithms such as file series readers generally produce different output only at whole time values, as stored in vtkStreamingDemandDrivenPipeline::TIME_STEPS. They can, however, be asked to produce output for any fractional time value, in which case they simply clamp it to an integer and output that particular version of their data. Yet typically only the requested fractional value is passed down the pipeline in DATA_TIME_STEP: which version of the data was actually produced is lost, and the same source data may be copied into a new output data instance multiple times for different time values. Therefore, to my knowledge, it is not possible to determine when the source, and hence the content of the output data, has actually changed. One simply has to assume that the output data changes whenever DATA_TIME_STEP does. This is also evident in PV's CacheKeeper, which holds one entry for every fractional animation time along the timeline, even if the source is a timeseries dataset with only a limited number of TIME_STEPS.
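To make the duplication concrete, here is a small sketch (plain Python, no VTK) of a 4-file series played over a timeline with 0.25 increments. The snapping rule in the hypothetical `snap_to_time_step` is an assumption: it picks the last time step not greater than the requested time, clamped to the available range; real readers may round differently.

```python
import bisect
from typing import Sequence


def snap_to_time_step(time_steps: Sequence[float], t: float) -> int:
    """Index of the time step a series reader would serve for request time t.

    Assumption: last time step <= t, clamped to the first/last step.
    """
    i = bisect.bisect_right(time_steps, t) - 1
    return min(max(i, 0), len(time_steps) - 1)


time_steps = [0.0, 1.0, 2.0, 3.0]               # TIME_STEPS of a 4-file series
timeline = [k * 0.25 for k in range(13)]        # 0.0, 0.25, ..., 3.0

data_time_keys = set(timeline)                  # distinct DATA_TIME_STEP values
versions = {snap_to_time_step(time_steps, t) for t in timeline}

print(len(data_time_keys), len(versions))  # 13 4
```

A cache keyed on DATA_TIME_STEP ends up with 13 entries, although only 4 distinct versions of the source data were ever produced.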

I was wondering whether it would be possible to have information, in addition to DATA_TIME_STEP, about which version of the source data was chosen to generate an algorithm’s output. If an algorithm changes what is passed down to it at every fractional time step, as a temporal interpolator does, this version number could simply mirror DATA_TIME_STEP; otherwise it would stay unchanged and give converters and caches more precise information. Another option would be some form of cache on the data reader's side, to prevent duplicating the output data at the source.
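A rough sketch of both ideas, again in plain Python and entirely hypothetical: `SeriesReader` reports a version next to its output and keeps a reader-side cache so each version is loaded once, `TemporalInterpolator` lets its version mirror the time, and `count_conversions` stands in for a downstream converter keyed on that version instead of DATA_TIME_STEP. None of these names correspond to an existing VTK key or API.

```python
import bisect
from typing import Dict, Sequence, Tuple


class SeriesReader:
    """Hypothetical file-series reader that reports which source version it used."""

    def __init__(self, time_steps: Sequence[float]):
        self.time_steps = list(time_steps)
        self._outputs: Dict[int, dict] = {}   # reader-side cache, one entry per version
        self.loads = 0

    def request(self, t: float) -> Tuple[int, dict]:
        # Assumed snapping rule: last time step <= t, clamped to the range.
        i = bisect.bisect_right(self.time_steps, t) - 1
        version = min(max(i, 0), len(self.time_steps) - 1)
        if version not in self._outputs:      # load/copy each version only once
            self._outputs[version] = {"step": self.time_steps[version]}
            self.loads += 1
        return version, self._outputs[version]


class TemporalInterpolator:
    """Output changes at every time, so its version simply mirrors the time."""

    def request(self, t: float) -> Tuple[float, dict]:
        return t, {"interpolated_at": t}


def count_conversions(source, timeline) -> int:
    """Downstream converter cache keyed on the version, not on DATA_TIME_STEP."""
    converted: Dict[object, dict] = {}
    for t in timeline:
        version, data = source.request(t)
        if version not in converted:
            converted[version] = data         # stand-in for the real conversion
    return len(converted)


timeline = [k * 0.25 for k in range(13)]
reader = SeriesReader([0.0, 1.0, 2.0, 3.0])
print(count_conversions(reader, timeline), reader.loads)     # 4 4
print(count_conversions(TemporalInterpolator(), timeline))  # 13
```

For the series reader, conversions drop from 13 to 4 and the reader returns the same output instance for every time that maps to the same version; for the interpolator nothing changes, which matches the behaviour proposed above.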

I’m not necessarily looking for a quick fix; I just wanted to kick off a discussion and see where it leads.