vtkAbstractArray - InsertTuple() and InsertValue()

drpeterfranz · November 10, 2021, 6:50pm

I have a problem where I need to insert additional tuples (and/or values) in subclasses of vtkAbstractArray, i.e. vtkDataArray or vtkStringArray etc.

The data should be truly inserted, i.e. if I insert “x” into (0,1,2,3,4,5) it should become (0,1,2,x,3,4,5). So the data structure needs to be enlarged, the data beyond the insertion point copied to a new memory location, and finally “x” needs to be inserted.

I have had a look at the currently implemented methods for InsertTuple() and InsertValue(). These methods resize the array only if the insertion point is beyond the current size, and then simply set “x” into the array. So the array may look something like (0,1,2,x,4,5), or (0,1,2,3,4,5,x) if the insertion was at the end - this is quite counter-intuitive in my opinion, but let’s not go there.

What is the best way of achieving my goal for a true insertion, for both vtkDataArray and vtkStringArray?

Yohann_Bearzi · November 10, 2021, 7:17pm

Let me start by emitting a little warning: doing this kind of operation is very unefficient, even with a std::vector. When doing that, all the memory starting at the position of x needs to be written over. So if you do it just a few times, you should be good performance-wise. If not, I’d advise trying to find another route.

Now, the best way to do that would actually be to do it by hand. You can leverage all “common types” data arrays (vtkIntArray etc.) in one functor using vtkArrayDispatch. You would have to write a function especially fitted to vtkStringArray.

drpeterfranz · November 10, 2021, 8:30pm

I know that this operation is potentially inefficient; a solution could be to allow insertion of multiple values at the same time so that the memory copy operation needs to be done only once.

I’ll probably have to write an implementation for this. Any suggestions for a good name, since InsertTuple/InsertValue is already taken?

Yohann_Bearzi · November 10, 2021, 8:39pm

Do you want to write an implementation directly in vtkDataArray? I think the better way to handle that at your level would be to have some local utility class that you could name however you want, and have a few static methods inside it that you can call from your code base. I’m not sure we would want a std-like insert method in VTK. Tempting the user to use not-so-recommended methods by making it available is often not a good idea, because people might use it a bit too often. We already have a bunch of not-so-recommended methods that we want to get rid of

You don’t need to have access to any private or protected member in order to perform what you want to perform so you can totally have it live anywhere.

drpeterfranz · November 10, 2021, 8:46pm

Well I am working on an extension of vtkTable (which I plan to merge with the main VTK repository), and I need the functionality there because I need to insert rows into the table. So if I implement something there then I might as well do a better job and do it in vtkDataArray directly to avoid duplication. I also think this is an operation which may be useful for others as well.

I had a look at vtkGenericDataArray::RemoveTuple() and it actually performs a copy operation when removing a tuple. On the other hand vtkBitArray::RemoveTuple() has an error macro which says it’s not implemented yet.

So I could take the RemoveTuple() methods as templates and write similar “InsertTuple” methods there?

Yohann_Bearzi · November 10, 2021, 8:58pm

Oh I start to understand your problem better…

As a first proposition, maybe we could call this API InjectTuple / InjectValue. Maybe you can start with that and we can still use sed to rename it later when you merge during review if someone has a better name.

drpeterfranz · November 10, 2021, 9:00pm

Thanks. Yes InjectTuple/InjectValue sounds like a good name.

Yohann_Bearzi · November 10, 2021, 9:02pm

Don’t forget to add a warning in the docs saying that the complexity of one injection is O(n) and that it should be avoided when possible for the sake of performance.

Yohann_Bearzi · November 11, 2021, 1:05am

While I think about it, please also add an InjectTypedTuple in relevant classes (vtkAOSDataArrayTemplate for instance), for completeness.

drpeterfranz · November 11, 2021, 1:11am

I have so far implemented a generic method in vtkGenericDataArray. It’s core to shift the data is

int numComps = this->GetNumberOfComponents();
vtkIdType toTuple = this->GetNumberOfTuples() - 1;
vtkIdType fromTuple = dstStart + n - 1;
for (; fromTuple >= dstStart; --toTuple, --fromTuple)
{
for (int comp = 0; comp < numComps; ++comp)
{
this->SetTypedComponent(toTuple, comp, this->GetTypedComponent(fromTuple, comp));
}
}

Is this sufficient for dealing with vtkAOSDataArrayTemplate?

Yohann_Bearzi · November 11, 2021, 1:22am

vtkAOSDataArrayTemplate could have a fast path as memory is continuous in such arrays. You can use a std::copy on the internal raw pointer on the memory you shift, and then inject the input tuple (using std::copy as well).

toddy · November 11, 2021, 1:30am

Do you really need to inject tuples into the vtkArray? Why can’t they be appended and then have your table class insert the rows in a customised order by using a second list which maps each row to a tuple index?

drpeterfranz · November 11, 2021, 3:22am

Why make it complicated? vtkAbstractArray already contains interfaces to “inser” and remove tuples/values, so having an injection method there seems like a natural choice to me.

will.schroeder · November 11, 2021, 10:25am

Some comments:

As has been pointed out, “injection” of tuples is problematic, it’s heading down the path of a O(n^2) algorithm. I agree if it’s done infrequently, or the data is small, it might be okay. However, I have found that this sort of algorithmic complexity is an indicator that the data structure should be reconsidered. For example, it’s often better to insert in no particular order data (a tuple with a sort parameter) and then sort at the end. Given the performance of sort, its O(n long(n)) worse-case behavior), and threading support, e.g. vtkSMPTools::For()), sorting can be quite fast. Or using some sort of map, or indexing scheme has Todd hinted at. So supporting a very inefficient method in vtkDataArrays - I’d prefer to discourage it - I can easily imagine a thread on the discourse list five years from now when some user abuses injection and complains about performance etc.
If we didn’t have 28 years of VTK backward compatibility to contend with, I would drastically rework the vtkDataArray API. Mostly, make it much simpler, thread safe, and target integration with C++ stdlib containers, algorithms, etc. Expanding the VTK API (i.e., making it more complex) at this point has to be examined carefully. Rather, given the emerging expressiveness and performance of C++ standard library, I think a better approach is to use C++ “functions” and then interface to the VTK data arrays (using e.g., SetTuple() or SetData() etc).
“Why make it complicated?” is a good question but we know that every line of code has a cost, particularly when it’s pervasive across the entire VTK system for a feature that over 28 years has not been needed. That complexity is not just writing the code, but maintaining, documenting, and testing it , and possibly creating additional API confusion, and increasing object and executable sizes, etc.

drpeterfranz · November 11, 2021, 6:17pm

Ad 1, I can’t see that it’s O(n^2). Per insertion n tuples need to be shifted in memory. An insertion can specify to be m tuples long. So this should be O(n)
Ad 2, I agree that the API has been long standing - that’s why I wouldn’t want to change anything in “InsertTuple()”, which is by my opinion the method which should actually achieve this operation. And now that the API is standing we should work to extend it in such a way which is consistent with it rather than rewriting it. “InjectTuples()” would be a true extension since it does not change existing code.
Ad 3, I agree, and that’s exactly why I think “InjectTuples()” should live in vtkAbstractArray. We can write nice tests for it, and then other methods can make use of it. In my example if I would handle the injection in vtkTable that will work, but then we’d have to maintain in there. And if another developer/user requires injection they need to re-write it in their part of the code. That’s how things spiral out of control and cost more in maintenance in the long run.

I’ll go with whatever you decide here, so please let me know how you’d like me to solve this.

Yohann_Bearzi · November 11, 2021, 6:35pm

In your case, since you’re working with vtkTable, it would actually be cheaper complexity-wise to use an id array like Todd proposed. You can put those ids inside a vtkIdList* and then call vtkAbstractArray::InsertTuples(vtkIdType destStart, vtkIdList* srcIds, vtkAbstractArray* source) using those ids (this method specialized for this case is not yet in VTK but I actually have an MR that adds it as I’ve encountered maybe 10 instances where this would have made my life easier).

Let’s say your table is of dimensions n x m. Adding a row is equivalent with updating the vtkIdList (by hand I’m afraid as we cannot inject ids inside it), which is of complexity n for each injection. If you want to add k rows, you would do k * n operations updating the ids, and then n * m operations updating the table, which results in k * n + n * m operations. If you were updating all tables for each row adding, you would instead have k * n * m operations. So you’re better off interfacing with ids.

And even better news, if you use vtkDataSetAttributes::CopyData using those ids, this method will soon be multi-threaded (I have an MR for that).

So what I would propose is to use a vtkIdList to do your operations, play with ids, append the values in some source vtkDataSetAttributes, and then flush everything in the vtkTable when everything is ready.

drpeterfranz · November 11, 2021, 9:01pm

Not sure if I understand all details in your comment @Yohann_Bearzi .

@toddy 's suggestion was to keep a secondary index, i.e. appending the new data at the end of the individual column arrays, then keeping an id list like (0,1,2,5,6,3,4) to map the row indices to the data in vtkTable. This would solve insertion, but not removal of rows (the deleted rows would be stranded data occupying memory). Further, vtkPlot which requires vtkTable makes direct use of the individual column data, and hence it needs to be ordered correctly and not contain stranded deleted rows, for example when using vtkPlotLines. This excludes the use of an id list to map the row indices.

drpeterfranz · November 11, 2021, 9:09pm

Also as stated before, RemoveTuple() already contains all the functionality to shift tuples in memory, albeit by shifting it only by one index. Instead of creating an InjectTuples() method we could make a protected ShiftTuples() method (or MoveTuples()), then move the code from RemoveTuple() in there and let RemoveTuple() call ShiftTuples().

To demonstrate my point, here’s the code excerpt from vtkGenericDataArray::RemoveTuple():

As you see this already contains the proposed tuple-shifting mechanism, albeit by currently only allowing to shift by one index. So currently if I want to remove k tuples I have to call RemoveTuple() k times, i.e. we have k * n operations. Instead we could add in the ShiftTuple(k) method and create InjectTuples(k) and maybe even RemoveTuples(k). This reduces k * n operations down to n operations.

Yohann_Bearzi · November 11, 2021, 9:29pm

The columns in your vtkTable are represented by a vtkDataSetAttributes. You cannot re-index on the fly elements in the vtkDataSetAttributes. What you can do is do a book-keeping on what rows you add in the id list while you edit the vtkTable. It could be part of the input of a filter editing the vtkTable, or of some modifier method directly in vtkTable. And, using this list of ids, at the end of the day, you need to rearrange the vtkDataSetAttributes so when someone reads it, it reads the correct table.

What I was describing is how the complexity of book-keeping before modifying the actual table is lower than the complexity of editing the entire table for each row you inject.

Regarding your point on RemoveTuple, removing an index in your book-keeping performs the same thing. When you call vtkAbstractArray::InsertTuples using your list of ids, indices that are not present will not be copied in the new array. Using a list of ids actually allows you to remove tuples that are not necessarily adjacent, so it is more general than shifting tuples.

drpeterfranz · November 11, 2021, 10:08pm

Alright, the discussion is spiralling beyond what I wanted to achieve. So to be pragmatic I’ll return to my original goal and modify vtkTable accordingly and let it handle the desired insertion/removal of rows, using the currently used methods in there.

After I have merged that in later on we can maybe have another look at this. It’ll still be easy then to decide where the insertion/removal should be handled best.

Good discussion still, thanks to all involved.