vtkGenericDataArray lookup

I was using the vtkIntArray::LookupValue on an array with repetitive values, similar to this:

import vtk

values = [ 2, 2, 3, 3, 3, 8, 7, 8, 7, 8, 7, 6, 9, 6, 9, 6, 9]

array = vtk.vtkIntArray()
for v in values:
    array.InsertNextValue(v)

for v in [ 2,3,6,7,8,9 ] :
    print(f"{v} : {array.LookupValue(v)}" )

This yields the following - to me somewhat surprising - output:

2 : 1
3 : 2
6 : 15
7 : 8
8 : 9
9 : 12

I sort of expected that lookupValue would return the first index, but clearly it doesn’t. This is due, I suppose, to vtkGenericDataArrayLookupHelper::UpdateLookup() using unstable algorithms std::partition and std::sort.

I think it may be useful to have a stable implementation of the lookup, but not if the user pays by a performance degradation. Since VTK now supports C++11, we could consider to change the implementation by using an std::unordered_map of values containing vectors of indices. This avoids the partitioning and sorting and greatly simplifies looking up values and lists of values.To see if this approach would perform well enough to justify the change I made a (simplified) index lookup.

I implemented the index lookup in three different ways:

  1. unstable: vector of index-value pairs with unstable partitioning for nan and unstable sorting;
  2. stable: vector of index-value pairs with stable partitioning for nan and stable sorting;
  3. mapped: unordered_map of vectors of indices

These are tested with a vector of 1000000 ints populated by random numbers, sorted numbers and unsorted numbers. Each of these population approaches are done with

  1. 10 unique numbers
  2. 100000 unique numbers

The tests show the timings for lookup of

  1. first: the first lookup (invoking the construction of auxiliary data structures);
  2. single: cumulative time for looking up a single index for each entry of the vector;
  3. all: cumulative time for looking up all indices for each entry of the vector;

map_of_vectors.cpp (10.6 KB).

These are the results build with g++ - O3 (GCC) 6.3.1 20170216 (Red Hat 6.3.1-3)

unstable  random 1  | first: 9.644e-02  single: 4.119e-02  all: 3.341e+02 
stable    random 1  | first: 7.332e-02  single: 3.865e-02  all: 3.329e+02 
mapped    random 1  | first: 2.063e-02  single: 1.174e-02  all: 3.201e+02 

unstable  random 2  | first: 1.051e-01  single: 2.048e-01  all: 3.323e-01 
stable    random 2  | first: 1.109e-01  single: 2.047e-01  all: 3.278e-01 
mapped    random 2  | first: 8.999e-02  single: 5.380e-02  all: 1.614e-01 

unstable  sorted 1  | first: 3.880e-02  single: 2.570e-02  all: 2.065e+02 
stable    sorted 1  | first: 5.930e-02  single: 2.630e-02  all: 2.079e+02 
mapped    sorted 1  | first: 1.340e-02  single: 1.205e-02  all: 2.071e+02 

unstable  sorted 2  | first: 3.290e-02  single: 3.362e-02  all: 5.553e-02 
stable    sorted 2  | first: 4.705e-02  single: 3.471e-02  all: 5.593e-02 
mapped    sorted 2  | first: 2.599e-02  single: 1.410e-02  all: 3.283e-02 

unstable  reverse 1 | first: 3.013e-02  single: 3.411e-02  all: 5.482e-02 
stable    reverse 1 | first: 3.904e-02  single: 3.387e-02  all: 5.505e-02 
mapped    reverse 1 | first: 2.528e-02  single: 1.449e-02  all: 3.362e-02 

unstable  reverse 2 | first: 8.932e-02  single: 2.535e-02  all: 2.077e+02 
stable    reverse 2 | first: 4.755e-02  single: 2.538e-02  all: 2.082e+02 
mapped    reverse 2 | first: 1.345e-02  single: 1.208e-02  all: 2.070e+02 

From this it appears that the mapped implementation performs better or equal in all cases.

Are there any objections against modifying the current implementation?

I think it’d be great. @utkarshayachit and @steven.hahn seem to have done most of the work on the current implementation, but if they’re cool with it, we’d love to see a patch for this!

Just published a merge request. I couldn’t find @steven.hahn in the members section on gitlab.kitware.com/vtk