Make nanoflann a third party used for point locators

nanoflann, forked from flann (Fast Library for Approximate Nearest Neighbors), is a very powerful and fast point search library that I’ve been using for years on point clouds. There are a lot of different indexing schemes (see page 9 of the manual).

The main difference between flann and nanoflann is that flann has run-time overheads that are squeezed out by nanoflann, which is header only.

I think it would be worth depending on this library for point locators: replacing the guts of vtkKdTreePointLocator with nanoflann, and potentially implementing new locators using the other indexing strategies nanoflann makes directly available. The metric used to build the index is also customizable through a functor, which could be a nice feature to add to VTK’s locators.
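To give an idea of what driving nanoflann looks like, here is a minimal sketch based on its point cloud example. The `PointCloud` adaptor below is illustrative (not a proposed VTK API), and exact signatures may vary a little between nanoflann versions:

```cpp
#include <nanoflann.hpp>

#include <array>
#include <cstdlib>
#include <vector>

// Adaptor exposing the points to nanoflann; the library queries the data
// exclusively through these three hooks.
struct PointCloud
{
  std::vector<std::array<double, 3>> Points;

  std::size_t kdtree_get_point_count() const { return Points.size(); }
  double kdtree_get_pt(std::size_t idx, std::size_t dim) const { return Points[idx][dim]; }
  template <class BBOX>
  bool kdtree_get_bbox(BBOX&) const { return false; } // let nanoflann compute the bbox
};

int main()
{
  auto r = [] { return std::rand() / static_cast<double>(RAND_MAX); };

  PointCloud cloud;
  for (int i = 0; i < 100000; ++i)
  {
    cloud.Points.push_back({ r(), r(), r() });
  }

  // The first template parameter is the metric: this is where a custom
  // distance functor can be plugged in.
  using KdTree = nanoflann::KDTreeSingleIndexAdaptor<
    nanoflann::L2_Simple_Adaptor<double, PointCloud>, PointCloud, 3 /* dimension */>;

  KdTree index(3, cloud, nanoflann::KDTreeSingleIndexAdaptorParams(10 /* max leaf size */));
  index.buildIndex();

  const double query[3] = { 0.5, 0.5, 0.5 };
  std::size_t neighborIds[10];
  double squaredDistances[10];
  index.knnSearch(query, 10, neighborIds, squaredDistances);
  return 0;
}
```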

Any filter that does a lot of point searches would benefit from this change; for instance, the new vtkOverlappingCellsDetector filter.

nanoflann works with CUDA and is thread safe, and I’ve seen an mpi directory in the flann git repository, though I haven’t looked into it much.

Any opinions?

How hard is it to put together a quick performance comparison of build/lookup between vtkKdTreePointLocator and nanoflann? If there’s a dramatic performance improvement, it’d be a very easy case to make.

How are we going to manage training? Models?

I’m not sure I follow what you mean by that. Ideally the user wouldn’t be aware of the internal mechanics of it. All you want to do is feed points and query neighborhoods.

Yes, but there’s some pre-trained network behind the scenes, right? How will we manage it? The training materials should be in the source repo too.

I guess it wouldn’t be too hard to do. Just have to build a big point cloud and compare performances.

Some people have already looked into this kind of benchmark (not with VTK’s version, which does not seem to be commonly used).

From what I have seen, for kd-trees used on 2D / 3D points with a statically built tree, you won’t find better than nanoflann. There seems to be some debate on whether flann, nanoflann, or other libraries are better for more exotic search spaces.

I don’t think there are pre-trained networks behind the scenes. A kd-tree does not need training to learn how to build itself. The only limitation of this kind of structure is that if you have a lot of points aligned in the same plane in 3D / line in 2D, you lose a lot of efficiency because the space gets split in a degenerate way. The user should know what kind of structure they want depending on their input data.

Oh, did you mix up flann (Fast Library for Approximate Nearest Neighbors) with FLANN (Functional Link Artificial Neural Network)?

Ah, yes I did. Normal third party import details apply. Mangling, import, etc. :slight_smile:

I did a quick test. To do so, I tweaked this example from nanoflann a little.

On both nanoflann and VTK (Release build), I built a kd-tree with one million random points, then did one million random queries for the 10 nearest neighbors. I used rand() as the random number generator, with the same 3D extent in both cases.

Results are:

  • VTK - kdtree build time is 134ms, search is 7,319ms
  • nanoflann - kdtree build time is 278ms, search is 554ms
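For reference, the VTK side of this kind of benchmark looks roughly like the sketch below (timing code omitted; this is the shape of the test, not the exact code I ran):

```cpp
#include <vtkIdList.h>
#include <vtkKdTreePointLocator.h>
#include <vtkNew.h>
#include <vtkPoints.h>
#include <vtkPolyData.h>

#include <cstdlib>

int main()
{
  auto r = [] { return std::rand() / static_cast<double>(RAND_MAX); };

  // One million random points in the unit cube.
  vtkNew<vtkPoints> points;
  for (vtkIdType i = 0; i < 1000000; ++i)
  {
    points->InsertNextPoint(r(), r(), r());
  }
  vtkNew<vtkPolyData> polyData;
  polyData->SetPoints(points);

  vtkNew<vtkKdTreePointLocator> locator;
  locator->SetDataSet(polyData);
  locator->BuildLocator(); // time this call for the "build" number

  // One million queries for the 10 nearest neighbors;
  // time this loop for the "search" number.
  vtkNew<vtkIdList> neighbors;
  for (int i = 0; i < 1000000; ++i)
  {
    const double query[3] = { r(), r(), r() };
    locator->FindClosestNPoints(10, query, neighbors);
  }
  return 0;
}
```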

On a debug build, the search with VTK takes 54 seconds. I don’t know if we can keep nanoflann’s performance in a VTK debug build if it becomes a third party. Can we, @ben.boeckel?

I don’t like the idea of compiling some bits in release and others in debug if one selects a Debug build mode. One is (generally) already not in the mood to figure out hidden CMake flags to get missing debug info when one is looking for it.


It would be great to have a VTK class that relies on nanoflann for efficient point locator requests, but before integrating it into VTK I would publish it as a remote module so interested people could experiment with it first (it could easily replace vtkKdTreePointLocator through the factory mechanism).

I just ran some more tests.

I have a VTK branch named nano-flann-test with a test called TestKdTreePointLocator that benchmarks VTK’s point locators (my fork is https://kitware.gitlab.com/yohann.bearzi/vtk.git). I also have a branch called test-kdtree-vtk on my fork of nanoflann (https://github.com/yohannbearzi/nanoflann); running examples/pointcloud_adaptor_example there benchmarks the same test as my VTK branch.

Results look as follows on my machine:

nanoflann:
Build time 0.299125 seconds for 1000000 points
Search time 0.638600 seconds for 1000000 queries and 10 neighbors

VTK:

vtkKdTreePointLocator:
Build time 0.142078 seconds for 1000000 points
Search time 7.368636 seconds for 1000000 queries and 10 neighbors

vtkStaticPointLocator:
Build time 0.093148 seconds for 1000000 points
Search time 4.563173 seconds for 1000000 queries and 10 neighbors

When multithreading on 16 threads on my machine, comparing nanoflann and vtkStaticPointLocator:

nanoflann:
Build time 0.294138 seconds for 1000000 points
Search time 0.087202 seconds for 1000000 queries and 10 neighbors

VTK (vtkStaticPointLocator):
Build time 0.034972 seconds for 1000000 points
Search time 0.967511 seconds for 1000000 queries and 10 neighbors

So… it turns out that VTK’s locators are outperformed by nanoflann by a lot (over 10 times) on querying points, but not on building the locator (nanoflann builds about twice as slowly as vtkKdTreePointLocator). So it seems that in instances where there are not a lot of points to fetch, vtkStaticPointLocator is to be preferred; “not a lot” in the test I did means roughly as many queries as 10% of the number of points in the locator. For more queries, nanoflann is much, much better. I would like to emphasize that this holds for evenly distributed points in a data set. vtkStaticPointLocator stores points in bins formed by a grid discretization of the space: if there is a large concentration of points in one bin, the point locator becomes very slow at fetching points in that bin.
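For context on the multithreaded numbers, the VTK-side queries can be parallelized with vtkSMPTools along these lines. This is only a sketch under assumptions (the QueryWorker struct and its names are illustrative, and per-thread vtkIdLists are one possible way to keep the queries read-only), not necessarily what my branch does:

```cpp
#include <vtkIdList.h>
#include <vtkNew.h>
#include <vtkPoints.h>
#include <vtkPolyData.h>
#include <vtkSMPThreadLocalObject.h>
#include <vtkSMPTools.h>
#include <vtkStaticPointLocator.h>

#include <array>
#include <cstdlib>
#include <vector>

namespace
{
// Illustrative worker: vtkSMPTools hands each thread a sub-range of the
// queries, and each thread reuses its own vtkIdList for the results.
struct QueryWorker
{
  vtkStaticPointLocator* Locator = nullptr;
  const std::vector<std::array<double, 3>>* Queries = nullptr;
  vtkSMPThreadLocalObject<vtkIdList> Neighbors;

  void operator()(vtkIdType begin, vtkIdType end)
  {
    vtkIdList* neighbors = this->Neighbors.Local();
    for (vtkIdType i = begin; i < end; ++i)
    {
      this->Locator->FindClosestNPoints(10, (*this->Queries)[i].data(), neighbors);
    }
  }
};
}

int main()
{
  auto r = [] { return std::rand() / static_cast<double>(RAND_MAX); };

  vtkNew<vtkPoints> points;
  std::vector<std::array<double, 3>> queries(1000000);
  for (vtkIdType i = 0; i < 1000000; ++i)
  {
    points->InsertNextPoint(r(), r(), r());
    queries[i] = { r(), r(), r() };
  }
  vtkNew<vtkPolyData> polyData;
  polyData->SetPoints(points);

  vtkNew<vtkStaticPointLocator> locator;
  locator->SetDataSet(polyData);
  locator->BuildLocator(); // build once, then query concurrently

  QueryWorker worker;
  worker.Locator = locator;
  worker.Queries = &queries;
  vtkSMPTools::For(0, static_cast<vtkIdType>(queries.size()), worker);
  return 0;
}
```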

So, I propose we change the vtkKdTreePointLocator backend to use nanoflann (which also uses a kd-tree in one of its implementations). This locator should then be preferred in most instances; light workloads can stick with vtkStaticPointLocator.

I’m reaching out to the community here. Do you know of a better implementation than nanoflann for point locators? Are there any known issues with nanoflann that I’m not foreseeing? Or is using nanoflann as a backend a good idea?

This is a great analysis. I’d definitely replace the implementation of vtkKdTreePointLocator with nanoflann unless there are weird build dependencies or licensing issues. I’m curious, did you do any memory usage comparisons? I’ve found myself lately processing 0.5 billion points, and memory can be a concern.

I don’t think there’s an easy way to know the memory consumption of VTK’s locators; they don’t have GetActualMemorySize(). In nanoflann, a locator of 1 million points (the one in my example) produces an overhead of 4.3 MB, so roughly 4 bytes per point.

Using 10 million points, nanoflann generates an overhead of 38 MB, so we’re on the same scale: about 4 bytes per point.

That’s very impressive! although I don’t think you mean 4 MB per point :wink:

Hi, sorry to join the discussion late (I’m new to this Discourse).
I’ve been using VTK for quite a long time, and recently I’ve been using nanoflann “heavily”.

nanoflann should be quite easy to integrate since it’s a single .hpp file using only common standard library includes.

Also, for specific applications (e.g. 2D / 3D points) the speed can be further improved using a templated L2 distance function (to remove the inner for loop), and improved further still with a contiguous array, enabling SSE/AVX instructions.
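For illustration, a 3D-specialized distance adaptor could look roughly like this. It follows the shape of nanoflann’s built-in L2_Simple_Adaptor, but the class itself is just a sketch (not the code from my branch), and exact member signatures differ a bit across nanoflann versions:

```cpp
#include <cstddef>

// Illustrative squared-L2 metric hard-coded for 3D, so the compiler can drop
// the generic per-dimension loop. DataSource is the same point cloud adaptor
// nanoflann already uses (kdtree_get_pt, etc.).
template <class T, class DataSource>
struct L2_3D_Adaptor
{
  using ElementType = T;
  using DistanceType = T;

  const DataSource& data_source;

  L2_3D_Adaptor(const DataSource& source) : data_source(source) {}

  // Full distance between a query point `a` and the stored point `b_idx`.
  DistanceType evalMetric(const T* a, std::size_t b_idx, std::size_t /*size*/) const
  {
    const T d0 = a[0] - data_source.kdtree_get_pt(b_idx, 0);
    const T d1 = a[1] - data_source.kdtree_get_pt(b_idx, 1);
    const T d2 = a[2] - data_source.kdtree_get_pt(b_idx, 2);
    return d0 * d0 + d1 * d1 + d2 * d2;
  }

  // Per-component distance, used while descending the tree.
  template <typename U, typename V>
  DistanceType accum_dist(const U a, const V b, std::size_t) const
  {
    return (a - b) * (a - b);
  }
};

// Plugged in like any other metric, e.g.:
//   nanoflann::KDTreeSingleIndexAdaptor<L2_3D_Adaptor<double, PointCloud>, PointCloud, 3>
```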

This is what I did in my branch (though that project is more about generalizing kd-tree Lpq distances):

I would be happy to help / discuss !

Yohann and others-

What is the status with nanoflann? There are some potential applications where the speed could be most beneficial. It sounds like it would be relatively straightforward to integrate.