Discussion on vtkSmpTools improvement

Timothee_Couble · March 24, 2021, 9:35am

Greetings,

As part of an internship at Kitware, I’m going to work on improving vtk SMP tools. And we would like to engage the broader VTK community for ideas and feedback.

The main goal of this work is to support VTK-m as a new backend of the SMP tools.
Some intermediate tasks will be to integrate more utility functions (inspired by VTK-m algorithms) such as Transform and Fill in vtkSmpTools.

As a potential thing to do we could also refactor SMP tools current architecture to support multiple backends at compile and select one at runtime.
Other non-exhaustive features could be: the support of lambdas, an environment variable to control the number of threads…

We are open to any suggestions you may have.
Let me know what your thoughts are about this, and if you have any questions.

Thanks for your attention,
Timothée Couble

mwestphal · March 24, 2021, 9:57am

I think this is a great idea and it will let developpers use VTK-m much more simply, even if not to it’s full capacity.

Timothee_Couble · March 29, 2021, 12:10pm

Hi all,

There is a pending MR for changing the (maximum) number of threads at runtime, using a VTK_SMP_MAX_THREADS environment variable.

https://gitlab.kitware.com/vtk/vtk/-/merge_requests/7794

Please let me know if you have any feedback about this changes.

will.schroeder · March 29, 2021, 1:28pm

Great work, customers have asked for this feature, thanks

toddy · March 30, 2021, 6:30am

Why is this needed when the Initialize method passes the number of threads as a parameter?

mwestphal · March 30, 2021, 7:30am

Because this is a control that is important to expose at high-level but modifying every single filter using vtkSMPTools with a dedicated parameter would not be practical.

seanm · March 30, 2021, 5:44pm

Timothée,

Where I work we’ve recently looked/tried to start using TBB with VTK. We did measure performance improvements using it, but recently gave up for a few reasons, which may interest you.

The newest TBB uses CMake and is thus easier to work with, but…
VTK doesn’t support the newest TBB https://gitlab.kitware.com/vtk/vtk/-/issues/18107
TBB doesn’t play nice with Thread Sanitizer (TSan) https://github.com/oneapi-src/oneTBB/issues/358 It’s quite valuable to us that our whole test suite passes with TSan, so introducing something that does not is disagreeable.

Given these issues, I’ve wondered about the feasibility of a GCD (Grand Central Dispatch) backend to vtkSMPTools. See: https://en.wikipedia.org/wiki/Grand_Central_Dispatch

Sean

dgobbi · March 30, 2021, 7:08pm

This is a great point. When vtkSMPTools was first introduced, it included a backend based on vtkMultiThreader. It was a “batteries included” sort of backend, since it didn’t add any extra dependencies that might block its use in certain applications. Are C++11 threads mature enough that they could be used to create a similar batteries-included backend? I’m not asking anyone at Kitware to actually do the implementation, I’m just wondering if people think it’s feasible and whether it might be accepted into the VTK codebase.

I know it wouldn’t provide the same performance as TBB or other backends, but it would be close, and there is tremendous value in having something that can be part of the default VTK build. For example, people who grab VTK from PyPi don’t get any SMP at all (except for the vtkMultiThreader stuff in the imaging pipeline). There are huge parts of the VTK community who aren’t getting any benefit from the current SMP back ends. And I say this as a member of the community, who contributes to VTK for the benefit of the community.

toddy · March 31, 2021, 3:33am

I can only find one place in VTK where vtkSMPTools::Initialize() is actually called.

mwestphal · March 31, 2021, 7:19am

Indeed, because Initialize is automatically called when using vtkSMPTools::For.

Edit: Looks like I was wrong, I’ll let @Charles_Gueunet and @Timothee_Couble answer here.

Timothee_Couble · March 31, 2021, 1:26pm

The vtkSMPTools::Initialize() was not called before vtkSMPTools::For but after some discussion with @mwestphal and @Charles_Gueunet, we change this behavior in the above MR (7794)

toddy · March 31, 2021, 9:46pm

So now, if a developer calls vtkSMPTools::Initialize(x) in their application, your changes are going to override/alter their specified thread count.

will.schroeder · April 1, 2021, 9:08am

Good point, it would be good if VTK would defer to the application if necessary. Can we add a vtkSMPTools::IsInitialized() method to check if initialization has already occurred? I suppose it depends on the backend, evidently in TBB tbb::this_task_arena::current_thread_index() function returns tbb::task_arena::not_initialized if the thread has not yet initialized the task scheduler.

Timothee_Couble · April 1, 2021, 9:36am

Yes indeed, we removed this behavior in the MR.

Multiple solution are still being discussed internally:

one of them would be to have the possibility to create a local instance (of vtkSMPTools or of a vtkSMPToolsLocalParameters class) that would drive the number of thread locally.
or we could also consider that Initialize should be called only once at the beginning of the pipeline.

The vtkSMPTools::IsInitialized would be especially useful for this second proposition.

dcthomp · May 11, 2021, 4:20am

Hi all,

I’ve been using vtkSMPTools a bit lately and have some comments. They may be tangential to what you are aiming to do, but here they are anyway:

Sometimes, I want to pass numbers other than vtkIdType to vtkSMPTools::For(). For example, I have a sparse set or map whose keys (perhaps hashes for unordered sets/maps) are uint64_t and want to pass a range of keys to consider. Casting to vtkIdType won’t work since vtkSMPTools::For() does math on its arguments.
Sometimes I want to invoke a “for” loop on every thread without providing work. Is the correct thing to do to call vtkSMPTools::For(0, vtkSMPTools::GetEstimatedNumberOfThreads(), functor)? If so, that should be documented. If not, can we add a new way to invoke a worklet? An example use-case: I have a functor that does flood-filling and want threads to steal seeds from and insert seeds into a single queue until it is empty.
For the use case above, it would also be nice if there was a way for a thread to ask for its rank and the pool size (similar to MPI_Comm_rank and MPI_Comm_size). This would allow worklets to do fancier things like lockless writes to a shared queue.

[edit: removed a comment because I confused vtkSMPTools::Initialize() with the functor’s Initialize() method in the discussion above.]

Yohann_Bearzi · May 12, 2021, 2:28pm

Hello everyone,

I don’t know how doable it is, but I think it would be nice to be able to use SMP tools on iterators / ranges. This is especially useful for containers that are not indexed like arrays (std::map or std::set for instance).

Here’s an example on how it could work with iterators in the worker:

template<class IteratorT>
void operator()(IteratorT begin, IteratorT end) 
{
  for(IteratorT it = begin; it != end; ++it)
  {
    // Do stuff
  }
}

This could be called using vtkSMPTools::For(container.begin(), container.end(), worker);

With ranges (maybe more optional, as iterators could do the same thing). Note that the range would likely be a proxy range that internally uses iterators from the input container:

template<class RangeT>
void operator()(RangeT& range) 
{
  for(auto& element : range)
  {
    // Do stuff
  }
}

This could be called using vtkSMPTools::For(container, worker);

dcthomp · May 21, 2021, 3:37pm

Iterators should work on containers without random-access iterators as long as they have an O(1) container::size() method and an efficient std::advance() specialization (which is certainly possible with red-black trees used for set and map – though implementations may vary).