VTKSMP stdthread performance

Thank you for both quickly identifying a solution and taking steps to integrate it across the codebase!

TBB is a bit complicated depending on which platform it’s on. You can see the full details here, but tracing back the relevant bits for VTK’s usage:

From Common/Core/SMP/TBB/vtkSMPThreadLocalImpl.h:
T& Local() override { return this->Internal.local(); }
where this->Internal is declared as tbb::enumerable_thread_specific<T> Internal;

Jumping to tbb/enumerable_thread_specific.h, their local() implementation is:

    reference local(bool& exists)  {
        void* ptr = this->table_lookup(exists);
        return *(T*)ptr;
    }

Past this there are two options. The first, which VTK will use by default (at least with the current TBB codebase), delegates to the platform's native thread-local storage functions, using either Win32 functions or pthreads.

    void* table_lookup( bool& exists ) {
        void* found = get_tls();
        if( found ) {
            exists=true;
        } else {
            found = super::table_lookup(exists);
            set_tls(found);
        }
        return found;
    }
    #if _WIN32||_WIN64
    #if __TBB_WIN8UI_SUPPORT
        using tls_key_t = DWORD;
        void set_tls(void * value) { FlsSetValue(my_key, (LPVOID)value); }
        void* get_tls() { return (void *)FlsGetValue(my_key); }
    #else
        using tls_key_t = DWORD;
        void set_tls(void * value) { TlsSetValue(my_key, (LPVOID)value); }
        void* get_tls() { return (void *)TlsGetValue(my_key); }
    #endif
    #else
        using tls_key_t = pthread_key_t;
        void set_tls( void * value ) const { pthread_setspecific(my_key, value); }
        void* get_tls() const { return pthread_getspecific(my_key); }
    #endif

It’s hard to tell exactly what the Win32 APIs are doing internally, but their docs on TlsGetValue() say “TlsGetValue was implemented with speed as the primary goal”, so it probably doesn’t lock anything. I’m not sure what VTK’s stance is on using direct OS calls (or Win32 specifically), but perhaps doing something similar is an option?
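
To make the "check TLS first, fall back to the table" pattern concrete, here is a minimal sketch of the same idea using the C++11 thread_local keyword instead of direct OS calls. This is not TBB or VTK code: the sketch namespace, slow_table_lookup() and the single global int-per-thread table are all hypothetical names invented for illustration, and the mutex-guarded map just stands in for whatever the slow path really does.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <unordered_map>

namespace sketch {

// Slow-path storage (hypothetical): one int per thread, guarded by a mutex.
std::mutex table_mutex;
std::unordered_map<std::thread::id, int> table;

// Slow path: locked lookup keyed on the calling thread's id.
// Creates a zero-initialized entry on first use by a thread.
int* slow_table_lookup(bool& exists) {
    std::lock_guard<std::mutex> lock(table_mutex);
    const std::thread::id id = std::this_thread::get_id();
    auto it = table.find(id);
    exists = (it != table.end());
    if (!exists) {
        it = table.emplace(id, 0).first;
    }
    // Pointers to unordered_map elements stay valid across rehashing.
    return &it->second;
}

// Fast path: a per-thread cached pointer plays the role of
// get_tls()/set_tls() in the TBB snippet above.
int& local(bool& exists) {
    thread_local int* cached = nullptr;
    if (cached) {
        exists = true;               // cache hit: no lock, no hashing
    } else {
        cached = slow_table_lookup(exists);
    }
    assert(cached != nullptr);
    return *cached;
}

} // namespace sketch
```

The first call from a thread goes through the locked lookup and reports exists as false; every later call from that thread returns the cached pointer without touching the mutex. One caveat on the design: thread_local variables are per thread, not per object, so a real per-instance container would still need some per-instance key (which is presumably why TBB allocates a TLS key per container).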

They do have a non-delegated backend, which is similar to what VTK is doing but uses std::thread::id directly for table lookups.

table_lookup() is implemented as:

    void* ets_base<ETS_key_type>::table_lookup( bool& exists ) {
        const key_type k = ets_key_selector<ETS_key_type>::current_key();

        __TBB_ASSERT(k != key_type(), nullptr);
        void* found;
        std::size_t h = std::hash<key_type>{}(k);
        ...

and current_key() is derived from:

    template <std::size_t ThreadIDSize>
    struct internal_ets_key_selector {
        using key_type = std::thread::id;
        static key_type current_key() {
            return std::this_thread::get_id();
        }
    };
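
For anyone who wants to see what a lookup keyed directly on std::thread::id might look like end to end, here is a small sketch. It only borrows the two ideas visible in the excerpts above (std::this_thread::get_id() as the key, std::hash to pick a starting slot); everything else is an assumption. ThreadIdTable, Slot, the fixed 64-slot capacity, linear probing and the mutex are hypothetical choices made to keep the example short, and the part of TBB's table_lookup() elided above is not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>

// One slot per thread. A default-constructed std::thread::id represents
// "no thread", so it doubles as the empty-slot marker.
struct Slot {
    std::thread::id key;
    int value = 0;
};

// Hypothetical fixed-capacity table keyed on std::thread::id.
class ThreadIdTable {
public:
    // Returns this thread's slot, claiming an empty one on first use.
    // Returns nullptr if all slots are taken by other threads.
    int* lookup(bool& exists) {
        const std::thread::id k = std::this_thread::get_id();
        assert(k != std::thread::id());
        const std::size_t h = std::hash<std::thread::id>{}(k);

        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[(h + i) % slots_.size()];  // linear probing
            if (s.key == k) {
                exists = true;
                return &s.value;
            }
            if (s.key == std::thread::id()) {
                s.key = k;                              // claim empty slot
                exists = false;
                return &s.value;
            }
        }
        exists = false;
        return nullptr;                                 // table is full
    }

private:
    std::array<Slot, 64> slots_{};
    std::mutex mutex_;
};
```

Even in this simplified form, every call pays for get_id(), a hash, and at least one key comparison, which is the cost the TLS-delegating path avoids on a cache hit.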