I cannot say for sure what the “magics” are without digging into your example. One thing about vtkMarchingCubes is that it uses a point locator to merge coincident points, while flying edges does not. Point locators (and other locators) can be problematic (i.e., slow) if they are not configured correctly. Configuration is usually done automatically, but sometimes data quirks keep it from working right. Also, marching cubes is inefficient in that it visits voxel edges multiple times (performing the edge intersection operations each time) and also visits cell voxels multiple times, which means excessive loading of data. Flying edges was implemented to avoid these multiple visits, so even in serial execution the algorithm is much faster. If you enable threading, flying edges flies.
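To put a rough number on the repeated edge visits, here is a back-of-the-envelope sketch in plain Python. It assumes a naive cell-by-cell traversal in which every voxel examines all 12 of its edges; it is not a measurement of vtkMarchingCubes itself, and the function name `edge_visits` is made up for this illustration.

```python
def edge_visits(n):
    """Count edge work on a volume with n points along each axis.

    Returns (per_cell_visits, unique_edges):
      per_cell_visits: edges examined if every voxel checks its 12 edges
      unique_edges:    distinct voxel edges that exist in the volume
    """
    cells = (n - 1) ** 3
    per_cell_visits = 12 * cells
    # Along each of the 3 axes there are (n - 1) edges per grid line,
    # and n * n grid lines running in that direction.
    unique_edges = 3 * n * n * (n - 1)
    return per_cell_visits, unique_edges


if __name__ == "__main__":
    for n in (2, 11, 101):
        visits, unique = edge_visits(n)
        print(f"n={n}: {visits} edge visits for {unique} unique edges "
              f"(ratio {visits / unique:.2f})")
```

An interior edge is shared by four voxels, so the ratio approaches 4 as the volume grows. That is the redundancy a per-cell algorithm has to either pay for or clean up afterwards with point merging.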
BTW, building with threading (especially TBB) is highly recommended. The community has put enormous effort into accelerating algorithms in the last couple of years, and the build process with CMake is simple. It is common to see speedups of more than 10x, even on a computer with a modest number of threads.