It takes a long time for the iso-surface extraction at the first step

For my project, a tightly coupled in-situ analytics task processes the data generated by the simulation at each iteration. The analytics code is just iso-surface extraction based on VTK. I found that the first step of the data processing always takes a long time for some processes (more than 100 seconds), while most processes finish in less than 1 second. The subsequent iterations of in-situ data processing look normal (1-2 seconds).

At first I thought it might be an issue with the data content at step 0, but when I let the in-situ processing start only from step 2, the same thing happens (some processes take a long time to finish at step 2) and the subsequent steps look fine.

Do you have any ideas about this? If VTK needs some time to initialize things, why do only some of the processes take a long time while the other processes finish quickly (1-2 seconds)? This is the code I use for the in-situ processing (Gorilla/InSitu.cpp at master · wangzhezhe/Gorilla · GitHub); it is standard VTK code.

This is the log output from the workflow execution; you can see that process 16 takes more than 800 seconds while the other processes finish quickly.

11: ana substep 1 0.221448 substep 2 0.0378616 iteration 0
13: ana substep 1 0.242323 substep 2 0.0486019 iteration 0
10: ana substep 1 0.870095 substep 2 0.376127 iteration 0
12: ana substep 1 0.880432 substep 2 0.39582 iteration 0
18: ana substep 1 0.899096 substep 2 0.393423 iteration 0
20: ana substep 1 0.905801 substep 2 0.40446 iteration 0
15: ana substep 1 0.918896 substep 2 0.39192 iteration 0
 0: ana substep 1 0.9006 substep 2 0.414104 iteration 0
 7: ana substep 1 0.930318 substep 2 0.388144 iteration 0
14: ana substep 1 0.924077 substep 2 0.39481 iteration 0
 6: ana substep 1 0.934809 substep 2 0.390208 iteration 0
 5: ana substep 1 0.982448 substep 2 0.429923 iteration 0
17: ana substep 1 1.00999 substep 2 0.470984 iteration 0
29: ana substep 1 1.03631 substep 2 0.464399 iteration 0
23: ana substep 1 1.03975 substep 2 0.46185 iteration 0
21: ana substep 1 1.16262 substep 2 0.411943 iteration 0
19: ana substep 1 1.17479 substep 2 0.414073 iteration 0
 9: ana substep 1 2.1472 substep 2 0.698425 iteration 0
 8: ana substep 1 2.15893 substep 2 0.715997 iteration 0
 1: ana substep 1 2.15701 substep 2 0.724055 iteration 0
 3: ana substep 1 2.15274 substep 2 0.7321 iteration 0
 2: ana substep 1 2.16376 substep 2 0.73095 iteration 0
 4: ana substep 1 2.16626 substep 2 0.728455 iteration 0
27: ana substep 1 3.3584 substep 2 0.506065 iteration 0
25: ana substep 1 3.40808 substep 2 0.492352 iteration 0
26: ana substep 1 14.4346 substep 2 0.606578 iteration 0
28: ana substep 1 15.2043 substep 2 0.61382 iteration 0
30: ana substep 1 35.9798 substep 2 0.545443 iteration 0
24: ana substep 1 101.899 substep 2 0.576698 iteration 0
22: ana substep 1 192.677 substep 2 0.725628 iteration 0
31: ana substep 1 722.853 substep 2 0.706089 iteration 0
16: ana substep 1 867.46 substep 2 0.483828 iteration 0

It’s hard to say without knowing more about the code. Could it be related to I/O? What isocontour algorithm are you running?

Thanks so much for the reply,

This is the isocontour function that I use:

It is the standard one that uses vtkMarchingCubes.
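The full function lives in the linked repo; as a rough illustration only (not the exact code), a typical vtkMarchingCubes pipeline looks like the sketch below, assuming a vtkImageData input; extractIsoSurface and isoValue are hypothetical names.

// Minimal sketch of a typical vtkMarchingCubes pipeline; not the actual
// code from Gorilla/InSitu.cpp. Names here are illustrative.
#include <vtkImageData.h>
#include <vtkMarchingCubes.h>
#include <vtkPolyData.h>
#include <vtkSmartPointer.h>

vtkSmartPointer<vtkPolyData> extractIsoSurface(vtkImageData* image, double isoValue)
{
  auto contour = vtkSmartPointer<vtkMarchingCubes>::New();
  contour->SetInputData(image);
  contour->SetValue(0, isoValue); // contour index 0 at the requested iso-value
  contour->ComputeNormalsOn();
  contour->Update();              // forces the filter to execute
  return contour->GetOutput();
}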

This is the place where it is called:

Are there extra operations (related to I/O) on the first call when we use the VTK libraries?

I recommend that you use vtkFlyingEdges3D for isocontouring; it is 4-20x faster than MC (especially if you build with threading, i.e., set VTK_SMP_IMPLEMENTATION_TYPE to TBB - but of course you may need to install TBB).
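Since vtkFlyingEdges3D shares the contouring interface with vtkMarchingCubes for vtkImageData inputs, the swap is essentially a one-line change. A minimal sketch, reusing the hypothetical image and isoValue variables from the sketch above:

#include <vtkFlyingEdges3D.h>
#include <vtkSmartPointer.h>

auto contour = vtkSmartPointer<vtkFlyingEdges3D>::New();
contour->SetInputData(image);   // same vtkImageData input as with MC
contour->SetValue(0, isoValue); // same iso-value interface
contour->ComputeNormalsOn();
contour->Update();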

VTK’s pipelines are lazily evaluated - nothing executes until triggered by a Render() or Update(). It could be that on the first step the pipeline (or portions of the pipeline) is executed; this may include reading/importing data (i.e., I/O), running filters, etc. Typically, once you get to large data, I/O often becomes the bottleneck.
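To illustrate the lazy evaluation, here is a sketch with a hypothetical file reader (not your in-situ code; the filename is made up):

#include <vtkFlyingEdges3D.h>
#include <vtkSmartPointer.h>
#include <vtkXMLImageDataReader.h>

auto reader = vtkSmartPointer<vtkXMLImageDataReader>::New();
reader->SetFileName("volume.vti");               // no I/O happens here

auto contour = vtkSmartPointer<vtkFlyingEdges3D>::New();
contour->SetInputConnection(reader->GetOutputPort());
contour->SetValue(0, 100.0);                     // still nothing executes

contour->Update(); // first Update(): the reader performs the I/O and the
                   // filter runs; later Update() calls re-execute only if
                   // inputs or parameters have changed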


Thanks a lot for the information!

I tried vtkFlyingEdges3D just now, and it definitely shows better performance; the time consumed by the different processes looks normal now. Although there are several discussions comparing these two methods, I’m still curious about the “magic” behind this (I did not even use the multi-threaded build).

Thanks!

I cannot say for sure what the “magic” is without digging into your example. One thing about vtkMarchingCubes is that it performs point merging (using a point locator) to merge coincident points, while flying edges does not. Point locators (and other locators) can be problematic (i.e., slow) if they are not configured correctly - this is usually done automatically, but sometimes, due to data quirks, it doesn’t work right. Also, marching cubes is inefficient in that it visits voxel edges multiple times (and performs edge intersection operations multiple times) and also visits cell voxels multiple times - meaning excessive loading of data. Flying edges was implemented to avoid these multiple visits, so even in serial execution the algorithm is much faster. If you enable threading, flying edges flies 🙂
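If you ever want to rule out a badly configured locator while staying with vtkMarchingCubes, one option is to set the point-merging locator explicitly instead of relying on the automatic setup. A hedged sketch; the bin counts below are illustrative, not tuned values:

#include <vtkMarchingCubes.h>
#include <vtkMergePoints.h>
#include <vtkSmartPointer.h>

auto locator = vtkSmartPointer<vtkMergePoints>::New();
locator->SetDivisions(64, 64, 64);      // uniform binning of the volume
locator->SetNumberOfPointsPerBucket(2); // target occupancy per bin

auto contour = vtkSmartPointer<vtkMarchingCubes>::New();
contour->SetLocator(locator);           // merge coincident points with it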

BTW, building with threading (especially TBB) is highly recommended. The community has put enormous effort into accelerating algorithms over the last couple of years, and the build process with CMake is simple. It is common to see speedups of > 10x, even on a computer with a modest number of threads.
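For reference, enabling the TBB backend is a single CMake option when configuring VTK (paths here are placeholders; this assumes TBB is already installed where CMake can find it):

cmake -S /path/to/VTK -B /path/to/VTK-build \
  -DVTK_SMP_IMPLEMENTATION_TYPE=TBB \
  -DCMAKE_BUILD_TYPE=Release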


This is quite informative! Thanks!