Python wrapping is slow on Windows

lassoan · May 25, 2021, 5:34pm

We use VTK’s Python wrapper for VTK classes in 3D Slicer and we find that on Windows it can take several minutes to generate the hierarchy files for each larger library. If we change a header file at lower level, hierarchy generation can add 10 minutes to the build.

We have found that it is due to stat() function is used to check if a header file can be found in an include directory - and at application level (where a library may depend on VTK, ITK, and a number of other libraries) there are hundreds of header files and hundreds of include folders.

We tested that by creating a list of all the files in all the include folders and checking the existence of a file by looking up paths in this list we can make vtkWrapHierarchy about 3-5x faster. See implementation here.

@dgobbi Would it be possible to add such caching in vtkWrapHierarchy on Windows? (or maybe on other plaforms, too, controllable with an environment variable or command-line option)

dgobbi · May 25, 2021, 7:06pm

Yes, something like this could be added. It makes a lot of sense, and it was done with the hash functions already present in the wrappers, it could be made very lightweight. I can’t give a timeline, though. As usual, I’m way behind on my VTK to-do list and my day job keeps getting in the way

lassoan · May 25, 2021, 7:10pm

Thank you!

I can help with some tasks. For example, if you tell how to use the hash functions in the wrapper then I can give it a try to use that instead of the third-party string set implementation.

ben.boeckel · May 25, 2021, 7:20pm

Thanks, this looks good overall. I made some comments about more ways to squeeze performance out of it (mainly strcat → snprintf, some strlen refactoring, and where strspn is likely way better).

lassoan · May 25, 2021, 8:52pm

@dgobbi One more related question: Currently, output *Hierarchy.txt file modified time is only updated if the file content is changed. I guess that this is implemented like that to avoid unnecessary rebuilds. However, this actually enforces regeneration of hierarchy every time, if VTK is updated after hierarchy files were generated (because then vtkWrapHierarchy executable date is updated, so hierarchy files will always have an older modification date, see some more details here).

Shouldn’t vtkWrapHierarchy always update the file modification time to indicate that the output file is up-to-date? Or should we implement some alternative mechanism to track if the hierarchy file is up-to-date (e.g., generate some timestamp files whenever a hierarchy file generation is requested and check the modified timestamp of that file instead of the hierarchy file)?

dgobbi · May 25, 2021, 9:16pm

If the hierarchy file timestamp is updated, it triggers everything that consumes the hierarchy file, which causes the wrappers to be rebuilt entirely. We had a separate timestamp file previously (up until VTK 8, I think?) but it was removed to simplify the rules. I’m not eager to add the timestamp file back into the mix.

Let’s focus on speeding up vtkWrapHierarchy first. If we can get it to run as fast on Windows as it does on Linux, having it re-run might not be as much of an issue.

lassoan · May 25, 2021, 9:39pm

With vtkWrapHierarchy speed improvements, a Slicer build after no file modifications still takes 2-3 minutes (down from 10 minutes) if VTK was updated and the hierarchy files are not deleted manually. If the timestamps are correct then the build takes about 30 seconds.

We need to address both issues (one is for achieving 5x speedup when we actually modify header files, the other is for achieving 25x speedup when header files are not modified).

We don’t rebuild VTK each time we build the application and use vtkAddon macros for Python wrapping, so it is fine if VTK’s macros will not use timestamps. We’ll add them to vtkAddon.

ben.boeckel · May 25, 2021, 9:46pm

Maybe only updating the generated source file if the content changes is better? Using the hierarchy files would then be done only as needed, but actually generating the source is generally pretty cheap (compared to compilation).

dgobbi · May 25, 2021, 10:55pm

Maybe. But one of the reasons for doing it with vtkWrapHierarchy was that it was trivial to implement it there (since vtkWrapHierarchy kept an internal buffer of the heirarchy info for sorting purposes before writing the buffer to the file at the very end). For comparison, vtkWrapPython produces its output via hundreds of fprintf() statements. So to avoid updating the timestamp, vtkWrapPython would have to write to a temporary file that only replaced the source file if they differed. If vtkWrapPython crashed or was killed, would the build system automatically remove the temporary?

dgobbi · May 25, 2021, 10:57pm

For reference, !3694 shows how the timestamps used to be implemented.

lassoan · May 25, 2021, 11:55pm

I’m not exactly sure which source file do you mean, but while you are waiting for your code to compile every second of unnecessary waiting counts, so I would much prefer implementing one an extra mechanism in the build process (e.g., generate and check a timestamp) if that saves 5-10 seconds for each build for each developer.

Maybe the wrapping process in VTK proper is already quite complex, but the vtkWrapHierarchy macro in vtkAddon is still quite straightforward.

Thank you! Reverting the original commit that removed the timestamp in VTK8 - reverting it has fixed the problem!

So, we can now just focus on making vtkWrapHierarchy faster.

ben.boeckel · May 26, 2021, 10:28am

No, but nothing depends on it, so it shouldn’t interfere with the build graph.

KWSys contains code to do this, but it is C++.

dgobbi · May 26, 2021, 2:55pm

The timestamp trick (leaving the generated file timstamp unmodified if file doesn’t change, and having a separate timestamp file for the build rule) makes a lot of sense for all generated files, but we can’t be 100% sure that this is robust across all generators. The timestamp files for the vtkWrapHierarchy rule were removed because it was suspected that they caused problems for highly parallel (-j 8) builds. So I vote that we delay playing with the timestamps until after the next release cycle

dgobbi · May 26, 2021, 3:10pm

Andras, I started fiddling with your code and thinking of solutions last night. I’m going to take a slightly different approach. Instead of iterating through all the include directories at the beginning, I’m making a “cached stat” function that iterates through directories only as needed. The idea is this:

Instead of calling “stat()” on “filename”,

split “filename” into “dirname” and “basename”
if the contents of “dirname” are already cached, then return the cached result
else list the contents of “dirname”, cache them, then return the result

This eliminates the problem of headers in subdirs, and it also ensures that we only search directories that are actually used.

lassoan · May 26, 2021, 3:14pm

This is a good idea. It should have better performance, especially if there are only a few files to find and there are lots of include directories.

dgobbi · May 28, 2021, 8:13pm

I’ve created MR !8026 for this, it’s still a work in progress but I wanted to throw the CI machinery at it. I plan put all the new code in a new source file before doing the merge. Mac and Linux are included.

lassoan · May 28, 2021, 10:11pm

This is awesome, thank you!

I’m just curious, why the wrapper has to be implemented in C (why it cannot be C++)?

dgobbi · May 28, 2021, 11:00pm

It started out that way because yacc was C-only. It could be switched to C++, but that’s a lot of work, so why bother? I enjoy programming in C.

Actually, both C and C++ are terrible languages for writing wrappers. More of a text-processing language would be far better. At Queen’s we were using Perl for wrapping… now hardly anyone likes perl, but it’s an awesome language for things like this.

lassoan · May 28, 2021, 11:34pm

This is very interesting. I had good memories about programming in C, but then when I experimented with the wrapper I realized how much I got used to conveniences of C++. Using C was not fun for me at all (I particularly missed STL), but for me it probably would have been an even bigger struggle with Perl