Simplified Python wheel for VTK

The vtk wheel from PyPI does not work on a machine that has a GPU but no display connected. This commonly comes up when people use trame with VTK/ParaView for remote rendering and run their server process on a cloud machine that provisions a GPU without any screen. Sometimes the GPUs do not even have ports to plug in a monitor. In these scenarios, VTK could be more adaptable by falling back to a suitable OpenGL context backend at runtime.

For VTK to use the GPU, users are asked to uninstall the vtk wheel and install a different one, “vtk-egl”:

pip install "vtk-egl" --extra-index-url https://wheels.vtk.org

I did some research into why vtk-egl is not distributed on PyPI. I think the reason is that libEGL.so cannot be bundled with the vtk wheel on PyPI, because libEGL is part of the Linux graphics driver stack and ships with the NVIDIA driver. I think PyPI also doesn’t allow bundling arbitrary shared libraries in the wheel, and all libs must be built from source. We could distribute Mesa’s libEGL, but that’s not great for NVIDIA GPUs.

If I can adapt VTK to not link with libEGL at compile time, there wouldn’t be a need for an entirely separate wheel. Imagine running a single command, pip install vtk, on your laptop or in the cloud, and VTK automatically uses X when it can open a display or EGL when a display is not available.

As I started looking deeper into this, the first problem was that glew, the existing OpenGL context loader in VTK, could not handle both EGL and GLX in the same build (issue vtk/vtk#18547).

@meak worked on that issue for a while and he came up with a nice solution. Michael changed VTK to use glad2 instead of glew. glad2 is a neat tool that generates custom OpenGL/Vulkan/WGL/EGL/GLX dynamic loaders. I’ve worked with glad in the past and liked the simplicity of it.

I’ve picked up @meak’s commits from https://gitlab.kitware.com/michael.migliore/vtk/-/commit/cea5e2cd7c938c7770151fbafee2823857f8322d and made a few more changes in my use-glad branch that extend the dynamic loading to GLX and OpenGL as well.
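
To illustrate what "loading at runtime instead of linking at compile time" means, here is a minimal Python sketch using ctypes. It is not the code from the use-glad branch and not what glad2 generates; the helper name first_loadable is made up for this illustration, and the library short names passed to it are assumptions about what a Linux machine might have installed.

```python
import ctypes
import ctypes.util


def first_loadable(candidates):
    """Return the first library that can be opened at runtime, else None."""
    for name in candidates:
        # find_library resolves a short name such as "EGL" to a file name
        # when it can; otherwise fall back to the name as given.
        resolved = ctypes.util.find_library(name) or name
        try:
            ctypes.CDLL(resolved)  # equivalent to dlopen() on Linux
        except OSError:
            continue  # library is missing or unusable, try the next one
        return resolved
    return None
```

Calling first_loadable(["EGL"]) on a machine without a graphics driver stack would simply return None instead of failing at import time, which is the property a single wheel needs.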

Here, I’m emulating a headless machine by running the vtkProbeOpenGLVersion executable inside a docker container with/without an X display or GPU and it seems to do the right thing.

  1. No X display and no GPU

    Expects vtkEGLRenderWindow with Mesa drivers

    # emulate a disconnected display for the docker container.
    $ xhost -
    access control enabled, only authorized clients can connect
    
    # emulate no x and no gpu
    $ docker run --rm -it -v$PWD:/vtk kitware/vtk:ci-fedora39-20240721 /bin/bash
    
    $ ./vtk/in_docker_build/bin/vtkProbeOpenGLVersion-9.3
    2024-07-30 22:24:36.855 (   0.002s) [    7FD571B46C40]vtkRuntimeOpenGLRenderW:85    WARN| Unable to open X display (null)
    Class: vtkEGLRenderWindow succeeded in finding a working OpenGL
    
    OpenGL vendor string:  Mesa
    OpenGL renderer string:  llvmpipe (LLVM 17.0.6, 256 bits)
    OpenGL version string:  4.5 (Compatibility Profile) Mesa 23.3.6
    
  2. X display without GPU

    Expects vtkXOpenGLRenderWindow with Mesa drivers

    # emulate a connected display for the docker container.
    $ xhost +
    access control disabled, clients can connect from any host
    # emulate no gpu
    $ docker run --rm -it -v /tmp/.X11-unix:/tmp/.X11-unix:Z -e DISPLAY=$DISPLAY -v$PWD:/vtk kitware/vtk:ci-fedora39-20240721 /bin/bash
    $ ./vtk/in_docker_build/bin/vtkProbeOpenGLVersion-9.3
    Class: vtkXOpenGLRenderWindow succeeded in finding a working OpenGL
    
    server glx vendor string:  NVIDIA Corporation
    server glx version string:  1.4
    server glx extensions:  GLX_EXT_visual_info GLX_EXT_visual_rating GLX_EXT_import_context GLX_SGIX_fbconfig GLX_SGIX_pbuffer GLX_SGI_video_sync GLX_SGI_swap_control GLX_EXT_swap_control GLX_EXT_swap_control_tear GLX_EXT_texture_from_pixmap GLX_EXT_buffer_age GLX_ARB_create_context GLX_ARB_create_context_profile GLX_EXT_create_context_es_profile GLX_EXT_create_context_es2_profile GLX_ARB_create_context_no_error GLX_ARB_create_context_robustness GLX_NV_delay_before_swap GLX_EXT_stereo_tree GLX_EXT_libglvnd GLX_ARB_context_flush_control GLX_NV_robustness_video_memory_purge GLX_NV_multigpu_context GLX_ARB_multisample GLX_NV_float_buffer GLX_ARB_fbconfig_float GLX_EXT_framebuffer_sRGB GLX_NV_copy_image GLX_NV_copy_buffer
    client glx vendor string:  Mesa Project and SGI
    client glx version string:  1.4
    glx extensions:  GLX_ARB_context_flush_control GLX_ARB_create_context GLX_ARB_create_context_no_error GLX_ARB_create_context_profile GLX_ARB_create_context_robustness GLX_ARB_fbconfig_float GLX_ARB_framebuffer_sRGB GLX_ARB_get_proc_address GLX_ARB_multisample GLX_EXT_create_context_es2_profile GLX_EXT_create_context_es_profile GLX_EXT_framebuffer_sRGB GLX_EXT_import_context GLX_EXT_texture_from_pixmap GLX_EXT_visual_info GLX_EXT_visual_rating GLX_MESA_query_renderer GLX_SGIX_fbconfig GLX_SGIX_pbuffer
    OpenGL vendor string:  Mesa
    OpenGL renderer string:  llvmpipe (LLVM 17.0.6, 256 bits)
    OpenGL version string:  4.5 (Core Profile) Mesa 23.3.6
    
  3. GPU without X display

    Expects vtkEGLRenderWindow with nvidia driver

    $ xhost -
    access control enabled, only authorized clients can connect
    
    $ docker run --rm -it -v /tmp/.X11-unix:/tmp/.X11-unix:Z -v $PWD:/vtk -e DISPLAY=$DISPLAY --runtime=nvidia --gpus=all -e NVIDIA_DRIVER_CAPABILITIES=graphics,display,compute kitware/vtk:ci-fedora39-20240721 /bin/bash
    $ ./vtk/in_docker_build/bin/vtkProbeOpenGLVersion-9.3
    Authorization required, but no authorization protocol specified
    2024-07-30 vtkRuntimeOpenGLRenderWindowFactory.cxx:85 WARN| Unable to open X display :1
    Authorization required, but no authorization protocol specified
    Authorization required, but no authorization protocol specified
    Authorization required, but no authorization protocol specified
    Class: vtkEGLRenderWindow succeeded in finding a working OpenGL
    
    OpenGL vendor string:  NVIDIA Corporation
    OpenGL renderer string:  NVIDIA RTX A2000 Laptop GPU/PCIe/SSE2
    OpenGL version string:  4.6.0 NVIDIA 555.42.02
    
  4. Finally, with a GPU and X display (this is the most common use case on a PC)

    Expects vtkXOpenGLRenderWindow with nvidia driver

    $ xhost +
    access control disabled, clients can connect from any host
    
    $ docker run --rm -it -v /tmp/.X11-unix:/tmp/.X11-unix:Z -v $PWD:/vtk -e DISPLAY=$DISPLAY --runtime=nvidia --gpus=all -e NVIDIA_DRIVER_CAPABILITIES=graphics,display,compute kitware/vtk:ci-fedora39-20240721 /bin/bash
    $ ./vtk/in_docker_build/bin/vtkProbeOpenGLVersion-9.3
    Class: vtkXOpenGLRenderWindow succeeded in finding a working OpenGL
    
    server glx vendor string:  NVIDIA Corporation
    server glx version string:  1.4
    server glx extensions:  GLX_EXT_visual_info GLX_EXT_visual_rating GLX_EXT_import_context GLX_SGIX_fbconfig GLX_SGIX_pbuffer GLX_SGI_video_sync GLX_SGI_swap_control GLX_EXT_swap_control GLX_EXT_swap_control_tear GLX_EXT_texture_from_pixmap GLX_EXT_buffer_age GLX_ARB_create_context GLX_ARB_create_context_profile GLX_EXT_create_context_es_profile GLX_EXT_create_context_es2_profile GLX_ARB_create_context_no_error GLX_ARB_create_context_robustness GLX_NV_delay_before_swap GLX_EXT_stereo_tree GLX_EXT_libglvnd GLX_ARB_context_flush_control GLX_NV_robustness_video_memory_purge GLX_NV_multigpu_context GLX_ARB_multisample GLX_NV_float_buffer GLX_ARB_fbconfig_float GLX_EXT_framebuffer_sRGB GLX_NV_copy_image GLX_NV_copy_buffer
    client glx vendor string:  NVIDIA Corporation
    client glx version string:  1.4
    glx extensions:  GLX_ARB_get_proc_address GLX_ARB_multisample GLX_EXT_visual_info GLX_EXT_visual_rating GLX_EXT_import_context GLX_SGI_video_sync GLX_SGIX_fbconfig GLX_SGIX_pbuffer GLX_SGI_swap_control GLX_EXT_swap_control GLX_EXT_swap_control_tear GLX_EXT_buffer_age GLX_ARB_create_context GLX_ARB_create_context_profile GLX_NV_float_buffer GLX_ARB_fbconfig_float GLX_EXT_texture_from_pixmap GLX_EXT_framebuffer_sRGB GLX_NV_copy_image GLX_NV_copy_buffer GLX_EXT_create_context_es_profile GLX_EXT_create_context_es2_profile GLX_ARB_create_context_no_error GLX_ARB_create_context_robustness GLX_NV_delay_before_swap GLX_EXT_stereo_tree GLX_ARB_context_flush_control GLX_NV_robustness_video_memory_purge GLX_NV_multigpu_context
    OpenGL vendor string:  NVIDIA Corporation
    OpenGL renderer string:  NVIDIA RTX A2000 Laptop GPU/PCIe/SSE2
    OpenGL version string:  4.5.0 NVIDIA 555.42.02
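
When repeating these four checks in a script, the probe output can be reduced to the few fields that matter. The following is a small sketch, assuming the output format stays as printed above; parse_probe_output is a hypothetical helper and not part of VTK, and SAMPLE is copied from scenario 1.

```python
import re


def parse_probe_output(text):
    """Extract the render window class and OpenGL strings from probe output."""
    info = {}
    match = re.search(r"^\s*Class:\s*(\w+) succeeded", text, re.MULTILINE)
    if match:
        info["class"] = match.group(1)
    for key in ("vendor", "renderer", "version"):
        # Only the "OpenGL ... string" lines are read; the glx lines are ignored.
        pattern = rf"^\s*OpenGL {key} string:\s*(.+?)\s*$"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            info[key] = match.group(1)
    return info


SAMPLE = """\
Class: vtkEGLRenderWindow succeeded in finding a working OpenGL

OpenGL vendor string:  Mesa
OpenGL renderer string:  llvmpipe (LLVM 17.0.6, 256 bits)
OpenGL version string:  4.5 (Compatibility Profile) Mesa 23.3.6
"""

if __name__ == "__main__":
    print(parse_probe_output(SAMPLE))
```

A script could then compare the "class" and "vendor" fields against what each scenario expects.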
    

Many unit tests pass with EGL, so I’m close to creating an MR.

  1. Is it a good idea to remove the VTK_OPENGL_HAS_EGL build setting? It does not make sense anymore because VTK will no longer rely on find_package(OpenGL) to discover EGL at compile time. Can we remove it and act as if EGL always exists?
  2. Can the Utility target VTK::opengl also be removed? All its targets are now dynamically discovered, except OSMesa. That should be trivial to implement.

Having a different wheel for EGL also caused awkward packaging scripts for downstream packages that want to rely on the vtk wheel from pypi.

Some conda environments may not be able to pull in a wheel from a site specified in --extra-index-url. In those cases, the only solution was to build VTK from source.

Awesome work. A couple of days ago, we had to buy an RTX for our Windows server.


I’m glad someone is taking it over :face_with_hand_over_mouth:
Thanks @jaswantp!


Just want to say great work. All the scenarios you enumerate have their use cases. I’ve used three of them, and having it just work ™ will be awesome.


Nice! Which is the one you haven’t used? Just curious

I recently stumbled upon the same problem; see “Core dumped on examples VTK Simple cone Remote Rendering” (issue #570 in Kitware/trame on GitHub).

Additionally, every now and again VTK would instantiate an X render window instead of the EGL one, and it all crashed.

I had to pip uninstall and reinstall vtk and vtk-egl a few times before it was somehow stable, but this may be a problem with the environment.

However, it would be better if the system exited with an error rather than crashing. It seems you nailed it, though! Thanks.


What about pure software rendering mode through OSMesa? Would that also work? That is an important use case (and why there are also vtk-osmesa wheels available).

Great work!


What about pure software rendering mode through OSMesa? Would that also work?

Yes, OSMesa will be tried when VTK can’t find EGL. It can be forced with an environment variable too.
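
The order described in this thread can be sketched in Python as follows. This is only an illustration, not VTK's implementation: the environment variable name EXAMPLE_RENDER_WINDOW and the choose_render_window helper are hypothetical, and vtkOSOpenGLRenderWindow is assumed here to be the OSMesa-backed class.

```python
import os

# Order described in the thread: X when a display can be opened, then EGL,
# then OSMesa as the last resort.
FALLBACK_ORDER = (
    "vtkXOpenGLRenderWindow",
    "vtkEGLRenderWindow",
    "vtkOSOpenGLRenderWindow",
)

# Hypothetical variable name, used only for this illustration.
OVERRIDE_ENV = "EXAMPLE_RENDER_WINDOW"


def choose_render_window(works, environ=os.environ):
    """Pick a render window class name.

    `works` maps a class name to a zero-argument callable that returns True
    when that backend can create a context on this machine.
    """
    forced = environ.get(OVERRIDE_ENV)
    if forced:
        return forced  # the user asked for a specific backend
    for name in FALLBACK_ORDER:
        probe = works.get(name)
        if probe is not None and probe():
            return name
    return None  # nothing usable was found
```

Passing the probes in as callables keeps the selection logic separate from the platform-specific checks, so the order can be exercised without a display or GPU.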

AFAIK, recent releases of libEGL support CPU software rendering, which makes OSMesa a not so important use case, IMO.

That is very interesting, thanks for letting me know.


Correct, osmesa is not really needed.