EC2 is slow compared to my Mac M2 machine

Hello,

I found that my Mac M2 runs the following code much faster than bigger machines in the cloud (c7i.8xlarge, g5.2xlarge), and I suspect this has something to do with connecting to the screen.

import numpy as np
import trimesh
import vtk

# create_vtk_mesh and timing are helpers defined elsewhere (not shown).

class MyRenderer:
    def __init__(self, mesh: trimesh.Trimesh):
        self.vtkmesh = create_vtk_mesh(mesh.vertices, mesh.faces)  # This returns a vtkPolyData mesh
        self.renderer = vtk.vtkRenderer()
        self.render_window = vtk.vtkRenderWindow()
        self.render_window.AddRenderer(self.renderer)
        self.render_window.SetOffScreenRendering(False)
        self.render_window.SetSize(128, 128)

        mapper = vtk.vtkPolyDataMapper()
        mapper.SetInputData(self.vtkmesh)
        actor = vtk.vtkActor()
        actor.SetMapper(mapper)
        self.renderer.AddActor(actor)
        self.camera = vtk.vtkCamera()
        self.camera.ParallelProjectionOn()
        self.camera.SetParallelScale(75)
        self.camera.SetClippingRange(1, 400)
        self.renderer.SetActiveCamera(self.camera)
        self.render_window.Render()

    def render_mesh(self, camera_position: np.ndarray, camera_focal_point: np.ndarray, pts_3d=None):
        self.camera.SetPosition(*camera_position)
        self.camera.SetFocalPoint(*camera_focal_point)
        view_matrix = self.camera.GetViewTransformMatrix()
        projection_matrix = self.camera.GetProjectionTransformMatrix(1, -1, 1)
        view_matrix_np = np.array([view_matrix.GetElement(i, j) for i in range(4) for j in range(4)]).reshape((4, 4))
        projection_matrix_np = np.array([projection_matrix.GetElement(i, j) for i in range(4) for j in range(4)]).reshape((4, 4))
        rotation_det = np.linalg.det(view_matrix_np[:3, :3])
        imaging_matrix = projection_matrix_np @ view_matrix_np

        sensor_normal = camera_focal_point - camera_position
        sensor_normal = sensor_normal / np.linalg.norm(sensor_normal)
        sensor_plane = sensor_normal[0], sensor_normal[1], sensor_normal[2], -np.dot(camera_position, sensor_normal)

        self.renderer.SetActiveCamera(self.camera)
        self.render_window.Render()

        window_to_image_filter = vtk.vtkWindowToImageFilter()
        window_to_image_filter.SetInput(self.render_window)

        # Benchmark (seconds):
        # Apple Mac M2 - 0.001
        # AWS EC2 c7i.8xlarge, Amazon Linux (CPU) - 0.130
        # AWS EC2 g5.2xlarge, Ubuntu (GPU) - 0.134
        with timing('this is slow'):
            window_to_image_filter.Update()

Calling window_to_image_filter.Update() on my laptop takes approximately 0.001 seconds, while the same call on the cloud machines is more than 100 times slower.
I'm using xvfb, and I also tried running it with vtk-osmesa instead of vtk-9.4.0.

How can I make it as fast as on my laptop, or at least 10x faster?

The reason it is slow is that OSMesa gets picked up for the rendering, so everything is rendered in software on the CPU.
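
To confirm which backend is actually in use, one option is to look at the render window class and the OpenGL renderer string. The sketch below is a minimal example rather than a definitive check: `classify_backend` and `describe_render_window` are hypothetical helper names, and the heuristic assumes that Mesa software rasterizers identify themselves as `llvmpipe` or `softpipe` in the capabilities report.

```python
def classify_backend(window_class: str, capabilities: str) -> str:
    """Guess whether a VTK render window renders in software.

    Heuristic (an assumption, not an official VTK check): OSMesa windows
    use the class vtkOSOpenGLRenderWindow, and Mesa software rasterizers
    usually mention 'llvmpipe' or 'softpipe' in the renderer string.
    """
    text = capabilities.lower()
    if window_class == "vtkOSOpenGLRenderWindow":
        return "software"
    if "llvmpipe" in text or "softpipe" in text:
        return "software"
    return "hardware-or-unknown"


def describe_render_window() -> str:
    """Create a small render window and classify its backend."""
    import vtk  # imported lazily so classify_backend works without VTK

    window = vtk.vtkRenderWindow()
    window.SetOffScreenRendering(True)
    window.SetSize(16, 16)
    window.Render()  # make sure the OpenGL context exists before querying it
    return classify_backend(window.GetClassName(), window.ReportCapabilities())
```

Running `describe_render_window()` on both the laptop and the EC2 instance should show whether the two machines end up on different backends.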

If you have graphics hardware (g5.*), you should be able to use EGL, which is embedded in 9.4, but that requires the NVIDIA drivers to be properly installed on that machine.
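
As a rough sketch of how EGL could be requested, assuming VTK 9.4 or later, where the `VTK_DEFAULT_OPENGL_WINDOW` environment variable is documented as selecting the render window class at runtime: `request_vtk_backend` and `make_offscreen_window` are hypothetical helper names, and the variable is set before VTK is imported to be on the safe side.

```python
import os


def request_vtk_backend(window_class: str = "vtkEGLRenderWindow") -> None:
    """Ask VTK to use a specific render window class.

    Assumes VTK >= 9.4, which reads VTK_DEFAULT_OPENGL_WINDOW when it
    creates a render window. Call this before creating any window.
    """
    os.environ["VTK_DEFAULT_OPENGL_WINDOW"] = window_class


def make_offscreen_window(width: int = 128, height: int = 128):
    """Create an off-screen render window using the requested backend."""
    import vtk  # imported after the environment variable has been set

    window = vtk.vtkRenderWindow()
    window.SetOffScreenRendering(True)  # no X server or xvfb needed with EGL
    window.SetSize(width, height)
    return window
```

A typical use would be to call `request_vtk_backend()` at the very top of the script and then check `make_offscreen_window().GetClassName()`; if EGL and the drivers are working, it should report `vtkEGLRenderWindow`.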

We have had great performance with NICE DCV on Amazon Linux 2.
We use the g4 series, which is GPU (graphics) enabled: g4dn = NVIDIA, g4ad = AMD.