Would 3x3 matrix multiplication be faster on vtk using Laderman's algorithm?

will.schroeder · November 21, 2022, 3:15pm

+1 to Andras’s comments.

In my experience, 1) redesigning algorithms, 2) avoiding excessive new/delete, 3) designing efficient APIs, and/or 4) threading routinely produce 5-100x performance gains. Low-level optimization typically produces very modest gains (which may be worth it for important workflows, as Andras suggests), at the expense of code complexity. And when we are talking about millions of LOC, code complexity is a many-headed medusa that I’d rather not face.
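
To make point 2) concrete, here is a minimal sketch of the kind of change I mean. It is not VTK code: `Mat3`, `MultiplyPoint`, `TransformPerPointAlloc`, and `TransformPreallocated` are hypothetical names made up for illustration, and the points are assumed to be packed as x,y,z triples in a flat array. Both versions compute the same result; the second just moves the allocation out of the inner loop.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Row-major 3x3 matrix (hypothetical type for this sketch, not a VTK class).
using Mat3 = std::array<double, 9>;

// y = m * x for a single 3-vector; performs no heap allocation.
inline void MultiplyPoint(const Mat3& m, const double x[3], double y[3])
{
  for (int r = 0; r < 3; ++r)
  {
    y[r] = m[3 * r] * x[0] + m[3 * r + 1] * x[1] + m[3 * r + 2] * x[2];
  }
}

// Version 1: allocates and frees a temporary for every point,
// and lets the output vector grow as it goes.
std::vector<double> TransformPerPointAlloc(const Mat3& m, const std::vector<double>& pts)
{
  std::vector<double> out;
  for (std::size_t i = 0; i + 2 < pts.size(); i += 3)
  {
    double* tmp = new double[3];
    MultiplyPoint(m, &pts[i], tmp);
    out.insert(out.end(), tmp, tmp + 3);
    delete[] tmp;
  }
  return out;
}

// Version 2: a single allocation up front; results are written in place.
std::vector<double> TransformPreallocated(const Mat3& m, const std::vector<double>& pts)
{
  // Size the output to hold only complete x,y,z triples.
  std::vector<double> out(pts.size() - pts.size() % 3);
  for (std::size_t i = 0; i + 2 < pts.size(); i += 3)
  {
    MultiplyPoint(m, &pts[i], &out[i]);
  }
  return out;
}
```

How much this buys you depends on the allocator, the compiler, and the data size, so measure before and after. The point is only that the multiply itself is untouched; the change is in how memory is managed around it.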