Hello, recently while working on the ParaView Digital Signal Processing plugin I’ve been wondering about a VTK-m FFT implementation. For several DSP algorithms, the performance bottleneck is the transform from the time domain to the frequency domain using the FFT.
Currently, vtkTableFFT and vtkFFT, the main implementations of the FFT in VTK, internally use the KissFFT library and SMPTools. They can run in parallel, but are still CPU-only. Alternative implementations using the GPU have emerged over the years, such as VkFFT and heFFTe from the ECP. Still, VTK-m, the dedicated VTK accelerator that can run filters on the GPU, does not have its own implementation of the algorithm. Many filters in VTK and ParaView can now use the faster VTK-m parallel implementations, but the FFT is not one of them.
Based on some tests that we’ve made using the VkFFT implementation, we can expect a real speedup from a GPU algorithm only above a large number of points: unless the full processing pipeline uses filters running on the GPU, we need costly data transfers back and forth between RAM and GPU memory.
Implementing the FFT in VTK-m could benefit a number of processing pipelines. This process would take several steps:
- Implementing the algorithm in VTK-m using worklets.
- Benchmarking using the internal framework, to gauge the possible performance improvement using GPU over CPU for different dataset sizes.
- After the next VTK-m release is ported to VTK, wrapping the VTK-m filter in a VTK filter that is API-compatible with vtkFFT and vtkTableFFT, using zero-copy data model conversion, then activating the override mechanism.
- Exposing it in ParaView. The DSP filters would automatically benefit from the performance improvement if the override is active.
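To make the first step a bit more concrete, here is a minimal sketch of an iterative radix-2 FFT in plain C++. It does not use any VTK-m API: `fftRadix2` and `Complex` are names made up for this post, and the power-of-two restriction is an assumption to keep the example short. The point is only the loop structure: within each stage, the `n / 2` butterflies are independent, so each one could plausibly become one worklet invocation, with one dispatch per stage.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical helper (not a VTK or VTK-m function): in-place, unnormalized
// forward FFT using the iterative radix-2 Cooley-Tukey scheme.
// Assumption: the input size is a nonzero power of two.
void fftRadix2(std::vector<Complex>& data)
{
  const std::size_t n = data.size();
  if (n == 0 || (n & (n - 1)) != 0)
  {
    throw std::invalid_argument("fftRadix2: size must be a nonzero power of two");
  }

  // Bit-reversal permutation so that the butterflies can run in place.
  for (std::size_t i = 1, j = 0; i < n; ++i)
  {
    std::size_t bit = n >> 1;
    for (; j & bit; bit >>= 1)
    {
      j ^= bit;
    }
    j ^= bit;
    if (i < j)
    {
      std::swap(data[i], data[j]);
    }
  }

  const double pi = std::acos(-1.0);

  // log2(n) stages; the stages themselves must run one after another.
  for (std::size_t len = 2; len <= n; len <<= 1)
  {
    const std::size_t half = len / 2;

    // n / 2 butterflies per stage. Each butterfly reads and writes its own
    // pair of elements, so this loop is the part that can run in parallel.
    for (std::size_t b = 0; b < n / 2; ++b)
    {
      const std::size_t k = b % half;             // offset inside the block
      const std::size_t i = (b / half) * len + k; // index of the upper element
      const Complex w = std::polar(
        1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(len));
      const Complex u = data[i];
      const Complex v = data[i + half] * w;
      data[i] = u + v;
      data[i + half] = u - v;
    }
  }
}
```

Calling `fftRadix2(signal)` on a `std::vector<Complex>` of size 8 holding a unit impulse leaves every bin equal to 1, which is a cheap sanity check for any GPU port. A real implementation would also need non-power-of-two sizes and real-input optimizations, which this sketch deliberately leaves out.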
Would anyone else benefit from a faster vtkFFT implementation?