Design of a modern fast Fourier transform and cache effective bit-reversal algorithm
Published in International Journal of Parallel, Emergent and Distributed Systems, 2023
Adam Simek, Ivan Šimeček
Benchmarks were designed following the FFTW benchmark methodology [16], and all compared FFT implementations were run under very similar conditions: the planning phase was excluded, only the running time of the transform itself was measured, and the measured times were converted to MFLOPS. Tests were performed on a machine with an Intel Xeon Gold 6130 processor, which supports both AVX2 and AVX-512. All tested libraries had at least AVX2 enabled, and everything was compiled with GCC using -O3 optimizations with vectorization enabled. All tests were measured in double precision, so a 256-bit AVX2 vector holds 4 elements and a 512-bit AVX-512 vector holds 8 elements. The OpenMP tests were run on a dual-socket Intel Xeon Scalable Gold 6146 system providing 24 cores in total, with AVX2 code and -O3 optimizations.
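The conversion from running time to MFLOPS can be sketched as follows. This is a minimal illustration, assuming the nominal operation count used in the FFTW benchmark methodology (5 N log2 N for a complex transform of size N, half that for a real transform) rather than the true operation count of any particular library. The function names `fft_mflops` and `doubles_per_vector` are hypothetical helpers introduced here, not part of the paper or of FFTW.

```python
import math


def fft_mflops(n: int, seconds: float, real_input: bool = False) -> float:
    """Convert the time of one FFT of size n into nominal MFLOPS.

    Assumes the FFTW-benchmark convention: 5 * n * log2(n) floating-point
    operations for a complex transform, halved for real input. Planning
    time must already be excluded from `seconds`.
    """
    if n < 2 or seconds <= 0.0:
        raise ValueError("n must be >= 2 and seconds must be positive")
    nominal_flops = 5.0 * n * math.log2(n)
    if real_input:
        nominal_flops /= 2.0
    microseconds = seconds * 1e6
    # flops per microsecond equals millions of flops per second
    return nominal_flops / microseconds


def doubles_per_vector(register_bits: int) -> int:
    """Number of 64-bit doubles that fit in one SIMD register."""
    return register_bits // 64


if __name__ == "__main__":
    # Illustrative timing only, not a measured result from the paper.
    print(fft_mflops(1024, 10e-6))
    print(doubles_per_vector(256), doubles_per_vector(512))
```

Because the operation count is nominal, the resulting figure is a normalized speed that allows different transform sizes and libraries to be compared on one scale; it should not be read as the number of floating-point operations actually executed.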