Faster sorting with SIMD CUDA intrinsics (2024)