<span class="var-sub_title">A Fast and Simple Approach to Merge and Merge Sorting Using Wide Vector Instructions</span> SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

IA^3 2018: 8th Workshop on Irregular Applications: Architectures and Algorithms


A Fast and Simple Approach to Merge and Merge Sorting Using Wide Vector Instructions

Authors:

Abstract: Merging and sorting algorithms are the backbone of many modern computer applications. As such, efficient implementations are desired. Recent architectural advancements in CPUs (Central Processing Units), such as wider and more powerful vector instructions, allow for algorithmic improvements. This paper presents a new approach to merge sort using vector instructions. Traditional approaches to vectorized sorting typically utilize a bitonic sorting network (Batcher's Algorithm) which adds significant overhead. Our approach eliminates the overhead from this approach. We start with a branch-avoiding merge algorithm and then use the Merge Path algorithm to split up merging between the different SIMD lanes. Testing demonstrates that the algorithm not only surpasses the SIMD based bitonic counterpart, but that it is over 2.94X faster than a traditional merge, merging over 300M keys per second in one thread and over 16B keys per second in parallel. Our new sort reaches is over 5X faster than quicksort and 2X faster than Intel's IPP library sort, sorting over 5.3M keys per second for a single processor and in parallel over 500M keys per second and a speedup of over 2X from a traditional merge sort.

Archive Materials


Back to IA^3 2018: 8th Workshop on Irregular Applications: Architectures and Algorithms Archive Listing

Back to Full Workshop Archive Listing