<span class="var-sub_title">Parallel and Scalable Combinatorial String and Graph Algorithms on Distributed Memory Systems</span> SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Parallel and Scalable Combinatorial String and Graph Algorithms on Distributed Memory Systems


Student: Patrick Flick (Georgia Institute of Technology)
Advisor: Srinivas Aluru (Georgia Institute of Technology)

Abstract: Methods for processing and analyzing DNA and genomic data are built upon combinatorial graph and string algorithms. The advent of high-throughput DNA sequencing is enabling the generation of billions of reads per experiment. Classical and sequential algorithms can no longer deal with these growing data sizes, which for the last 10 years have greatly out-paced advances in processor speeds. To process and analyze state-of-the-art genomic data sets require the design of scalable and efficient parallel algorithms and the use of large computing clusters. Here, we present our distributed-memory parallel algorithms for indexing large genomic datasets, including algorithms for construction of suffix- and LCP arrays, solving the All-Nearest-Smaller-Values problem and its application to the construction of suffix trees. Our parallel algorithms exhibit superior runtime complexity and practical performance compared to the state-of-the-art. Furthermore, we present distributed-memory algorithms for clustering de-Bruijn graphs and its application to solving a grand challenge metagenomic dataset.

Summary: pdf
Thesis Canvas: pdf



Back to Doctoral Showcase Archive Listing