SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Extreme Scale De Novo Metagenome Assembly


Authors: Evangelos Georganas (Intel Corporation), Rob Egan (Lawrence Berkeley National Laboratory), Steven Hofmeyr (Lawrence Berkeley National Laboratory), Eugene Goltsman (Lawrence Berkeley National Laboratory), Bill Arndt (Lawrence Berkeley National Laboratory), Andrew Tritt (Lawrence Berkeley National Laboratory), Aydin Buluc (Lawrence Berkeley National Laboratory), Leonid Oliker (Lawrence Berkeley National Laboratory), Katherine Yelick (Lawrence Berkeley National Laboratory)

Abstract: Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the genomes of the underlying microbiomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality, high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is parallelized end to end in the Unified Parallel C (UPC) language and can therefore run seamlessly on both shared- and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand-challenge metagenomes.
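To make the "iterative de Bruijn graph approach" concrete, the sketch below shows the generic textbook idea in serial Python: count k-mers, drop low-count k-mers as likely sequencing errors, and merge k-mers along non-branching paths into contigs, repeating for increasing k. It is not MetaHipMer's UPC implementation. All function names (`build_dbg`, `build_contigs`, `iterative_assembly`) and the `min_count` parameter are hypothetical, the sketch ignores reverse complements (real assemblers canonicalize k-mers across both strands), and feeding each round's contigs into the next round is an assumed simplification in the style of IDBA-type assemblers rather than a description of the paper's exact algorithm.

```python
from collections import defaultdict

BASES = "ACGT"


def build_dbg(reads, k, min_count=1):
    """Count every k-mer in the reads and keep those seen at least min_count times.

    Low-count k-mers are dropped because they usually come from sequencing errors.
    Single-strand only: reverse complements are not merged (a simplification).
    """
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {km for km, c in counts.items() if c >= min_count}


def successors(km, graph):
    """k-mers in the graph that overlap km by k-1 bases on its right side."""
    return [km[1:] + b for b in BASES if km[1:] + b in graph]


def predecessors(km, graph):
    """k-mers in the graph that overlap km by k-1 bases on its left side."""
    return [b + km[:-1] for b in BASES if b + km[:-1] in graph]


def build_contigs(graph):
    """Merge k-mers along non-branching paths (unitigs) into contig strings."""
    visited = set()
    contigs = []
    for km in sorted(graph):
        if km in visited:
            continue
        # Walk backward to the first k-mer of this non-branching path.
        start, seen = km, {km}
        while True:
            preds = predecessors(start, graph)
            if len(preds) != 1:
                break
            p = preds[0]
            if len(successors(p, graph)) != 1 or p in seen:
                break
            seen.add(p)
            start = p
        # Walk forward, appending one base per k-mer, until a branch or dead end.
        contig, cur = start, start
        visited.add(start)
        while True:
            succs = successors(cur, graph)
            if len(succs) != 1:
                break
            nxt = succs[0]
            if len(predecessors(nxt, graph)) != 1 or nxt in visited:
                break
            contig += nxt[-1]
            visited.add(nxt)
            cur = nxt
        contigs.append(contig)
    return contigs


def iterative_assembly(reads, ks, min_count=2):
    """Assemble with increasing k, feeding each round's contigs into the next round."""
    contigs = []
    for k in ks:
        # Assumed simplification: repeat previous contigs min_count times so
        # their k-mers survive the count filter in the next round.
        inputs = list(reads) + contigs * min_count
        contigs = build_contigs(build_dbg(inputs, k, min_count))
    return contigs
```

For example, the two overlapping reads `"ACGTTGCA"` and `"CGTTGCAT"` with `k=4` form a single non-branching path, so `build_contigs(build_dbg(reads, 4))` yields the one contig `"ACGTTGCAT"`. Small k resolves low-coverage regions, while larger k separates repeats; iterating over k is one way to get both, which is part of why metagenomes with uneven species abundance favor this design.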



