Authors:
Abstract: Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiome's genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the metaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is parallelized end to end using the Unified Parallel C language and can therefore run seamlessly on both shared- and distributed-memory systems. Experimental results show that metaHipMer matches or outperforms the state-of-the-art tools in terms of accuracy. Moreover, metaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand-challenge metagenomes.