SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Using Thrill to Process Scientific Data on HPC

Authors: Mariia Karabin (Clemson University, Los Alamos National Laboratory), Xinyu Chen (University of New Mexico), Supreeth Suresh (University of Wyoming), Ivo Jimenez (University of California, Santa Cruz), Li-Ta Lo (Los Alamos National Laboratory), Pascal Grosset (Los Alamos National Laboratory)

Abstract: As computational power and memory capacity continue to improve, the volume of scientific data keeps growing. To gain insight from these vast amounts of data, scientists are starting to look at Big Data processing and analytics tools such as Apache Spark. In this poster, we explore Thrill, a framework for big data computation on HPC clusters that provides an interface similar to systems like Apache Spark but delivers higher performance because it is built on C++ and MPI. Using Thrill, we implemented several analytics operations to post-process and analyze data from plasma physics and molecular dynamics simulations. These operations took less programming effort than hand-crafted data processing programs would require, and they produced preliminary results that were verified by scientists at LANL.

Best Poster Finalist (BP): no
