End-to-End Online Performance Data Capture and Analysis for Scientific Workflows

SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

WORKS 2018: 13th Workshop on Workflows in Support of Large-Scale Science


Authors: George Papadimitriou (University of Southern California)

Abstract: With workflows increasingly used for scientific computing and with the push toward exascale computing, it has become essential to analyze the characteristics of scientific applications in order to understand their impact on the underlying infrastructure, and vice versa. Such analysis can help drive the design, development, and optimization of next-generation systems and solutions. In this paper, we present an architecture, and its integration with well-established and newly developed tools, for collecting online performance statistics of workflow executions from various heterogeneous sources and publishing them to a distributed database (Elasticsearch). Using this architecture, we can correlate online workflow performance data with data from the underlying infrastructure and present the results in a useful and intuitive way through an online dashboard. We validated our approach by executing two classes of real-world workflows under both normal and anomalous conditions: an I/O-intensive genome analysis workflow and a CPU- and memory-intensive materials science workflow. Using the data collected in Elasticsearch, we demonstrate that we can correctly identify the anomalies we injected. We identify this end-to-end collection of workflow performance data as an important source of training data for automated machine learning analysis.
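
The abstract does not say how the correlation and publishing steps are implemented. The following is a minimal Python sketch of the two ideas under stated assumptions, not the authors' implementation: the record types `JobRecord` and `HostSample`, their fields, and the index name `workflow-perf` are hypothetical. Workflow job records are joined with infrastructure CPU samples by host and time window, and the correlated results are serialized in the newline-delimited JSON format accepted by the Elasticsearch `_bulk` API.

```python
import json
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class JobRecord:
    """Hypothetical workflow-side record: one job executed on one host."""
    job_id: str
    host: str
    start: float  # job start time, epoch seconds
    end: float    # job end time, epoch seconds


@dataclass
class HostSample:
    """Hypothetical infrastructure-side record: one CPU reading from a host monitor."""
    host: str
    timestamp: float  # sample time, epoch seconds
    cpu_util: float   # CPU utilization in percent


def correlate(jobs: List[JobRecord], samples: List[HostSample]) -> Dict[str, Optional[float]]:
    """For each job, average the CPU samples taken on the same host while the
    job was running (start <= timestamp <= end); None if no sample falls inside."""
    by_host: Dict[str, List[HostSample]] = {}
    for s in samples:
        by_host.setdefault(s.host, []).append(s)
    result: Dict[str, Optional[float]] = {}
    for job in jobs:
        window = [s.cpu_util for s in by_host.get(job.host, [])
                  if job.start <= s.timestamp <= job.end]
        result[job.job_id] = mean(window) if window else None
    return result


def to_bulk_ndjson(index: str, docs: List[dict]) -> str:
    """Serialize documents as an Elasticsearch _bulk request body: one action line
    followed by one source line per document, each terminated by a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    jobs = [JobRecord("align_1", "node1", 0, 10), JobRecord("sort_1", "node2", 5, 15)]
    samples = [HostSample("node1", 2, 80.0), HostSample("node1", 8, 90.0),
               HostSample("node1", 20, 10.0), HostSample("node2", 30, 50.0)]
    cpu_by_job = correlate(jobs, samples)
    print(cpu_by_job)  # {'align_1': 85.0, 'sort_1': None}
    docs = [{"job_id": j, "mean_cpu_util": v} for j, v in cpu_by_job.items()]
    print(to_bulk_ndjson("workflow-perf", docs), end="")
```

The sketch only builds the request body; sending it (for example, as a POST to the `_bulk` endpoint with `Content-Type: application/x-ndjson`) is left out. Grouping samples by host first keeps each job's window scan limited to its own node; an alternative is to store both data streams in Elasticsearch and correlate them with time-range queries at read time.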
