SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

ISAV 2018: In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization


Scheduling for In-machine Analytics: Data Size Is Important

Abstract: As the HPC community moves toward exascale computing, I/O management becomes increasingly critical to maintaining system performance. While the computing capacity of machines keeps growing, the I/O capabilities of systems do not follow the same trend. To address this issue, the HPC community has proposed new solutions such as online in-machine analysis, which overcome the limitations of basic post-mortem data analysis, where the data must first be stored on the Parallel File System (PFS) and processed later.

In this paper, we study different scheduling strategies for in-machine analytics. Our goal is to extract the most important features of analytics that directly determine the efficiency of scheduling strategies. To do so, we propose a memory-constrained model for in-machine analysis. It automatically determines hardware resource partitioning and proposes scheduling policies for simulation and analysis. We evaluate our model through simulations and observe that it is critical to base scheduling decisions on the memory needs of each analysis. We also note unexpected behaviors, from which we deduce that modeling the in-machine paradigm for HPC applications requires a deep understanding of task placement, data movement, and hardware partitioning.

