As computing moves toward exascale, I/O management becomes increasingly critical to sustaining system performance. While the compute capacity of machines keeps increasing, their I/O capabilities do not follow the same trend. To address this issue, the HPC community has proposed new solutions such as online in-machine analysis, which overcome the limitations of basic post-mortem data analysis, where the data must first be stored on the Parallel File System (PFS) before being processed later.
In this paper, we study different scheduling strategies for in-machine analytics. Our goal is to identify the features of analytics that most directly determine the efficiency of these scheduling strategies. To do so, we propose a memory-constrained model of in-machine analysis that automatically determines hardware resource partitioning and proposes scheduling policies for simulation and analysis. We evaluate our model through simulations and observe that it is critical to base scheduling decisions on the memory needs of each analytic. We also note unexpected behaviors, from which we deduce that modeling the in-machine paradigm for HPC applications requires a deep understanding of task placement, data movement, and hardware partitioning.
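The paper does not specify its scheduling policies here, but the idea of basing scheduling decisions on per-analytic memory needs can be sketched as a simple greedy round-based scheduler. The `Analytic` class, the field names, and the units below are all hypothetical illustrations, not the authors' model:

```python
from dataclasses import dataclass

@dataclass
class Analytic:
    name: str
    mem: int   # memory footprint (e.g. GB) -- hypothetical units

def schedule(tasks, mem_budget):
    """Greedy memory-aware scheduler (illustrative sketch only).

    Each round runs the subset of pending analytics that fits within
    the staging-memory budget, considering the most memory-hungry tasks
    first so that large analytics are not starved by small ones.
    Returns a list of rounds, each round being a list of Analytics.
    """
    rounds = []
    pending = sorted(tasks, key=lambda t: t.mem, reverse=True)
    while pending:
        free = mem_budget
        batch, rest = [], []
        for t in pending:
            if t.mem <= free:     # task fits in remaining memory
                batch.append(t)
                free -= t.mem
            else:                 # defer to a later round
                rest.append(t)
        if not batch:             # nothing fits: budget is too small
            raise ValueError("memory budget smaller than largest analytic")
        rounds.append(batch)
        pending = rest
    return rounds
```

For example, with an 8 GB budget and analytics needing 8, 4, and 4 GB, the scheduler runs the 8 GB task alone in one round and the two 4 GB tasks together in the next; a memory-oblivious scheduler co-locating all three would exceed the budget.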