HPCViz: Monitoring Health Status of High Performance Computing Systems (SC18 Proceedings)

The International Conference for High Performance Computing, Networking, Storage, and Analysis

The 2nd Industry/University Joint International Workshop on Data Center Automation, Analytics, and Control (DAAC)


HPCViz: Monitoring Health Status of High Performance Computing Systems

Authors:

Abstract: This paper introduces HPCViz, a visual analytics tool for tracking and monitoring system events through a RESTful interface. The tool has three goals: 1) to monitor a set of system events from multiple hosts and racks using real-time statistics, 2) to help system administrators detect, and raise alarms on, unusual signature-based patterns in the health records of hosts in a complex system, and 3) to support system debugging with a visual layout, covering both computing resource allocation and a health monitoring map, that mimics the actual system. A case study was conducted in a Redfish environment with a sample of 10 racks and 467 hosts. The results show that the visualization tool helps system analysts profile and observe system behavior and identify traces of issues that have occurred.
