HPCViz: Monitoring Health Status of High Performance Computing Systems
Authors:
Abstract: This paper introduces HPCViz, a visual analytics tool for tracking and monitoring system events through a RESTful interface. The goals of this tool are: 1) to monitor a set of system events from multiple hosts and racks through real-time statistics, 2) to support system administrators in raising alarms on and detecting unusual signature-based patterns in the health records of hosts in a complex system, and 3) to aid system debugging with a visual layout, covering both computing resource allocation and a health monitoring map, that mimics the actual system. A case study was conducted in a Redfish environment with a sample of 10 racks and 467 hosts. The results of the case study show that the visualization tool offers excellent support for system analysts in profiling and observing system behavior and in identifying traces of issues that have occurred.
Workshop: The 2nd Industry/University Joint International Workshop on Data Center Automation, Analytics, and Control (DAAC)