HPC PowerStack: a community-wide open collaboration for enabling system-wide power efficiency
Authors: Siddhartha Jana (Intel Corporation)
Abstract: One of the challenges to Open Supercomputing in the exascale era is the design of software solutions that drive power management across the system stack. To tackle these power challenges, the HPC community has designed various open source and vendor-specific solutions that meet power management goals at different levels of granularity, from the system level to the node level. These components are built under certain assumptions about system behavior and remain completely agnostic of each other’s techniques. For example, an open source workload manager attempting to reduce system-level power consumption may remain oblivious to the power demands of the application-level runtime and node-level hardware components. Attempting to integrate such components into a unified system stack often leads to them interfering with each other’s solutions, which in turn leads to unreliable system performance. To avoid this, system integrators end up designing their stack with tightly coupled, cherry-picked vendor-specific solutions. This lack of coordination between vendors and the open solution community leads to underutilization of system Watts and FLOPS. As a result, there is an urgent need for the HPC community to (A) identify the key software actors needed in a system power stack: job schedulers, application-level runtimes, and hardware knobs, (B) arrive at a consensus on the roles and responsibilities of these actors, (C) design communication protocols for bidirectional control and feedback signals among the actors to enable scalable closed-loop coordination at different granularities, and (D) study and combine existing standalone engineering and development prototypes and build a community that actively participates in open development and engineering efforts.
This realization led to the formation of the PowerStack Community in 2016. Its participants include members from academia, government labs, and vendors around the world, working on different layers of the system software stack. This talk is intended to spread awareness among the workshop attendees and to solicit participation. The hope is to facilitate the integration of open power management solutions into future PowerStack-compatible systems.
Workshop: 4th Workshop for Open Source Supercomputing (OpenSuCo)