Authors: Ghaleb Abdulla (Lawrence Livermore National Laboratory), Torsten Wilde (Hewlett Packard Enterprise), Ryousei Takano (National Institute of Advanced Industrial Science and Technology (AIST)), David Martinez (Sandia National Laboratories), Carlo Cavazzoni (CINECA), Andrea Bartolini (University of Bologna), Devesh Tiwari (Northeastern University)
Abstract: Several leading-edge supercomputing centers across the globe have been developing or acquiring data acquisition and management systems to support energy efficiency studies and reporting, to understand rare and important events, and to help energy providers with energy scheduling and management. This BoF presents experiences from HPC centers regarding data source integration and analytics and seeks those interested in hearing about or sharing experiences with dynamic power and energy management. It is important to get the community together to share solutions and educate each other about the challenges, success stories, and use cases.
Long Description: **Motivation: There are many use cases for data source integration and analytics. For example, supercomputer systems have increasingly rapid, unpredictable, and large power fluctuations. In addition, electricity service providers require advance reporting of significant power fluctuations and may ask supercomputing centers to change the timing and/or magnitude of their demand to help address electricity-supply constraints. To adapt to this new landscape, HPC centers are putting in place systems for data collection and management and may employ data analytics to control their electricity demand dynamically and in real time. Also, the United States Department of Energy is concerned about the value added by a Data Center Infrastructure Management (DCIM) solution that can tackle both cybersecurity and asset management. As a result, it started a pilot with Nlyte to help centers that do not have monitoring capabilities for their data centers. It will be important to hear from both sides, share experiences, and discuss future plans. Looking forward, the efforts to build power-aware schedulers have been interacting with and benefiting from the center- and cluster-level energy studies. There is a great need to integrate and understand the power data at the different levels of the data center.
**Audience interaction: For a two-way interaction between the BoF attendees and the survey topics, we plan to invite representatives of organizations that are employing tools enabling better data integration between the HPC system and the facility. The purpose is to give the audience an opportunity to learn about the different mechanisms being funded and adopted by sites across the globe. This will also give the site representatives an opportunity to gauge the level of interest in and acceptance of their proposed techniques by the public. The discussions will help unify the vision for the data collection, management, and applications path. There is a need to understand the requirements of the different sites, how many of these requirements are shared, and whether one solution fits all.
**Pre-event tasks: The Energy Efficient HPC Working Group (EE HPC WG) - https://eehpcwg.llnl.gov/ - has a Team focused on data source integration and analytics. This Team has been gathering information, by questionnaire, from sites that have implemented aggregated data collection for operational management (including energy management) in a production environment on at least one large-scale system (Top500-sized system) with a scope that extends from the site down to the CPU. Initial results from this questionnaire will be presented at the BoF.
**Post-event tasks: After the BoF, a report summarizing the BoF discussion will be prepared by the EE HPC WG Data Source Integration and Analytics Team and shared with the community via the EE HPC WG website.