Using a Robust Metadata Management System to Accelerate Scientific Discovery at Extreme Scales
Authors: Margaret Lawson (University of Illinois, Sandia National Laboratories)
Abstract: Large-scale scientific simulations are an important tool for scientific discovery. In recent years, the amount of data output by these simulations has grown rapidly. Extended runs of simulations such as XGC edge plasma fusion can easily generate datasets in the terabyte to petabyte range. With such large datasets, it is no longer feasible for scientists to load entire simulation outputs in search of features of interest. Scientists need an efficient, memory-efficient way of identifying which simulations produce a phenomenon, when and where the phenomenon appears, and how the phenomenon changes over time. However, current I/O systems such as HDF5, NetCDF, and ADIOS do not provide these metadata capabilities. While some alternative tools have been developed that are optimized for a single type of analysis (global, spatial, or temporal), no system provides an efficient way to perform all of these types of analysis. To fill this need, I have created EMPRESS, an RDBMS-based metadata management system for extreme-scale scientific simulations. EMPRESS offers users the ability to efficiently tag and search features of interest without having to read in the associated datasets. Users can then use this metadata to perform spatial, temporal, or global analysis and make discoveries. EMPRESS has been tested on several of Sandia's capacity clusters. Testing has primarily involved 1000, 2000, and 4000 cores, but several 8000-core tests were performed as well. Testing has shown that EMPRESS offers vastly better performance on these vital metadata functions than HDF5.
Workshop: Women in HPC: Diversifying the HPC Community